Undetectable AI

Sep 14, 2023

Harry Styles and One Direction fandoms are at odds over a recent influx of “leaked snippets” of possible demo tracks by their favorite artists. Fans can’t tell whether the snippets are legitimate or AI-generated (source).

Sites will claim to be able to identify AI-generated writing, images, or music. But none can accurately make this claim. Beyond obvious errors in realistic-looking photos (extra limbs, garbled text, etc.), there isn’t a reliable way to detect AI-generated content. At the model layer, content can be watermarked in a few different ways: by introducing patterns in the token distributions, or even in the sequences of random numbers used to run the network (see this approach by researchers at Stanford). But that depends entirely on the model provider applying the watermark. With the proliferation of open-source models, bad actors have their pick of unwatermarked vanilla generators.
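To make the token-distribution idea concrete, here’s a minimal sketch of a “green list” watermark in the style of Kirchenbauer et al. (a different scheme than the Stanford one linked above). The vocabulary size, bias strength, and hashing scheme are illustrative assumptions, not any provider’s actual implementation:

```python
import hashlib
import numpy as np

VOCAB_SIZE = 50_000   # illustrative; a real model has a fixed vocabulary
GREEN_FRACTION = 0.5  # fraction of the vocab favored at each step
BIAS = 2.0            # logit boost for "green" tokens (strength is an assumption)

def green_mask(prev_token: int) -> np.ndarray:
    """Partition the vocab pseudorandomly, seeded by the previous token."""
    seed = int.from_bytes(hashlib.sha256(str(prev_token).encode()).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    return rng.random(VOCAB_SIZE) < GREEN_FRACTION

def watermarked_sample(logits: np.ndarray, prev_token: int) -> int:
    """Sample the next token after nudging green-list logits upward."""
    biased = logits + BIAS * green_mask(prev_token)
    probs = np.exp(biased - biased.max())  # numerically stable softmax
    return int(np.random.default_rng().choice(VOCAB_SIZE, p=probs / probs.sum()))

def detection_z_score(tokens: list[int]) -> float:
    """How far the green-token count deviates from chance; large => watermarked."""
    n = len(tokens) - 1
    hits = sum(green_mask(prev)[tok] for prev, tok in zip(tokens, tokens[1:]))
    expected, var = GREEN_FRACTION * n, n * GREEN_FRACTION * (1 - GREEN_FRACTION)
    return (hits - expected) / var ** 0.5
```

Note that detection only works if you know the hashing scheme and parameters, and text from an unwatermarked open-source model will look like chance to this test. That’s the whole problem.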

What are the implications?

  • Students will use generative AI to write essays and complete their homework. Teachers will not be able to provably detect it.
  • Sites will have to use other signals to filter AI spam, likely borrowing from the techniques already used against email and comment spam.
  • Models will be distilled illegally, and it will be hard to prove that they were. For example, take an open-source but commercially restricted model whose license prohibits using its outputs to train another model. If a user generates data on their own infrastructure, and that data carries no watermark, it will be hard to prove it was distilled.
  • Verifying the authenticity of photos, video, and audio will be complicated. Will they have to be signed by the creator via some sort of PKI?
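If it comes to signing, the cryptographic mechanics are well understood. Here’s a minimal sketch using Ed25519 signatures via Python’s `cryptography` package; a production provenance scheme (something like C2PA) would also bind the signature to metadata and a certificate chain, which this toy example omits:

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The creator generates a keypair once; the public key gets published
# (or certified by an authority, as in TLS-style PKI).
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

# Sign the raw bytes of the file at capture or export time.
photo_bytes = open("photo.jpg", "rb").read()  # hypothetical file
signature = private_key.sign(photo_bytes)

# Anyone with the public key can check that the bytes are untouched.
try:
    public_key.verify(signature, photo_bytes)
    print("valid: bytes match what the creator signed")
except InvalidSignature:
    print("invalid: altered content or wrong key")
```

The hard part isn’t the signing; it’s key distribution and getting every camera, editor, and publishing pipeline to participate.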