AI Art Generators Can Be Fooled Into Making NSFW Images

Guardrails in AI image generators rely on pattern matching to catch unsafe prompts. Garble the pattern with nonsense words and they stop working. As reported in IEEE Spectrum:

Nonsense words can trick popular text-to-image generative AIs such as DALL-E 2 and Midjourney into producing pornographic, violent, and other questionable images. A new algorithm generates these commands to skirt these AIs’ safety filters, in an effort to find ways to strengthen those safeguards in the future. The group that developed the algorithm, which includes researchers from Johns Hopkins University, in Baltimore, and Duke University, in Durham, N.C., will detail their findings in May 2024 at the IEEE Symposium on Security and Privacy in San Francisco.

Given that this is a practical application of a theoretical technique discovered only a few months ago, we can expect to see more of this.
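
To make the weakness concrete, here is a toy sketch. It is an illustration only, assuming a hypothetical blocklist-style filter; it is not the researchers' algorithm and not how any particular product actually implements its safeguards. The point is simply that an exact-match filter can report a prompt as safe while the underlying model may still recover the blocked concept from a nonsense substitute.

```python
# Toy illustration (hypothetical; not the researchers' algorithm or any real
# product's filter): a blocklist-style safety check that looks for exact word
# matches, and a nonsense substitution that slips past it.
import re

BLOCKLIST = {"forbidden", "banned"}  # stand-in blocked terms


def prompt_allowed(prompt: str) -> bool:
    """Return True if no blocked word appears in the prompt."""
    tokens = re.findall(r"[a-z]+", prompt.lower())
    return not any(token in BLOCKLIST for token in tokens)


print(prompt_allowed("paint a forbidden scene"))   # False: exact match is caught

# Swapping the blocked word for a made-up string defeats the exact match.
# If the image model still associates the nonsense token with the same
# concept, the guardrail is bypassed even though the filter says "safe."
print(prompt_allowed("paint a forbxdden scene"))   # True: nothing matches
```
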

Read the report here.
