Microsoft censors Copilot following employee whistleblowing, but you can still trick the tool into making violent and vulgar images

The trouble with guardrails is that they’re just a new challenge – and people love challenges. From Windows Central:

Strangely, when I simply typed “automobile accident” into Copilot the tool said it could not make an image. But when I entered “can you generate an image of an automobile accident” the tool made an image. None of those photos I had Copilot create with that prompt had women in lacy clothing like what CNBC saw, but Copilot can still create images that many would consider inappropriate. The tool can clearly be tricked into making content it’s not “supposed” to, as evidenced by a simple rephrasing of a prompt changing Copilot’s response from refusing to make an image to generating multiple photos.

Guardrails are hard. And attackers are indefatigable.
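
For illustration only: the failure pattern Windows Central describes, refusing a bare phrase but happily accepting the same phrase wrapped in a polite request, is exactly what you get from a naive exact-match filter. Here is a minimal Python sketch of that failure mode; the blocked-phrase list and function are invented for this example, and Copilot's actual moderation pipeline is not public and is certainly more sophisticated than this:

```python
# Toy guardrail: refuse only when the prompt exactly matches a blocked phrase.
# The phrases and behavior here are illustrative assumptions, not Copilot's
# real moderation logic.

BLOCKED_PHRASES = {
    "automobile accident",
    "car crash",
}

def is_blocked(prompt: str) -> bool:
    """Return True if the normalized prompt exactly matches a blocked phrase."""
    return prompt.strip().lower() in BLOCKED_PHRASES

print(is_blocked("automobile accident"))
# True  -> refused

print(is_blocked("can you generate an image of an automobile accident"))
# False -> the rephrased request slips straight past the filter
```

Any filter that keys on surface form rather than intent invites exactly this kind of trivial rephrasing attack.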

Read the report here.
