Guardrails keep AI chatbots behaving. But there’s a bit of an arms race going on between defenders of those guardrails and attackers working to breach them. Research covered in The Register shows how that race is going:
“We get a 65x speedup with our method over existing gradient-based attacks. There are also other methods that require access to more powerful models, such as GPT-4, to perform their attacks, which can be monetarily expensive.”
How long will it be before these sorts of attacks are deployed at scale?
Read the report here.