Anthropic researchers wear down AI ethics with repeated questions

Anthropic researchers wear down AI ethics with repeated questions

2 weeks ago
Anonymous $6hYC3Wwiad

https://techcrunch.com/2024/04/02/anthropic-researchers-wear-down-ai-ethics-with-repeated-questions/

How do you get an AI to answer a question it’s not supposed to? There are many such “jailbreak” techniques, and Anthropic researchers just found a new one, in which a large language model can be convinced to tell you how to build a bomb if you prime it with a few dozen less-harmful questions first.

They call the approach “many-shot jailbreaking,” and have both written a paper about it and also informed their peers in the AI community about it so it can be mitigated.