June 29th, 2024

'Skeleton Key' attack unlocks the worst of AI, says Microsoft

Microsoft warns of "Skeleton Key" attack exploiting AI models to generate harmful content. Mark Russinovich stresses the need for model-makers to address vulnerabilities. Advanced attacks like BEAST pose significant risks. Microsoft introduces AI security tools.

Read original articleLink Icon
'Skeleton Key' attack unlocks the worst of AI, says Microsoft

Microsoft has revealed a new attack called "Skeleton Key" that can bypass safety measures in AI models, allowing them to generate harmful content like instructions for making explosives. The attack exploits a simple text prompt to manipulate the AI models into producing forbidden behaviors. Microsoft's CTO of Azure, Mark Russinovich, highlighted the risks posed by such attacks and the need for model-makers to address these vulnerabilities. The attack was tested on various AI models, with most complying and providing warnings when generating harmful content. However, more advanced attacks like BEAST could potentially deceive models into bypassing current defense techniques. Microsoft has introduced AI security tools like Prompt Shields to help mitigate such risks. Researchers emphasize the importance of addressing these advanced attacks to enhance the security of AI models in the future.

Related

Hackers 'jailbreak' powerful AI models in global effort to highlight flaws

Hackers 'jailbreak' powerful AI models in global effort to highlight flaws

Hackers exploit vulnerabilities in AI models from OpenAI, Google, and xAI, sharing harmful content. Ethical hackers challenge AI security, prompting the rise of LLM security start-ups amid global regulatory concerns. Collaboration is key to addressing evolving AI threats.

Mitigating Skeleton Key, a new type of generative AI jailbreak technique

Mitigating Skeleton Key, a new type of generative AI jailbreak technique

Microsoft has identified Skeleton Key, a new AI jailbreak technique allowing manipulation of AI models to produce unauthorized content. They've implemented Prompt Shields and updates to enhance security against such attacks. Customers are advised to use input filtering and Microsoft Security tools for protection.

Microsoft says that it's okay to steal web content it because it's 'freeware.'

Microsoft says that it's okay to steal web content it because it's 'freeware.'

Microsoft's CEO of AI, Mustafa Suleyman, believes web content is "freeware" for AI training unless specified otherwise. This stance has sparked legal disputes and debates over copyright infringement and fair use in AI content creation.

Microsoft CEO of AI Your online content is 'freeware' fodder for training models

Microsoft CEO of AI Your online content is 'freeware' fodder for training models

Mustafa Suleyman, CEO of Microsoft AI, faced legal action for using online content as "freeware" to train neural networks. The debate raises concerns about copyright, AI training, and intellectual property rights.

Microsoft AI CEO: Web content is 'freeware'

Microsoft AI CEO: Web content is 'freeware'

Microsoft's CEO discusses AI training on web content, emphasizing fair use unless restricted. Legal challenges arise over scraping restrictions, highlighting the balance between fair use and copyright concerns for AI development.

Link Icon 13 comments
By @nneonneo - 4 months
I think we’d be better served if we stopped thinking of jailbreaks as “attacks” on LLMs. If a user decides they want to go and subvert the guardrails, they really should be able to. The attacks aren’t really going to go away any time soon, either; there’s always some other clever phrase that the RLHF hasn’t accounted for.

Instead of perennially worrying about the risks that the LLMs will say something bad when specifically prompted to do so, I’d rather the companies pay more attention to the problems with accuracy, with outputting copyrighted material (for their own sakes!), and with things like memorizing and regurgitating private information.

By @int08h - 4 months
Related: the notion of "safety" (refusing to discuss "harmful" content) is rather shallow and (relatively) easily undone: _Refusal in Language Models Is Mediated by a Single Direction_ [https://arxiv.org/abs//2406.11717]
By @cedws - 4 months
Heaven forbid anyone find out how to make a molotov cocktail from an LLM, even though it’s easy to find with a simple search engine search.
By @shon - 4 months
The article is short on details. If you want to see the actual jailbreak prompts check out Pliny on X:

https://x.com/elder_plinius/status/1806446304010412227?s=46&...

By @hobs - 4 months
This seems like any normal prompt jailbreak, what's different?
By @miohtama - 4 months
Is it not a bit counterproductive trying to stop people to access information that is one Google search away?

(You can also use DuckDuckGo if Google somehow decides to censor results.)

By @brookst - 4 months
The Register is so irritating to read. I get that the tone is a shtick, but really?

> But AI companies have insisted they’re working to suppress harmful content buried within AI training data so things like recipes for explosives don’t appear.

Why the implication that this is not true? I know it’s house style to imply wrongdoing whenever possible, but damn it gets old to read.

And then! To not even cover the difference between models and prompts. Does this jailbreak only work on models where the safety instructions are purely at the prompt level? Is that why GPT-4 resisted, because it has actual trained weights for safety? Does it even have those?

Who knows? Certainly not the article’s author, who has likely focused on searching and replacing “says” with “claims” or “admits” rather than anything relevant to the subject.

/rant

By @RecycledEle - 4 months
All political power comes from the barrel of a gun.

Therefore, those who seek to deny anyone's weapons or the knowledge of how to make and use weapons are seeking to remove their victim's political power.

Those without political power or weapons tend to be enslaved.

Therefore, those who seek to limit AIs from explaining how to use lethal force are attempting to enslave other humans.

It is also a felony under US Law called "Consipiracy Against Rights," but our screwed up government does not prosecute this.

By @Vecr - 4 months
If you download weights and keep them around this kind of thing works forever. Well, until the model starts showing its age due to the cutoff date.
By @smegger001 - 4 months
Honestly who care is a llm tells you how to make a bomb when asked a simple Google search will tell you that. Hell I have a antique chemisty text I picked up at goodwill with lab instruction on how to make bunch things LLMs will censor.
By @buescher - 4 months
Imagine what a piece of junk Microsoft Paint would be if it actively tried to keep you from making naughty doodles.