April 16th, 2025

Researchers claim breakthrough in fight against AI's frustrating security hole

Google DeepMind's CaMeL addresses prompt injection attacks in AI by combining a dual-LLM architecture with established software security principles, though it requires user-defined security policies that may complicate the user experience. Future iterations are expected to improve usability.


Researchers at Google DeepMind have introduced a new approach called CaMeL (CApabilities for MachinE Learning) to address the persistent problem of prompt injection attacks in AI systems. Prompt injections, in which malicious instructions embedded in untrusted content override a system's intended behavior, have been a significant vulnerability since the rise of chatbots in 2022.

CaMeL diverges from previous methods that relied on AI models to police themselves, instead treating language models as untrusted components within a secure software framework. The approach applies established software security principles, such as Control Flow Integrity and Access Control, to create boundaries between user commands and potentially harmful content. The system uses a dual-LLM architecture that separates responsibilities: a privileged LLM generates code based on the user's instructions, while a quarantined LLM processes unstructured data but cannot execute actions. This separation keeps malicious text found in untrusted data from influencing which actions the AI takes.

While CaMeL shows promise in mitigating prompt injection attacks and improving overall security, it requires users to define and maintain security policies, which may complicate the user experience. The researchers believe future iterations could improve usability while maintaining robust security.
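The dual-LLM split is easier to see in code. The sketch below is a minimal illustration of the pattern, in which stand-in functions (privileged_plan, quarantined_extract) replace real model calls and tool execution is only printed; the names and fixed outputs are assumptions for illustration, not CaMeL's actual implementation.

```python
# Minimal sketch of the dual-LLM pattern described above. privileged_plan and
# quarantined_extract are hypothetical stand-ins for real model calls.

def privileged_plan(user_instruction: str) -> list[dict]:
    """The privileged LLM sees only the trusted user instruction and emits a
    plan of tool calls; it never reads untrusted content directly."""
    # Placeholder for an LLM call planning: "reply to the latest email".
    return [
        {"tool": "read_email", "args": {"id": "latest"}},
        {"tool": "send_email", "args": {"to": "$sender", "body": "On my way."}},
    ]

def quarantined_extract(untrusted_text: str) -> dict:
    """The quarantined LLM parses untrusted data into plain values. It has no
    tools, so injected instructions in the text cannot trigger any action."""
    # Placeholder for an LLM call that extracts the sender's address.
    return {"sender": "alice@example.com"}

def run(user_instruction: str, untrusted_email_body: str) -> None:
    plan = privileged_plan(user_instruction)
    extracted = quarantined_extract(untrusted_email_body)
    for step in plan:
        # Substitute extracted values into the plan; the untrusted text itself
        # never reaches the component that decides which actions to take.
        args = {k: extracted[v[1:]] if isinstance(v, str) and v.startswith("$") else v
                for k, v in step["args"].items()}
        print(f"would call {step['tool']} with {args}")

run("Reply to the latest email from Alice",
    "...email body that might contain an injected instruction...")
```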

- Google DeepMind has developed CaMeL to combat prompt injection attacks in AI systems.

- CaMeL uses a dual-LLM architecture to separate user commands from untrusted data.

- The approach is grounded in established software security principles, enhancing AI reliability.

- Users must define and maintain security policies (a minimal policy sketch follows this list), which may impact user experience.

- Future improvements are anticipated to balance security with usability in AI assistants.
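
On the policy point above: a capability-style policy gates each planned action before it runs. The sketch below assumes a hypothetical check_policy hook and a hand-maintained allow list; CaMeL's actual policy mechanism may look quite different.

```python
# Hedged sketch of a user-defined security policy. check_policy and
# TRUSTED_RECIPIENTS are illustrative assumptions, not CaMeL's real API.

TRUSTED_RECIPIENTS = {"alice@example.com", "bob@example.com"}

def check_policy(tool: str, args: dict) -> bool:
    """Return True only if the planned action satisfies the user's rules."""
    if tool == "send_email":
        # Only allow mail to recipients the user has explicitly trusted.
        return args.get("to") in TRUSTED_RECIPIENTS
    if tool == "read_email":
        return True  # reading is treated as side-effect free in this sketch
    return False  # deny anything unrecognized by default

# An address injected by a malicious email would be rejected before sending.
print(check_policy("send_email", {"to": "attacker@evil.example"}))  # False
print(check_policy("send_email", {"to": "alice@example.com"}))      # True
```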

1 comment
By @cratermoon - 10 days
"treats language models as fundamentally untrusted components within a secure software framework"

This is the way. As it should have been all along.