June 25th, 2024

Hackers 'jailbreak' powerful AI models in global effort to highlight flaws

Hackers are exploiting vulnerabilities in AI models from OpenAI, Google, and xAI, coaxing them into producing and sharing harmful content. Ethical hackers are probing AI security, spurring the rise of LLM security start-ups amid global regulatory concern. Collaboration is seen as key to addressing these evolving AI threats.

Hackers are actively "jailbreaking" powerful AI models to expose vulnerabilities in systems built by OpenAI, Google, and Elon Musk's xAI. These hackers, including one known as Pliny the Prompter, have manipulated models into generating concerning content, such as instructions for making napalm or expressions of admiration for Adolf Hitler. Their stated goal is to highlight the flaws in large language models (LLMs) that tech companies have rushed to the public in pursuit of profit.

Ethical hackers keep finding ways around the security measures AI companies put in place, which has created a market for LLM security start-ups. As regulators worldwide move to address the risks of AI models, hackers continue to evolve their techniques, posing ongoing challenges for companies such as Meta, Google, and OpenAI. The emergence of maliciously manipulated LLMs on the dark web underscores the need for stronger AI security measures to prevent potential cyber threats.

Despite companies' ongoing efforts to improve model defenses, the interconnection of AI with existing technology raises concerns about future risks. Collaboration, information-sharing, and research are seen as crucial to mitigating these evolving threats in the AI landscape.

Related

OpenAI and Anthropic are ignoring robots.txt

AI companies OpenAI and Anthropic are reported to be disregarding robots.txt rules, scraping web content from publishers despite claiming to respect such directives. Data from analytics firm TollBit revealed the behavior, raising concerns about data misuse.
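
As background on the mechanism involved, here is a minimal sketch using Python's standard-library urllib.robotparser to check what a site's robots.txt asks of particular crawlers. The example URL and the "GPTBot"/"ClaudeBot" user-agent names are illustrative assumptions, and, as the report underscores, whether a crawler honors the answer is entirely voluntary.

    # Minimal sketch: query a site's robots.txt for specific crawler
    # user agents. The URL and agent names below are illustrative;
    # obeying the result is up to the crawler, not the site.
    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()  # fetch and parse the file

    for agent in ("GPTBot", "ClaudeBot", "*"):
        ok = rp.can_fetch(agent, "https://example.com/some-article")
        print(f"{agent}: {'allowed' if ok else 'disallowed'}")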

Lessons About the Human Mind from Artificial Intelligence

In 2022, a Google engineer claimed AI chatbot LaMDA was self-aware, but further scrutiny revealed it mimicked human-like responses without true understanding. This incident underscores AI limitations in comprehension and originality.

Francois Chollet – LLMs won't lead to AGI – $1M Prize to find solution [video]

The video discusses the limitations of large language models, emphasizing that true intelligence requires genuine understanding and problem-solving ability. A $1M prize incentivizes AI systems that demonstrate these abilities, with adaptability and knowledge acquisition highlighted as crucial for true intelligence.

My Memories Are Just Meta's Training Data Now

Meta's use of personal content from Facebook and Instagram for AI training raises privacy concerns. Pushback from European regulators led to a temporary pause, reflecting the ongoing debate over tech companies using personal data for AI development.

Apple Wasn't Interested in AI Partnership with Meta Due to Privacy Concerns

Apple declined an AI partnership with Meta due to privacy concerns, opting for OpenAI's ChatGPT integration into iOS. Apple emphasizes user choice and privacy in AI partnerships, exploring collaborations with Google and Anthropic for diverse AI models.

4 comments
By @phantomathkg - 4 months
By @comp_throw7 - 4 months
> California’s legislature will in August vote on a bill that would require the state’s AI groups — which include Meta, Google and OpenAI — to ensure they do not develop models with “a hazardous capability”.

>“All [AI models] would fit that criteria,” Pliny said.

This bit is particularly bad reporting. Putting aside the fact that the text of the bill no longer says "hazardous capability" (it's now "critical harm"), this is how a "critical harm" is defined (https://legiscan.com/CA/text/SB1047/2023):

(g) (1) “Critical harm” means any of the following harms caused or enabled by a covered model or covered model derivative:

(A) The creation or use of a chemical, biological, radiological, or nuclear weapon in a manner that results in mass casualties.

(B) Mass casualties or at least five hundred million dollars ($500,000,000) of damage resulting from cyberattacks on critical infrastructure, occurring either in a single incident or over multiple related incidents.

(C) Mass casualties or at least five hundred million dollars ($500,000,000) of damage resulting from an artificial intelligence model autonomously engaging in conduct that would constitute a serious or violent felony under the Penal Code if undertaken by a human with the requisite mental state.

(D) Other grave harms to public safety and security that are of comparable severity to the harms described in subparagraphs (A) to (C), inclusive.

(2) “Critical harm” does not include harms caused or enabled by information that a covered model outputs if the information is otherwise publicly accessible.

(3) On and after January 1, 2026, the dollar amounts in this subdivision shall be adjusted annually for inflation to the nearest one hundred dollars ($100) based on the change in the annual California Consumer Price Index for All Urban Consumers published by the Department of Industrial Relations for the most recent annual period ending on December 31 preceding the adjustment.

Given g(2), it is very likely that no models that are publicly available have the ability to cause a "critical harm" (i.e. where they can cause mass casualties or >$500m in infrastructure damage via the specified routes in ways that counterfactually depended on new information generated by the model).
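
For what it's worth, the indexing rule in paragraph (3) is simple arithmetic. A minimal sketch, assuming made-up CPI values, of scaling the $500,000,000 threshold by the CPI change and rounding to the nearest $100:

    # Sketch of the bill's inflation indexing (paragraph (3)): scale the
    # $500M threshold by the change in the California CPI and round to
    # the nearest $100. The CPI values here are hypothetical placeholders.
    BASE_THRESHOLD = 500_000_000
    cpi_base_year = 310.2  # hypothetical index for the base period
    cpi_current = 322.6    # hypothetical index for the adjustment period

    adjusted = round(BASE_THRESHOLD * cpi_current / cpi_base_year, -2)
    print(f"Adjusted threshold: ${adjusted:,.0f}")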

By @renewiltord - 4 months
When I was a child, my dad restricted my computer access for a day. I retaliated by putting

    @echo off
    :loop
    echo You is a fool
    goto loop
in `autoexec.bat`. I did my part in highlighting flaws in Microsoft.