August 16th, 2024

LLM and Bug Finding: Insights from a $2M Winning Team in the White House's AIxCC

Team Atlanta, formed for DARPA's AIxCC, comprises six institutions, including Georgia Tech and Samsung Research. The team focuses on AI-driven cybersecurity, adapting its strategies to find vulnerabilities and improve its Cyber Reasoning System.

Team Atlanta formed to compete in DARPA's AIxCC with Atlantis, their AI-driven Cyber Reasoning System (CRS). The team spans six institutions, including Georgia Tech and Samsung Research, and is led by Georgia Tech alumni with experience in major hacking competitions. Preparations began in October 2023, centered on leveraging AI, and large language models (LLMs) in particular, to enhance the CRS. An early prototype, Skynet, showed promising results in static analysis and in fine-tuning LLMs for source-code analysis.

The team initially struggled to pin down the competition's requirements, particularly around proof-of-vulnerability (PoV): a valid submission must include a concrete input that actually triggers the bug, which forces dynamic approaches such as fuzzing rather than static analysis alone. They had initially planned to rely on static analysis but adapted to the competition's demands. The first challenge, a known vulnerability in the Linux kernel, highlighted how hard it can be to trigger a bug in practice and how much depends on developing an effective test harness. The team is committed to advancing cybersecurity through innovative AI applications and is actively refining its strategies for the competition.
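(For readers new to the terminology: a fuzzing harness is a small driver program that feeds attacker-controlled bytes into the code under test, and a PoV is simply an input that makes the harness crash. The sketch below is a generic illustration, not the team's actual harness; it uses Google's Atheris fuzzer for Python, and the target function parse_record and its planted bug are hypothetical.)

```python
import sys
import atheris

def parse_record(data: bytes) -> None:
    """Hypothetical code under test, with a planted bug in length handling."""
    if len(data) < 5 or data[:4] != b"RECD":
        return
    declared_len = data[4]
    payload = data[5:]
    # Bug: trusts the declared length without checking the actual payload size.
    checksum = 0
    for i in range(declared_len):
        checksum ^= payload[i]  # IndexError when declared_len > len(payload)

def TestOneInput(data: bytes) -> None:
    # The harness: hand the fuzzer's bytes straight to the code under test.
    # Any uncaught exception is reported along with the input that caused it.
    parse_record(data)

atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()
```

Coverage feedback is what lets the fuzzer learn the RECD magic bytes on its own rather than guessing blindly, and the crashing input it reports is the kind of bug-triggering input the competition's PoV requirement asks for.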

- Team Atlanta is competing in DARPA's AIxCC with their AI-driven cybersecurity solution, Atlantis.

- The team consists of six institutions, including Georgia Tech and Samsung Research, with members experienced in hacking competitions.

- They initially focused on static analysis but shifted to dynamic approaches like fuzzing to meet competition requirements.

- The first challenge involved a known Linux kernel vulnerability, emphasizing the complexities of bug triggering.

- The team aims to leverage AI and LLMs to enhance their Cyber Reasoning System for effective cybersecurity solutions.

Related

Hackers 'jailbreak' powerful AI models in global effort to highlight flaws

Hackers exploit vulnerabilities in AI models from OpenAI, Google, and xAI, sharing harmful content. Ethical hackers challenge AI security, prompting the rise of LLM security start-ups amid global regulatory concerns. Collaboration is key to addressing evolving AI threats.

A Hacker Stole OpenAI Secrets, Raising Fears That China Could, Too

A hacker breached OpenAI's internal messaging systems, accessing discussions on A.I. tech. No code was compromised. The incident sparked internal debates on security and A.I. risks amid global competition.

Prepare for AI Hackers

DEF CON 2016 hosted the Cyber Grand Challenge where AI systems autonomously hacked programs. Bruce Schneier warns of AI hackers exploiting vulnerabilities rapidly, urging institutions to adapt to AI-devised attacks efficiently.

Continue (YC S23) Is Hiring a Software Engineer in San Francisco

A San Francisco startup, Continue, seeks a software engineer for their AI code assistant project. Proficiency in TypeScript, Node.js, and problem-solving skills are key. The company aims to empower developers with innovative solutions.

AI companies promised to self-regulate one year ago. What's changed?

AI companies like Amazon, Google, and Microsoft committed to safe AI development with the White House. Progress includes red-teaming exercises, watermarking, collaboration with outside experts, and information sharing, but transparency and accountability are still lacking. Encryption and bug bounty programs enhance security, but independent verification and further action are needed for AI safety and trust.

8 comments
By @hqzhao - 2 months
I'm part of the team, and we used LLM agents extensively for smart bug finding and patching. I'm happy to discuss some insights, and I'll share all of our approaches after the grand final :)
By @garlic_chives - 2 months
AIxCC is an AI Cyber Challenge launched by DARPA and ARPA-H.

Notably, a zero-day vulnerability in SQLite3 was discovered and patched during the AIxCC semifinals, demonstrating the potential of LLM-based approaches in bug finding.
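(The semifinal harness itself isn't shown in this thread, but the general shape of such a harness is easy to sketch. Below is a minimal illustration, assuming Python's built-in sqlite3 bindings, which wrap the same C library, plus the Atheris fuzzer; a memory-safety bug in SQLite itself would surface as a native process crash rather than a Python exception.)

```python
import sys
import sqlite3
import atheris

def TestOneInput(data: bytes) -> None:
    # Turn the fuzzer's raw bytes into a SQL string and run it in-memory.
    fdp = atheris.FuzzedDataProvider(data)
    sql = fdp.ConsumeUnicodeNoSurrogates(len(data))
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(sql)  # run fuzzer-generated SQL against the engine
    except (sqlite3.Error, sqlite3.Warning, ValueError):
        pass  # rejected SQL is expected; crashes, not errors, are the signal
    finally:
        conn.close()

atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()
```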

By @sim7c00 - 2 months
This is really impressive work. Coverage-guided, and especially directed, fuzzing can be extremely difficult. The post mentions that fuzzing is not a dumb technique; I think the classical idea is kind of dumb, in the sense of 'dumb fuzzers', but these days there is a ton of intelligence built around and poured into it, and I've always thought it's now beyond the classic idea of fuzz testing. I had colleagues who poured their souls into using git commit info etc. to help find potentially bad code paths, and then coverage-guided fuzzing to try to get in there. I really like the little note at the bottom about this. Adding such layers makes it lean towards machine learning nowadays, and I'd argue fuzzing is perhaps not the right term anymore; I don't think many people are still simply generating random inputs and trying to crash programs like that.

This is really exciting new progress in this field. Well done! Can't wait to see what new tools and techniques will come out of all this research.

Would you be open to implementing something around LibAFL or AFL++, perhaps? I remember we worked with those extensively. Since a lot of shops already use them, it might be cool to look at integration with such tools; or do you think this deviates so far that it'll amount to a new kind of tool entirely? Also, the work on datasets might be really valuable to other researchers. There was a mention of wasted work, but labeled datasets of CVE, bug, and patch commits can help a lot of folks if there's new data in there.

This kind of makes me miss having my head in this space :D Cool stuff, and massive congrats on being finalists. Thanks for the extensive writeup!
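(The git-history idea in the comment above is simple to prototype. Here's a rough sketch, my own illustration rather than anything from the writeup: rank a repository's files by recent commit churn as a cheap proxy for recently touched, possibly riskier code, then aim directed fuzzing or review effort at the top of the list.)

```python
import subprocess
from collections import Counter

def churn_by_file(repo: str, since: str = "2 years ago") -> list[tuple[str, int]]:
    """Rank files by how many recent commits touched them."""
    log = subprocess.run(
        ["git", "-C", repo, "log", f"--since={since}",
         "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    paths = [line for line in log.splitlines() if line.strip()]
    return Counter(paths).most_common()

if __name__ == "__main__":
    # Print the 20 most-churned files as candidate fuzzing targets.
    for path, commits in churn_by_file(".")[:20]:
        print(f"{commits:4d}  {path}")
```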

By @wslh - 2 months
BTW, have you seen the new LLM-based offensive tools such as XBOW [1]? They just received a funding round from Sequoia Capital [2].

[1] https://xbow.com/

[2] https://www.sequoiacap.com/article/partnering-with-xbow-the-...

By @deeznuttynutz - 2 months
What's the good word!!
By @rockskon - 2 months
The AIxCC booth felt like it was meant for a trade show as opposed to being a place where someone could learn something.