August 11th, 2024

OpenDevin: An Open Platform for AI Software Developers as Generalist Agents

OpenDevin is an MIT-licensed platform for building AI agents that carry out tasks the way human software developers do. It supports sandboxed code execution, coordination among multiple agents, and benchmark-based performance evaluation, and it has drawn substantial community contributions.

OpenDevin is a newly introduced platform for building versatile AI agents that perform tasks much as human software developers do. It builds on advances in large language models (LLMs) so that agents can interact with their environment by writing code, running command line operations, and browsing the web. OpenDevin supports the development of new agents, safe code execution in sandboxed environments, coordination among multiple agents, and the integration of evaluation benchmarks. The paper evaluates agents on 15 challenging tasks, including software engineering and web browsing benchmarks. OpenDevin is a collaborative project with more than 160 contributors and over 1,300 contributions. It is released under the MIT license, promoting open science and community engagement in AI development.

- OpenDevin is a platform for developing AI agents that mimic human software developers.

- The platform allows for safe code execution and coordination between multiple agents.

- It includes evaluation benchmarks for assessing agent performance on various tasks.

- The project has significant community involvement, with over 1,300 contributions from more than 160 contributors.

- OpenDevin is released under the MIT license, supporting open science initiatives.
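
The action-and-observation cycle summarized above can be illustrated with a rough sketch. This is not OpenDevin's actual API: the names `Action`, `Observation`, `execute`, `run_agent`, and `scripted_policy` are hypothetical, the policy is a hard-coded stand-in for an LLM call, and a plain subprocess is used only as a placeholder for a real sandbox (it provides no real isolation).

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str      # hypothetical action types: "run_python" or "finish"
    payload: str   # code to run, or the final answer


@dataclass
class Observation:
    content: str   # combined stdout/stderr from the executed code


def execute(action: Action, timeout: float = 10.0) -> Observation:
    """Run the action's code in a separate Python process.

    Assumption: a subprocess stands in for a sandbox here. It does NOT
    isolate the code from the host the way a container would.
    """
    result = subprocess.run(
        [sys.executable, "-c", action.payload],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return Observation((result.stdout + result.stderr).strip())


def run_agent(
    policy: Callable[[str, List[Observation]], Action],
    task: str,
    max_steps: int = 5,
) -> List[Observation]:
    """Ask the policy for an action, execute it, and feed back the result."""
    history: List[Observation] = []
    for _ in range(max_steps):
        action = policy(task, history)
        if action.kind == "finish":
            break
        history.append(execute(action))
    return history


def scripted_policy(task: str, history: List[Observation]) -> Action:
    """Hard-coded stand-in for an LLM: run one snippet, then finish."""
    if not history:
        return Action("run_python", "print(2 + 3)")
    return Action("finish", history[-1].content)


history = run_agent(scripted_policy, "add two numbers")
```

The step cap (`max_steps`) is one simple way to bound cost and avoid an agent looping indefinitely; a real system would also need genuine isolation for the executed code.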

AI: What people are saying
The comments on OpenDevin reflect a mix of experiences and concerns regarding the platform's capabilities and implications.
  • Users report varied success with OpenDevin, with some finding it useful for specific tasks while others experienced inefficiencies and high costs.
  • Concerns are raised about the potential dehumanization of developers and the ethical implications of AI agents taking on human-like roles.
  • There are worries about the safety and reliability of AI systems, particularly regarding their autonomy and potential for misuse.
  • Some users suggest integrating OpenDevin with existing development tools to enhance usability.
  • Overall, the community expresses a mix of curiosity and caution about the future of AI in software development.
14 comments
By @yeldarb - 7 months
Tried it a few weeks ago for a task (had a few dozen files in an open source repo I wanted to write tests for in a similar way to each other).

I gave it one example and then asked it to do the work for the other files.

It was able to do about half the files correctly. But it ended up taking an hour, costing >$50 in OpenAI credits, and took me longer to debug, fix, and verify the work than it would have to do the work manually.

My take: good glimpse of the future after a few more Moore’s Law doublings and model improvement cycles make it 10x better, 10x faster, and 10x cheaper. But probably not yet worth trying to use for real work vs playing with it for curiosity, learning, and understanding.

Edit: writing the tests in this PR given the code + one test as an example was the task: https://github.com/roboflow/inference/pull/533

This commit was the manual example: https://github.com/roboflow/inference/pull/533/commits/93165...

This commit adds the partially OpenDevin written ones: https://github.com/roboflow/inference/pull/533/commits/65f51...

By @Animats - 7 months
Nice.

The "Browsing agent" is a bit worrisome. That can reach outside the sandboxed environment. "At each step, the agent prompts the LLM with the task description, browsing action space description, current observation of the browser using accessibility tree, previous actions, and an action prediction example with chain-of-thought reasoning. The expected response from the LLM will contain chain-of-thought reasoning plus the predicted next actions, including the option to finish the task and convey the result to the user."

How much can that do? Is it smart enough to navigate login and signup pages? Can it sign up for a social media account? Buy things on Amazon?

By @czk - 7 months
I used this to scaffold out 5 HTML pages for a web app, having it iterate on building the UX. Did a pretty good job and took about 10 minutes of iterating with it, but cost me about $10 in API credits which was more than I expected.
By @easeout - 7 months
It's gross that this has a person's first name. How dehumanizing that will be for real Devins as this kind of thing becomes productized. How tempting to compare yourself to a "teammate" your employer pays a cloud tenant subscription for.
By @ai4ever - 7 months
i dont like to discourage or be a naysayer. but,

dont build a platform for software on something inherently unreliable. if there is one lesson i have learnt, it is that, systems and abstractions are built on interfaces which are reliable and deterministic.

focus on llm usecases where accuracy is not paramount - there are tons of them. ocr, summarization, reporting, recommendations.

By @causal - 7 months
I suspect that the pursuit of LLM agents is rooted in falling for the illusion of a mind which LLMs so easily weave.

So much of the stuff being built on LLMs in general seems fixated on making that illusion more believable.

By @adamgordonbell - 7 months
I tried opendevin for a sort of one off script that did some file processing.

It was a bit inscrutable what it did, but worked no problem. Much like chat gpt interpreter looping on python errors until it has a working solution, including pip installing the right libs, and reading the docs of the lib for usage errors.

N of 1 and a small freestanding task I had done myself already but I was impressed.

By @wongarsu - 7 months
By @bearjaws - 7 months
So does arxiv.org just let anyone publish a paper now? It seems to be used by AI research a lot more now instead of just a blog post.
By @eterps - 7 months
Does it have different goals than: https://aider.chat ?
By @android521 - 7 months
I don't need OpenDevin. I just need AI to reliably write a function or unit test or create a small UI component. It needs to check the latest documentation, as its answers are often outdated. It needs to be able to pass tests and debug itself without getting stuck in a loop of repeated errors it can't get out of. If an LLM could do that, it would save me so much time. But the latest models are all bad at this currently.
By @skywhopper - 7 months
Please don’t give any tools, AI or not, the freedom to run away like this. You’re inviting a new era of runaway worm-style viruses by giving such autonomy to easily manipulated programs.

To what end anyway? This is massively resource heavy, and the end goal seems to be to build a program that would end your career. Please work on something that will actually make coding easier and safer rather than building tools to run roughshod over civilization.

By @candiddevmike - 7 months
Why isn't this integrated with an IDE? Or am I missing that?