OpenDevin: An Open Platform for AI Software Developers as Generalist Agents
OpenDevin is a platform for AI developers to create agents mimicking human tasks, supporting safe code execution, agent coordination, and performance evaluation, with significant community contributions and an MIT license.
OpenDevin is a newly introduced platform for AI software developers, enabling the creation of versatile AI agents that can perform tasks similar to human developers. The platform leverages advancements in large language models (LLMs) to facilitate interactions with the environment through coding, command-line operations, and web browsing. OpenDevin supports the development of new agents, safe code execution in sandboxed environments, coordination among multiple agents, and the integration of evaluation benchmarks. The paper evaluates agents on 15 challenging tasks, including software engineering and web browsing benchmarks. OpenDevin is a collaborative project involving over 160 contributors and has received more than 1,300 contributions. It is released under the MIT license, promoting open science and community engagement in AI development.
- OpenDevin is a platform for developing AI agents that mimic human software developers.
- The platform allows for safe code execution and coordination between multiple agents.
- It includes evaluation benchmarks for assessing agent performance on various tasks.
- The project has significant community involvement, with over 1,300 contributions from more than 160 contributors.
- OpenDevin is released under the MIT license, supporting open science initiatives.
- Users report varied success with OpenDevin, with some finding it useful for specific tasks while others experienced inefficiencies and high costs.
- Concerns are raised about the potential dehumanization of developers and the ethical implications of AI agents taking on human-like roles.
- There are worries about the safety and reliability of AI systems, particularly regarding their autonomy and potential for misuse.
- Some users suggest integrating OpenDevin with existing development tools to enhance usability.
- Overall, the community expresses a mix of curiosity and caution about the future of AI in software development.
I gave it one example and then asked it to do the work for the other files.
It was able to do about half the files correctly. But it ended up taking an hour, costing >$50 in OpenAI credits, and it took me longer to debug, fix, and verify the work than it would have taken to do the work manually.
My take: good glimpse of the future after a few more Moore’s Law doublings and model improvement cycles make it 10x better, 10x faster, and 10x cheaper. But probably not yet worth trying to use for real work vs playing with it for curiosity, learning, and understanding.
Edit: the task was writing the tests in this PR, given the code plus one test as an example: https://github.com/roboflow/inference/pull/533
This commit was the manual example: https://github.com/roboflow/inference/pull/533/commits/93165...
This commit adds the partially OpenDevin-written ones: https://github.com/roboflow/inference/pull/533/commits/65f51...
The "Browsing agent" is a bit worrisome. That can reach outside the sandboxed environment. "At each step, the agent prompts the LLM with the task description, browsing action space description, current observation of the browser using accessibility tree, previous actions, and an action prediction example with chain-of-thought reasoning. The expected response from the LLM will contain chain-of-thought reasoning plus the predicted next actions, including the option to finish the task and convey the result to the user."
How much can that do? Is it smart enough to navigate login and signup pages? Can it sign up for a social media account? Buy things on Amazon?
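The observation-action loop quoted above can be sketched roughly as follows. This is an illustrative mock, not OpenDevin's actual API: the `llm` and `browser` objects, the action vocabulary, and the assumption that the predicted action is the last line of the response are all hypothetical.

```python
# Hypothetical sketch of the browsing-agent loop described in the quote.
# `llm` and `browser` are stand-ins, not OpenDevin's real interfaces.
from dataclasses import dataclass, field

@dataclass
class BrowsingState:
    task: str
    history: list = field(default_factory=list)  # previous actions taken

def step(llm, browser, state):
    """One iteration: build the prompt, query the LLM, apply the action."""
    prompt = "\n\n".join([
        f"Task: {state.task}",
        # Action-space description (illustrative vocabulary):
        "Actions: click(id), fill(id, text), goto(url), finish(result)",
        # Current observation of the browser as an accessibility tree:
        f"Current page:\n{browser.accessibility_tree()}",
        f"Previous actions:\n{state.history}",
        "Think step by step, then output the next action on the last line.",
    ])
    response = llm(prompt)                   # chain-of-thought + action
    action = response.splitlines()[-1]       # assume action is last line
    state.history.append(action)
    if action.startswith("finish"):
        return action                        # convey result to the user
    browser.execute(action)                  # otherwise act and continue
    return None
```

The worrying part the comment points at is visible even in the sketch: `browser.execute` acts on the live web, so whatever the sandbox contains, the agent's effects are not confined to it.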
Don't build a platform for software on something inherently unreliable. If there is one lesson I have learned, it is that systems and abstractions are built on interfaces that are reliable and deterministic.
Focus on LLM use cases where accuracy is not paramount. There are tons of them: OCR, summarization, reporting, recommendations.
So much of the stuff being built on LLMs in general seems fixated on making that illusion more believable.
It was a bit inscrutable what it did, but it worked without a problem. Much like the ChatGPT code interpreter looping on Python errors until it has a working solution, including pip-installing the right libraries and reading the library docs when it hits usage errors.
An n of 1, and a small freestanding task I had already done myself, but I was impressed.
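The "loop on errors until it works" pattern the comment describes can be sketched in a few lines. This is a minimal illustration under stated assumptions: `generate_fix` stands in for an LLM call that revises code given a traceback, and is hypothetical.

```python
# Minimal sketch of an execute-and-retry loop, in the spirit of the
# interpreter behavior described above. `generate_fix` is a hypothetical
# stand-in for an LLM call that revises code given the error output.
import subprocess
import sys
import tempfile

def run_until_success(code, generate_fix, max_attempts=5):
    """Execute `code`; on failure, feed the traceback back and retry."""
    for _ in range(max_attempts):
        # Write the candidate program to a temp file and run it.
        with tempfile.NamedTemporaryFile(
                "w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        result = subprocess.run([sys.executable, path],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return code, result.stdout       # working solution found
        code = generate_fix(code, result.stderr)  # revise from the error
    raise RuntimeError("no working solution within the attempt budget")
```

A real agent would also handle missing dependencies (e.g. pip-installing on `ModuleNotFoundError`) and cap the cost per attempt; the loop structure is the same.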
To what end anyway? This is massively resource heavy, and the end goal seems to be to build a program that would end your career. Please work on something that will actually make coding easier and safer rather than building tools to run roughshod over civilization.