AI Agents That Matter
The article addresses challenges in evaluating AI agents and proposes solutions for their development. It emphasizes the importance of rigorous evaluation practices to advance AI agent research and highlights the need for reliability and improved benchmarking practices.
The article discusses the challenges in evaluating AI agents and proposes solutions to improve their development and effectiveness. AI agents are systems that use large language models (LLMs) to perform real-world tasks like booking flights or fixing software bugs. The goal is to create assistants like Siri or Alexa that can handle complex tasks accurately and reliably. Current evaluation practices, however, have pitfalls that let agents perform well on benchmarks without being useful in practice. The paper recommends implementing cost-controlled evaluations, jointly optimizing accuracy and cost, distinguishing model benchmarking from downstream benchmarking, preventing shortcuts in agent benchmarks, and improving standardization and reproducibility. Despite these challenges, the authors are cautiously optimistic about the future of AI agents, arguing that addressing reliability issues and rethinking benchmarking practices will drive progress in the field.
Related
Some Thoughts on AI Alignment: Using AI to Control AI
The GitHub content discusses AI alignment and control, proposing Helper models to regulate AI behavior. These models monitor and manage the primary AI to prevent harmful actions, emphasizing external oversight and addressing implementation challenges.
Francois Chollet – LLMs won't lead to AGI – $1M Prize to find solution [video]
The video discusses limitations of large language models in AI, emphasizing genuine understanding and problem-solving skills. A prize incentivizes AI systems showcasing these abilities. Adaptability and knowledge acquisition are highlighted as crucial for true intelligence.
AI Scaling Myths
The article challenges myths about scaling AI models, emphasizing limitations in data availability and cost. It discusses shifts towards smaller, efficient models and warns against overestimating scaling's role in advancing AGI.
Gen AI is passé. Enter the age of agentic AI
The article explores the shift from generative AI to agentic AI in enterprises, focusing on task-specific digital assistants. It discusses structured routes for enterprise agents, agentic AI in supply chain management, RPA's role, and customized systems for businesses, envisioning a goal-oriented AI future.
> [1] In traditional AI, agents are defined entities that perceive and act upon their environment, but that definition is less useful in the LLM era — even a thermostat would qualify as an agent under that definition.
I'm a huge believer in the power of agents, but this kind of complete ignorance of the history of AI gets frustrating. The statement betrays a gross misunderstanding of how simple agents have been viewed.
If you're serious about agents, then Minsky's The Society of Mind should be on your desk. From the opening chapter:
> We want to explain intelligence as a combination of simpler things. This means that we must be sure to check, at every step, that none of our agents is, itself, intelligent... Accordingly, whenever we find that an agent has to do anything complicated, we'll replace it with a subsociety of agents that do simpler things.
Instead, this write-up completely ignores the logic of one of the seminal writings on the topic (it's fine to disagree with Minsky, and I certainly do, but you need to at least acknowledge him) and jumps straight to assuming the future of agents must be immensely complex.
Automatic thermostats existed in the early days of research on agents, and the key to a thermostat being an agent is its ability to communicate with other agents automatically and collectively perform complex actions.
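The classic perceive-act definition the footnote dismisses can be sketched in a few lines. This is a hypothetical illustration (no real framework's API), showing why even a thermostat qualifies as an agent under that definition:

```python
# Minimal sketch of the classic perceive-act agent loop.
# All names here are illustrative, not from any agent framework.

class ThermostatAgent:
    """Perceives the temperature of its environment and acts on a heater."""

    def __init__(self, setpoint: float):
        self.setpoint = setpoint

    def perceive(self, environment: dict) -> float:
        # Sense the relevant part of the environment.
        return environment["temperature"]

    def act(self, temperature: float) -> str:
        # The agent's entire "policy": compare and switch.
        return "heat_on" if temperature < self.setpoint else "heat_off"

    def step(self, environment: dict) -> str:
        return self.act(self.perceive(environment))


agent = ThermostatAgent(setpoint=20.0)
print(agent.step({"temperature": 18.5}))  # heat_on
print(agent.step({"temperature": 21.0}))  # heat_off
```

The point of the sketch is that nothing in the definition requires intelligence; complexity comes from composing many such simple agents, which is exactly Minsky's argument.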
Most (if not all) agent frameworks use models like GPT-4 and Claude Opus, which are heavily RLHF'd.
[0]: https://arxiv.org/abs/2406.05587
[1]: https://news.ycombinator.com/item?id=40702617
Non-technical stakeholders also get fixated on this idea of AI agents autonomously working together. Can we save money? Perhaps even replace some people? Without a solid grounding in reality, and with a wide imagination, it's easy to see how that conclusion gets drawn.
While agents may have a place, we in the AI space will fall into a credibility trap if this is pushed as the answer. There are plenty of wins available for organizations with no "AI" in place yet. Retrieval-Augmented Generation (RAG) is hard in its own right, but there is now a reasonable path to success with it.
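The RAG pattern the comment points to is simple at its core: retrieve the documents most relevant to a query, then build a grounded prompt for the model. Below is a toy sketch under stated assumptions: the word-overlap retriever and `build_prompt` are illustrative stand-ins (real systems use embedding search and a vector store), and the LLM call itself is omitted:

```python
import re

def tokenize(text: str) -> set[str]:
    # Crude tokenizer: lowercase alphabetic words only.
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    # Rank documents by word overlap with the query (toy retriever;
    # real RAG uses embedding similarity). sorted() is stable, so ties
    # keep their original order.
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda d: len(query_words & tokenize(d)),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    # Ground the model's answer in the retrieved context.
    joined = "\n".join(f"- {doc}" for doc in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

docs = [
    "Paris is the capital of France.",
    "The Nile is the longest river in Africa.",
    "France borders Spain and Italy.",
]
top = retrieve("What is the capital of France?", docs)
print(build_prompt("What is the capital of France?", top))
```

Even this toy version shows where the difficulty lives: retrieval quality, not prompt assembly, decides whether the answer is grounded or a hallucination.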
Otherwise, expect disappointment. Then the whole space will be lumped together as a failure.
Controversial opinion there, especially given the hand-tuning that those two go through.