October 24th, 2024

Throw more AI at your problems

The article advocates for using multiple LLM calls in AI development, emphasizing task breakdown, cost management, and improved performance through techniques like RAG, fine-tuning, and asynchronous workflows.

The article discusses the evolving landscape of AI application development, emphasizing a strategy of utilizing multiple LLM (Large Language Model) calls to address problems effectively. The authors, Vikram Sreekanti and Joseph E. Gonzalez, argue that rather than relying on a single powerful model, breaking down tasks into smaller components and employing a combination of techniques—such as retrieval-augmented generation (RAG) and fine-tuning—can lead to better performance, lower costs, and improved reliability. They highlight the importance of managing costs and latency by using smaller models for simpler tasks and suggest that parallelization and asynchronous workflows can enhance user experience. The authors also note that this approach can increase resilience against prompt hacking, as stricter output limits can be enforced at each stage of the pipeline. They advocate for a gradual improvement of AI components over time, allowing for the replacement of larger models with more efficient, task-specific ones. Ultimately, they encourage developers to embrace the use of multiple LLM calls as a means to create smarter, more effective AI applications.

- Utilizing multiple LLM calls can enhance AI application performance.

- Combining techniques like RAG and fine-tuning is more effective than relying on a single approach.

- Smaller models can be used for simpler tasks to manage costs and latency.

- Parallelization and asynchronous workflows improve user experience.

- This approach increases resilience against prompt hacking and allows for incremental improvements.
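Those points describe a pattern that is easy to show in outline. The sketch below is illustrative only and not from the article: `call_llm` is a stand-in for whatever client you use, and the model names are made up. A small model handles a cheap, tightly constrained classification step, a larger model is called only for the heavier drafting step, and both stages are fanned out concurrently.

```python
import asyncio

async def call_llm(model: str, prompt: str) -> str:
    """Placeholder for a real async LLM client; swap in OpenAI, vLLM, Ollama, etc."""
    raise NotImplementedError

async def classify(ticket: str) -> str:
    # Cheap step: a small model constrained to a one-word label.
    out = await call_llm("small-8b", f"Label this ticket as billing, bug or other:\n{ticket}")
    label = out.strip().lower()
    return label if label in {"billing", "bug", "other"} else "other"  # strict output limit

async def draft_reply(ticket: str, label: str) -> str:
    # Heavier step: the larger model is only used where the task needs it.
    return await call_llm("large-70b", f"[{label}] Draft a short reply to:\n{ticket}")

async def handle(tickets: list[str]) -> list[str]:
    # Parallel fan-out keeps latency close to one call per stage, not one per ticket.
    labels = await asyncio.gather(*(classify(t) for t in tickets))
    return await asyncio.gather(*(draft_reply(t, lb) for t, lb in zip(tickets, labels)))
```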

14 comments
By @crooked-v - 4 months
With the current state of "AI", this strikes me as an "I had a problem and used AI, now I have two problems" kind of situation in most cases.
By @Stoids - 4 months
We aren’t good at creating software systems from reliable and knowable components. A bit skeptical that the future of software is making a Rube Goldberg machine of black box inter-LLM communication.
By @kylehotchkiss - 4 months
Use Moore's law to achieve unreal battery life and better experiences for users... or use Moore's law to throw more piles of abstractions on abstractions, where we end up with solutions like Electron or "I Duct Taped An AI on it".

Reading through this, I could not tell if this was a parody or real. That robot image slopped in the middle certainly didn't help.

By @headcanon - 4 months
I'll stay out of the inevitable "You're just adding a band-aid! What are you really trying to do?" discussion since I kind of see the author's point and I'm generally excited about applying LLMs and ML to more tasks. One thing I've been thinking about is whether an agent (or a collection of agents) can solve a problem initially in a non-scalable way through raw inference, but then develop code to make parts of the solution cheaper to run.

For example, I want to scrape a collection of sites. The agent would at first put the whole HTML into the context to extract the data (expensive, but it works), but then another agent sees this pipeline and says "hey, we can write a parser for this site so each scrape is cheaper", and iteratively replaces that segment in a way that does not disrupt the overall task.
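A rough sketch of what that hand-off could look like; none of this is from the comment, and `llm_extract`, the parser registry, and keying by domain are all assumptions. The expensive whole-page extraction is the default path, and a cheap site-specific parser replaces it once one has been written.

```python
from typing import Callable
from urllib.parse import urlparse

# Registry of cheap, site-specific parsers that agents (or humans) add over time.
SITE_PARSERS: dict[str, Callable[[str], dict]] = {}

def llm_extract(html: str) -> dict:
    """Placeholder: put the whole HTML into an LLM's context and parse its structured reply."""
    raise NotImplementedError

def scrape(url: str, html: str) -> dict:
    domain = urlparse(url).netloc
    parser = SITE_PARSERS.get(domain)
    if parser is not None:
        return parser(html)       # cheap path, once a parser exists for this site
    return llm_extract(html)      # expensive path that works on any site
```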

By @l5870uoo9y - 4 months
RAG doesn’t necessarily give the best results. Essentially it is a technically elegant way to add semantic context to the prompt (for many use cases it is over-engineered). I used to offer RAG-based SQL query generation on SQLAI.ai and, while I might introduce it again, for most use cases it was overkill and even made working with the SQL generator unpredictable.

Instead I implemented low-tech “RAG”, or “data source rules”. It’s a list of general rules you can attach to a particular data source (i.e. a database). The rules are included in the generations and work great. Examples are “Wrap tables and columns in quotes” or “Limit results to 100”. It’s simple and effective - I can execute the generated SQL against my DB for insights.
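A minimal sketch of that idea, under the assumption that a "rule" is just a line of text stored per data source and prepended to the prompt; the rule wording and prompt layout below are illustrative, not the SQLAI.ai implementation.

```python
# Per-data-source rules that get injected into every generation prompt.
DATA_SOURCE_RULES = {
    "analytics_db": [
        "Wrap tables and columns in double quotes.",
        "Limit results to 100 rows unless the user asks otherwise.",
    ],
}

def build_prompt(data_source: str, question: str) -> str:
    rules = "\n".join(f"- {r}" for r in DATA_SOURCE_RULES.get(data_source, []))
    return f"Rules for {data_source}:\n{rules}\n\nWrite a SQL query for: {question}"
```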

By @hggigg - 4 months
I wish this was funny but it’s not. We are doing this now. It has become like “because it’s got electrolytes” in our org.
By @com2kid - 4 months
> This is where compound systems are a valuable framework because you can break down the problem into bite-sized chunks that smaller LLMs can solve.

Just a reminder that smaller fine-tuned models are just as good as large models at solving the problems they are trained to solve.

> Oftentimes, a call to Llama-3 8B might be enough if you need to do a simple classification step or to analyze a small piece of text.

Even 3B param models are powerful nowadays, especially if you are willing to put the time into prompt engineering. My current side project is simulating a small fantasy town using a tiny locally hosted model.

> When you have a pipeline of LLM calls, you can enforce much stricter limits on the outputs of each stage

Having an LLM output a number from 1 to 10, or "error", makes your schema really hard to break.

All you need to do is parse the output, and if it isn't a number from 1 to 10... just assume it is garbage.

A system built up like this is much more resilient, and also honestly more pleasant to deal with.
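As a concrete (and purely illustrative) version of that contract, a validator like the one below accepts only an integer from 1 to 10 and treats everything else, including "error", rambling, or a prompt-injection attempt, as garbage.

```python
def parse_score(raw: str) -> int | None:
    """Return the score if the model's reply is a valid 1-10 integer, else None."""
    text = raw.strip()
    if text.isdigit():
        value = int(text)
        if 1 <= value <= 10:
            return value
    return None  # anything else is discarded as garbage
```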

By @swores - 4 months
> "The most common debate was whether RAG or fine-tuning was a better approach, and long-context worked its way into the conversation when models like Gemini were released. That whole conversation has more or less evaporated in the last year, and we think that’s a good thing. (Interestingly, long context windows specifically have almost completely disappeared from the conversation though we’re not completely sure why.)"

I'm a bit confused by their thinking it's a good thing while being confused about why the subject has "disappeared from the conversation".

Could anyone here shed some light / share an opinion on it/why "long context windows" aren't discussed any more? Did everyone decide they're not useful? Or they're so obviously useful that nobody wastes time discussing them? Or...

By @fsndz - 4 months
More AI sauce won't hurt, right? Meanwhile, we still have to solve the ragallucination problem. https://www.lycee.ai/blog/rag-ragallucinations-and-how-to-fi...
By @keeganpoppen - 4 months
YES (although I'm hesitant to even say anything, because on some level this is tightly-guarded personal proprietary knowledge from the trenches that I hold quite dear). Why aren't you spinning off like 100 prompts from one input? It works great in a LOT of situations. Better than you think it does/would, no matter your estimation of its efficacy.
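A bare-bones sketch of that kind of fan-out; `call_llm` and the prompt templates are placeholders rather than anything from the comment. Each template looks at the same input from a different angle, and all the calls run concurrently.

```python
import asyncio

async def call_llm(prompt: str) -> str:
    """Placeholder for a real async LLM client call."""
    raise NotImplementedError

async def fan_out(document: str, templates: list[str]) -> list[str]:
    # One prompt per template/aspect, all in flight at once.
    return await asyncio.gather(*(call_llm(t.format(doc=document)) for t in templates))
```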
By @alex-moon - 4 months
The title is deliberately provocative, but if I'm reading this article right it pushes an argument I've made for ages in the context of my own business, and which I think - as the article itself suggests - actually represents best practice from before the age of ChatGPT. To wit: have lots of ML models, each of which does some very specific thing well, and wire them up wherever it makes sense to do so, piping the results of one into the input of another. The article is a fan of doing this with language models specifically - and, of course, in natural-language-heavy contexts these ML models will mostly or all be language models - but the same basic premise applies to ML more generally. As far as I am aware, this is how it used to be done in commercial applications before everyone started blindly trying to make ChatGPT do everything.

I recently discovered BERTopic, a Python library that bundles a five-step pipeline of now pretty old (relatively speaking) NLP approaches in a way that is very similar to how we were already doing it, wrapped in a nice handy one-liner. I think it's a great exemplar of the approach that will probably come out on top once the hype storm passes.
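For reference, the BERTopic usage really is close to a one-liner; this roughly follows the library's own quickstart and assumes `pip install bertopic`, with scikit-learn providing a sample corpus.

```python
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

# A sample corpus; any reasonably sized list of strings works.
docs = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))["data"]

topic_model = BERTopic()                        # embed -> reduce -> cluster -> label, one pipeline
topics, probs = topic_model.fit_transform(docs)
print(topic_model.get_topic_info().head())
```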

(Disclaimer: I am not an AI expert and will defer to real data/stats nerds on this.)

By @mvdtnz - 4 months
We are truly in the stupidest phase of software engineering yet.