July 10th, 2024

RouteLLM: A framework for serving and evaluating LLM routers

RouteLLM is a cost-effective framework for serving and evaluating LLM routers, reducing costs by up to 85% while preserving 95% of GPT-4 performance. It routes queries between stronger and weaker models to cut costs without sacrificing response quality. The framework and its resources are available on GitHub.

The GitHub repository describes RouteLLM, a framework for serving and evaluating LLM routers. By routing each query to either a strong or a weak model based on predicted difficulty, RouteLLM reduces costs by up to 85% on benchmarks while maintaining 95% of GPT-4 performance. The repository includes installation instructions, a quick start guide, server setup details, model support information, the rationale behind LLM routing, an evaluation framework, the available routers, configuration specifics, contribution guidelines, and a citation for the associated research paper.
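For illustration, here is a minimal usage sketch based on the repository's quick start; the router name, model identifiers, and the cost threshold encoded in the model string are examples and may differ across versions:

```python
import os
from routellm.controller import Controller

os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder key

# The Controller is a drop-in replacement for the OpenAI client that routes
# each query to the strong or weak model based on predicted difficulty.
client = Controller(
    routers=["mf"],  # "mf" = the matrix factorization router
    strong_model="gpt-4-1106-preview",
    weak_model="mistralai/Mixtral-8x7B-Instruct-v0.1",
)

response = client.chat.completions.create(
    # The suffix encodes the cost threshold trading off quality against cost
    model="router-mf-0.11593",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```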

Related

How to run an LLM on your PC, not in the cloud, in less than 10 minutes

You can easily set up and run large language models (LLMs) on your PC using tools like Ollama, LM Studio, and Llama.cpp. Ollama installs straightforwardly across different systems, offers simple commands for managing models, and supports AVX2-compatible CPUs as well as select AMD Radeon GPUs.
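As a sketch of what local serving looks like, Ollama exposes an HTTP API on localhost once it is running; assuming a model has already been pulled (e.g. with `ollama pull llama3`), a Python call might look like this:

```python
# Query a locally running Ollama server over its default HTTP endpoint.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",  # Ollama's default local port
    json={
        "model": "llama3",                  # any model pulled locally (assumed here)
        "prompt": "Why is the sky blue?",
        "stream": False,                    # return one JSON object, not a stream
    },
)
print(resp.json()["response"])
```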

Show HN: Python lib to run evals across providers: OpenAI, Anthropic, etc.

The GitHub repository documents LLM Safety Evals, with results published at evals.gg. It includes a results bar chart, a linked Twitter post, setup guidelines, and commands for running the code; the maintainers can be contacted for further support.

LLMs on the Command Line

Simon Willison presented LLM, a Python command-line utility for accessing large language models efficiently, supporting OpenAI models and plugins for various providers. The tool enables running prompts, managing conversations, accessing specific models like Claude 3, and logging interactions to a SQLite database. Willison highlighted using LLM for tasks like summarizing discussions, and emphasized the importance of embeddings for semantic search, showcasing LLM's support for content-similarity queries and its extensibility through plugins and OpenAI API compatibility.
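The tool also exposes a Python API that mirrors the CLI. A minimal sketch, assuming the package is installed and an OpenAI key is set (model availability depends on installed plugins):

```python
# Minimal sketch of the llm library's Python API (mirrors the CLI).
# Assumes `pip install llm` and an OPENAI_API_KEY in the environment.
import llm

model = llm.get_model("gpt-4o-mini")  # any model exposed by installed plugins
response = model.prompt("Summarize the key points of this discussion: ...")
print(response.text())  # the CLI additionally logs interactions to SQLite
```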

Meta Large Language Model Compiler

Large language models (LLMs) are widely used in software engineering but remain underused for code optimization. Meta introduces the Meta Large Language Model Compiler (LLM Compiler) for code optimization tasks. Trained on LLVM-IR and assembly code tokens, it aims to improve compiler understanding and optimize code effectively.
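For context, the released checkpoints load like any causal language model in Hugging Face transformers. A hedged sketch, where the model id is assumed to be the published 7B checkpoint (access may require accepting Meta's license) and the prompt format is purely illustrative:

```python
# Hedged sketch: loading LLM Compiler as a standard causal LM.
# The prompt below is illustrative, not the model's documented prompt format.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/llm-compiler-7b"  # published checkpoint (assumed id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Optimize the following LLVM-IR for size:\n; <llvm-ir here>"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```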

MobileLLM: Optimizing Sub-Billion Parameter Language Models for On-Device Use

The GitHub repository contains the MobileLLM code for optimizing sub-billion-parameter language models for on-device applications. It covers design considerations, code guidelines, results on common-sense reasoning tasks, acknowledgements, and licensing details; the maintainers can be contacted for support.

18 comments
By @fbnbr - 6 months
This RouteLLM framework sounds really promising, especially for cost optimization. It reminds me of the KNN-router project (https://github.com/pulzeai-oss/knn-router), which uses a k-nearest neighbors approach to route queries to the most appropriate models.

What I like about these kinds of solutions is that they address the practical challenges of using multiple LLMs. Rate limits, cost per token, and even just choosing the right model for the job can be a real headache.

KNN-router, for example, lets you define your own logic for routing queries, so you can factor in things like model accuracy, response time, and cost. You can even set up fallback models for when your primary model is unavailable.

It's cool to see these kinds of tools emerging because it shows that people are starting to think seriously about how to build robust, cost-effective LLM pipelines. This is going to be crucial as more and more companies start incorporating LLMs into their products and services.
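To make the k-nearest-neighbors idea concrete, here is a minimal illustrative sketch of the general approach (not knn-router's actual code): embed the incoming query, find the k most similar past queries, and route to whichever model served the majority of them well.

```python
# Illustrative KNN routing sketch (not the knn-router project's API).
import numpy as np

def knn_route(query_emb, example_embs, example_labels, k=5):
    """example_embs: (n, d) embeddings of past queries;
    example_labels: which model ("strong"/"weak") best served each one."""
    # Cosine similarity between the query and every stored example
    sims = example_embs @ query_emb / (
        np.linalg.norm(example_embs, axis=1) * np.linalg.norm(query_emb)
    )
    top_k = np.argsort(sims)[-k:]            # indices of the k nearest neighbors
    votes = [example_labels[i] for i in top_k]
    return max(set(votes), key=votes.count)  # majority vote picks the model
```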

By @furyofantares - 6 months
I don't really get who these are for - do people use them in their projects?

I don't find success just using a prompt against some other model without having some way to evaluate it and usually updating it for that model.

By @Havoc - 6 months
Interesting that it is generalizable to other pairs. That implies some sort of prompt property or characteristic that could be widely used.

I don’t think using different models is the right approach though. They behave differently. Better to use a big and a small one from the same family. Or alternatively use this to drive whether to give the AI more “thinking time” via chain of thought or agents.

By @worstspotgain - 6 months
I like their "LLM isovalue" graph, and the idea that different vendors can be forced to partake in the same synergy/distillation scheme. Vendors dislike these schemes, but they're probably OK with them as long as they're niche.

By @tananaev - 6 months
The problem is that to understand how complex the request is, you have to use a smart enough model.

By @bangaladore - 6 months
I've been using OpenRouter only for personal use, not for its router functionality, so I can use the API of various models (or open-source models) without signing up and prepaying/paying a subscription on all their websites.

I believe OpenRouter also provides an API that does the same thing as RouteLLM. Again, you only have to pay OpenRouter, not every model's service you use.
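For reference, OpenRouter exposes an OpenAI-compatible API, so the standard openai client works against it with a different base URL; the key and model id below are placeholders:

```python
# Sketch: calling OpenRouter through the OpenAI-compatible client.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # OpenRouter's API endpoint
    api_key="sk-or-...",                      # placeholder OpenRouter key
)
completion = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",      # any model id OpenRouter serves
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```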

By @vatican_banker - 6 months
The tool currently allows only one set of strong and weak models.

It’d be really good to allow more than two models and to switch dynamically based on multiple constraints like latency, reasoning complexity, costs, etc.

By @PetrBrzyBrzek - 6 months
There is a similar project called NotDiamond, which is available on Hugging Face: https://huggingface.co/notdiamond/notdiamond-0001.

By @daghamm - 6 months
My take from this is that 85% of the time we don't need a powerful LLM like 4o.

Or am I reading this wrong? :)

By @TZubiri - 6 months
Or just use a single LLM provider.

Problem solved, next.

By @localfirst - 6 months
solution for a non-critical problem imho

I'm open to differing opinions, but after dealing with langchain, premature optimization for non-critical problems is rampant in this space rn