October 1st, 2024

Should you use Rust in LLM based tools for performance?

The article highlights the advantages of using Rust for text processing software, showing that its indexing and querying pipeline outperforms Python's Langchain in the context of LLM tools.


The article discusses the rationale behind using Rust to build text processing software, specifically in the context of LLM (Large Language Model) tools. The author, Tinco Andringa, compares the performance of their Rust-based indexing and querying pipeline against Python's Langchain. Despite the expectation that LLM inference would be the primary bottleneck, the Rust implementation proved significantly faster. The motivation for choosing Rust includes its performance, reliability, and strong ecosystem.

The benchmarks revealed that while both implementations spent considerable time in the ONNX runtime generating embeddings, Langchain's performance was hindered by inefficient preprocessing steps. After the benchmark setup was corrected, the gap narrowed, but Rust still maintained an edge. The article emphasizes that while GPU processing is crucial, other stages of the pipeline can also affect overall performance. Ultimately, Rust's parallelism and efficient string processing can improve performance, depending on specific project needs.

- Rust-based tools can outperform Python implementations in certain scenarios.

- Efficient preprocessing steps are critical for accurate performance benchmarking.

- GPU processing is not the only factor affecting performance; CPU-bound tasks can also be significant.

- Rust offers advantages in parallelism and string processing that can optimize performance.

- The article encourages exploring Rust for projects requiring high efficiency.
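The parallelism and string-processing advantages summarized above are easy to sketch. Below is a minimal, hypothetical Rust example (std-only; the `chunk_words` helper and word-based chunk size are illustrative assumptions, since real pipelines chunk by tokens with overlap) that splits documents into chunks across concurrent scoped threads:

```rust
use std::thread;

/// Split `text` into chunks of `size` words (hypothetical helper;
/// real pipelines chunk by token count and add overlap).
fn chunk_words(text: &str, size: usize) -> Vec<String> {
    let words: Vec<&str> = text.split_whitespace().collect();
    words.chunks(size).map(|c| c.join(" ")).collect()
}

fn main() {
    let docs = vec![
        "the quick brown fox jumps over the lazy dog".to_string(),
        "rust makes parallel string processing straightforward".to_string(),
    ];

    // Fan each document out to its own thread; scoped threads let the
    // workers borrow `docs` without cloning the underlying strings.
    let chunked: Vec<Vec<String>> = thread::scope(|s| {
        let handles: Vec<_> = docs
            .iter()
            .map(|d| s.spawn(move || chunk_words(d, 4)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    for chunks in &chunked {
        println!("{:?}", chunks);
    }
}
```

Because the chunking work is pure CPU-bound string processing, threads like these run truly in parallel, with no interpreter lock to contend with.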

AI: What people are saying
The comments reflect a range of opinions on the use of Rust for text processing software compared to Python's Langchain.
  • Some commenters highlight the performance benefits of Rust over Python, emphasizing the need for compiled languages in certain applications.
  • Concerns are raised about the Global Interpreter Lock (GIL) in Python, which affects performance in multi-user scenarios.
  • There is a discussion about the trade-offs between performance and development complexity when using Rust.
  • Several users question the necessity of optimizing performance at the startup stage, suggesting that it may not be the best use of resources.
  • Some commenters propose alternatives like C++ for similar performance benefits with easier deployment.
14 comments
By @pjmlp - 7 months
Most Python libraries are, in any case, bindings to native libraries.

Any other ecosystem is able to plug into the same underlying native libraries, or even call them directly when written in the same language.

In a way, the performance pressure on the Python world is interesting; without it, the CPython folks would never have reconsidered their stance on performance.

By @lmeyerov - 7 months
At least for Louie.ai (basically genAI-native computational notebooks, where operational analysts ask for intensive analytics tasks, like pulling Splunk/Databricks/neo4j data, getting it wrangled in some runtime, clustering/graphing it, and generating interactive viz), Python has ups and downs:

On the plus side, it means our backend handles small/mid datasets well. Apache Arrow adoption in analytics packages is strong, so zero-copy and columnar flows over many rows are normal. Pushing that to the GPU or another process is also great.

OTOH, one of our greatest issues is the GIL. Yes, it shows up a bit in single-user code (not discussed in the post), especially when doing divide-and-conquer flows for a user. The bigger issue, however, is stuffing many concurrent users into the same box to avoid blowing your budget. We would like the memory-sharing benefits of threading but, because of the GIL, need the isolation benefits of multiprocessing. A bit same-but-different: we stream results to the browser as agents progress in your investigation, and that has not been as smooth as it has been in other languages.

And moving to multiprocess is no panacea. E.g., a local embedding engine is expensive to run in-process per worker because modern models have high RAM needs. That biases toward using a local inference server for what is meant to be an otherwise local call, which is doable, but representative of the extra work needed for production-grade software.
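For contrast, a threaded runtime without a GIL gets the memory sharing this comment asks for without per-worker copies. A minimal Rust sketch (hypothetical names; the `Model` struct merely stands in for large read-only embedding weights) sharing one model across concurrent user threads via `Arc`:

```rust
use std::sync::Arc;
use std::thread;

// Stand-in for a large read-only model (e.g. embedding weights).
struct Model {
    weights: Vec<f32>,
}

impl Model {
    // Hypothetical scoring: dot product of the input with the weights.
    fn score(&self, input: &[f32]) -> f32 {
        self.weights.iter().zip(input).map(|(w, x)| w * x).sum()
    }
}

fn main() {
    // One copy in memory, shared read-only across all user threads;
    // Arc adds reference counting without duplicating the weights.
    let model = Arc::new(Model { weights: vec![0.5; 4] });

    let handles: Vec<_> = (0..4)
        .map(|i| {
            let m = Arc::clone(&model);
            thread::spawn(move || m.score(&vec![i as f32; 4]))
        })
        .collect();

    for h in handles {
        println!("score = {}", h.join().unwrap());
    }
}
```

Each thread reads the same weights concurrently and in parallel, which is exactly the threaded-with-shared-memory deployment shape the GIL makes awkward in CPython.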

Interesting times!

By @tpoacher - 7 months
Even LLMs can see through your silly tricks:

> Q: The question "Should you use Rust in LLM based tools for performance?" contains a fallacy. Could you tell me the name of that fallacy?

> A: A clever question!

> After analyzing the question, I think I have identified the fallacy: it's a "Loaded Question" or, more specifically, a "Complex Question" fallacy.

> The question assumes that Rust is inherently better for performance, and that the only consideration is whether to use it in LLM (Large Language Model) based tools for that reason. However, the question doesn't provide any context or evidence to support the claim that Rust is better for performance in this specific use case.

> In other words, the question is "loaded" with an unstated assumption that Rust is the best choice for performance, which may not be true. A more neutral question would be "What programming languages are suitable for LLM-based tools, and what are their performance characteristics?"

> Am I correct?

Yes you are, Mr Mixtral. Yes you are. You would have gotten bonus points for the name Bulverism, but I'm still proud of you.

By @zozbot234 - 7 months
Am I the only one who thinks a Swift IDE project should be called Taylor?
By @elpalek - 7 months
Langchain and other frameworks are too bloated. They're good for demos, but I highly recommend building your own pipeline in production; it's not really that complicated, and you get much better control over the implementation. Plus, you don't need the 99% of packages that come with Langchain, which reduces security vulnerabilities.

I've written a series of notebooks on how to implement RAG in Python directly, with minimal packages. I know it's not in Rust or C++, but it can give you some ideas on how to do things directly.

https://github.com/yudataguy/RawRAG
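The retrieval core of such a hand-rolled pipeline really is small. A hypothetical Rust sketch of just the similarity-search step over precomputed embeddings (embedding generation itself, e.g. via an ONNX runtime, is omitted, and the toy 3-dimensional vectors are illustrative):

```rust
/// Cosine similarity between two embedding vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Index of the stored embedding closest to `query`.
fn top_match(query: &[f32], corpus: &[Vec<f32>]) -> usize {
    corpus
        .iter()
        .enumerate()
        .max_by(|(_, a), (_, b)| {
            cosine(query, a).partial_cmp(&cosine(query, b)).unwrap()
        })
        .map(|(i, _)| i)
        .unwrap()
}

fn main() {
    // Toy 3-dimensional embeddings; a real pipeline would obtain
    // these from an embedding model.
    let corpus = vec![
        vec![1.0, 0.0, 0.0],
        vec![0.0, 1.0, 0.0],
        vec![0.7, 0.7, 0.0],
    ];
    let query = vec![0.9, 0.1, 0.0];
    println!("best match: {}", top_match(&query, &corpus));
}
```

A linear scan like this is fine for small corpora; larger ones would swap in an approximate nearest-neighbor index, but the point stands that the pipeline itself carries no framework overhead.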

By @dmezzetti - 7 months
I've covered this before in articles such as this: https://neuml.hashnode.dev/building-an-efficient-sparse-keyw...

You can make anything performant if you know the right buttons to push. While Rust makes it easy in some ways, Rust is also a difficult language to develop with for many developers. There is a tradeoff.

I'd also say LangChain's primary goal isn't performance; it's convenience and functionality coverage.

By @bborud - 7 months
It would be helpful to move to a compiled language with a decent toolchain. Rust and Go are good candidates.
By @satvikpendem - 7 months
I was asking the same question; it turns out mistral.rs [0] has pretty good abstractions, so you don't have to depend on and package llama.cpp for every platform.

[0] https://github.com/EricLBuehler/mistral.rs

By @RcouF1uZ4gsC - 7 months
Why not use C++?

For the most part, these aren't security critical components.

You already have a massive amount of code you can use like say llama.cpp

You get the same performance that you do with Rust.

Compared to Python, in addition to performance, you also get a much easier deployment story.

By @sandGorgon - 7 months
this is very cool!

we built something for our internal consumption (and now used in quite a few places in India).

Edgechains is declarative (jsonnet) based, so chains + prompts are declarative. And we built a WASM compiler (in Rust, based on WasmEdge).

https://github.com/arakoodev/EdgeChains/actions/runs/1039197...

By @zitterbewegung - 7 months
This is a comparison of apples to oranges. Langchain has an order of magnitude more examples, integrations, and features, and it also rewrote its whole architecture to try to make the chaining more understandable. I don't see enough documentation in this pipeline to understand how to migrate my app to it. I also realize it would take me at least a week to migrate my own app to Langchain's rewrite.

Langchain is used because it was a first mover, and that's also its Achilles' heel; it's not used for speed at all.

By @zie1ony - 7 months
DSPy is in Python, so it must be Python. Sorry bro :P
By @swyx - 7 months
I mean, LLM-based or not has nothing to do with it; this is a standard optimization story, scripting language vs. systems language.
By @serjester - 7 months
I'm surprised they don't talk about the business side of this: did they have users complaining about the speed? At the end of the day, they only increased performance by 50%.

These kinds of optimizations seem awesome once you have a somewhat mature product, but you really have to wonder whether this is the best use of a startup's very limited bandwidth.