July 26th, 2024

Using Rust to corrode insane Python run-times

Vortexa cut a Python task that processes GPS signals from 30 hours to 6 hours (a 5x reduction overall) by developing a custom Rust library, reporting a 24x speedup over the original matplotlib approach while keeping the existing business logic.

Vortexa faced significant performance issues with a Python task that took 30 hours to process GPS signals from ships, slowing their QA feedback cycle. The task filtered these signals through multiple polygons, and profiling showed that a substantial portion of the runtime was spent in matplotlib on point-in-polygon calculations. A first attempt to optimize with geopandas made the runtime ten times longer, prompting a reevaluation of the approach.

The team decided to develop a custom library in Rust, using PyO3 to integrate it with Python. The new solution exposes a Python class, written in Rust, that accepts geojson strings and numpy arrays. This sharply reduces the number of library calls, and the implementation uses integer-based math for efficiency.

In the reported benchmark, the Rust implementation cut processing time from 29.8 seconds to 2.9 seconds, and the article cites a 24x improvement over the original matplotlib approach. The overall task duration fell from 30 hours to 6 hours, speeding up model updates and QA. Introducing Rust added complexity to the codebase, but the optimization was narrowly targeted and left the existing business logic intact. Vortexa continues to seek talent for roles in Python, data science, Java, and Rust.
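For readers unfamiliar with the operation being optimized, here is a minimal pure-Python sketch of a point-in-polygon test with a bounding-box pre-check. It is not Vortexa's code (the article published none): the function names are hypothetical, the algorithm is standard even-odd ray casting, and it uses floats rather than the integer-based math the article describes.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # vertices in order; the last vertex connects back to the first


def bounding_box(polygon: Polygon) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) for the polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Even-odd ray casting: cast a ray in +x and count edge crossings."""
    px, py = point

    # Cheap rejection: a point outside the bounding box cannot be inside.
    min_x, min_y, max_x, max_y = bounding_box(polygon)
    if not (min_x <= px <= max_x and min_y <= py <= max_y):
        return False

    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (y1 > py) != (y2 > py):
            # x coordinate where the edge meets that horizontal line.
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside
```

In a real pipeline the bounding box would be computed once per polygon rather than on every call; it is recomputed here only to keep the sketch short.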

Related

Spending too much time optimizing for loops

Researcher Octave Larose shared insights on optimizing Rust interpreters, focusing on improving performance for the SOM language. By enhancing loop handling and addressing challenges, significant speedups were achieved, balancing code elegance with efficiency.

Announcing Polars 1.0 (Blog Post)

Polars releases Python version 1.0 after 4 years, gaining popularity with 27.5K GitHub stars and 7M monthly downloads. Plans include improving performance, GPU acceleration, Polars Cloud, and new features.

Spending too much time optimizing for loops

Researcher Octave Larose discussed optimizing Rust interpreters, focusing on improving performance for the SOM language. They highlighted enhancing loop efficiency through bytecode and primitives, addressing challenges like Rust limitations and complex designs. Despite performance gains, trade-offs between efficiency and code elegance persist.

Photoroom (YC S20) Is Hiring Rust Developers in Paris (X-Platform, Wgpu, WASM)

Photoroom, a Paris-based AI photo editor, seeks a Senior Rust Engineer to enhance cross-platform libraries for Android, iOS, and Web apps. Remote work in Europe with occasional Paris visits. Role involves Rust, WebAssembly, WebGPU development impacting millions. Ideal candidate has 3+ years Rust experience, C/C++ familiarity, and Swift/Kotlin knowledge. Bonus for OpenGL, Metal, WebGPU, WebAssembly, image editing experience. Hiring process includes interviews and home assignment review. Photoroom prioritizes diversity, equity, inclusion, flexible hours, and supportive environment.

Oxidize – Notes on moving Harfbuzz and Freetype tools and libraries to Rust

The "oxidize" project on GitHub aims to migrate tasks from Python & C++ to Rust, such as shaping, rasterization, font compilation, and manipulation. It outlines objectives, priorities, and references.

AI: What people are saying
The comments on the article about Vortexa's Rust library for processing GPS signals reveal several key points of discussion.
  • Many commenters suggest that spatial indexing techniques could significantly improve performance beyond the Rust implementation.
  • There is skepticism about the claimed speedup, with some noting that comparisons should include existing C implementations and multi-threading capabilities.
  • Alternative solutions, such as using Numpy, PyPy, or even Julia, are proposed as potentially more efficient options.
  • Several users express a desire for more technical details or code examples to better understand the performance improvements.
  • There is a general consensus that the optimization landscape is broader than just switching to Rust, with various tools and methods available for performance enhancement.
16 comments
By @timhh - 6 months
Difficult to draw conclusions with no code here.

An interesting thing they didn't mention is that Matplotlib's point-in-path code is actually already in C. So this isn't really a case of Rust being X times faster than Python, it's X times faster than some other C algorithm. That's probably why X is only ~4 (they don't actually give a single-thread comparison), instead of ~50.

https://github.com/matplotlib/matplotlib/blob/cb487f3c077c93...

I expect the Rust code is faster because that code is waaaaay more complicated than what they probably need (https://stackoverflow.com/q/11716268/265521) - e.g. it handles stroke widths.

IMO this result is not very interesting.

By @wcrossbow - 6 months
Here is a much better blog post on the topic of optimizing geometrical operations with Rust. If I remember correctly some of the commenters of the HN thread even delivered better "pure" Python versions than the optimized Rust proposed by really leveraging Numpy.

- https://ohadravid.github.io/posts/2023-03-rusty-python/

- https://news.ycombinator.com/item?id=35367520

By @phkahler - 6 months
My background is very different so this post made me sad. Doing tons of point-in-polygon tests is a common problem and should use a spatial index. If we're testing N points against P polygons, the problem should scale as O(N * log(P)), not O(N * P). The author was on the right track with bounding box tests, but that's more of a micro optimization. You want something like nested bounding boxes where each level contains 2 or more smaller ones. This is how you go from P to log(P) time: by never testing most of the polygons at all. I have my own preferred spatial index, but all of them tend toward this complexity improvement.
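The nested-bounding-box idea in the comment above can be sketched as a small binary tree over polygon bounding boxes. This is an illustrative sketch, not the commenter's preferred index or anything from the article; `Node`, `build`, and `candidates` are hypothetical names. The logarithmic behaviour only holds when the boxes do not overlap heavily, since a query descends into every child whose box contains the point.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


class Node:
    """Tree node: leaves hold one polygon index, inner nodes hold two children."""

    def __init__(self, box: Box, index: Optional[int] = None,
                 left: "Optional[Node]" = None, right: "Optional[Node]" = None):
        self.box = box
        self.index = index  # polygon index for leaves, None for inner nodes
        self.left = left
        self.right = right


def union(a: Box, b: Box) -> Box:
    """Smallest box enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def contains(box: Box, x: float, y: float) -> bool:
    return box[0] <= x <= box[2] and box[1] <= y <= box[3]


def build(items: List[Tuple[int, Box]], depth: int = 0) -> Optional[Node]:
    """Recursively split (index, box) pairs in half, alternating x and y axes."""
    if not items:
        return None
    if len(items) == 1:
        index, box = items[0]
        return Node(box, index=index)
    axis = depth % 2
    # Sort by twice the box centre along the chosen axis (min + max).
    ordered = sorted(items, key=lambda it: it[1][axis] + it[1][axis + 2])
    mid = len(ordered) // 2
    left = build(ordered[:mid], depth + 1)
    right = build(ordered[mid:], depth + 1)
    return Node(union(left.box, right.box), left=left, right=right)


def candidates(node: Optional[Node], x: float, y: float) -> List[int]:
    """Indices of polygons whose bounding box contains (x, y)."""
    if node is None or not contains(node.box, x, y):
        return []  # prune this whole subtree without visiting its polygons
    if node.index is not None:
        return [node.index]
    return candidates(node.left, x, y) + candidates(node.right, x, y)
```

Only the polygons returned by `candidates` would then need the exact point-in-polygon test, which is where the saving over testing all P polygons comes from.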
By @mulmboy - 6 months
Rust approach may be great, strikes me as potentially a natural fit for a spatial index though - curious if spatial index would be practical and how it compares. Geopandas from memory has at least some support for spatial indexes. I believe geopandas if used naively will also do spatial operations quite naively (slowly)
By @analog31 - 6 months
Not an expert here, but something I didn't grasp from the article was, how much of the improvement came from running multiple cores, and how much from Rust being faster than Python? Granted, those two things may be intertwined.
By @wcrossbow - 6 months
I was taken by surprise too by the false dichotomy presented:

> 1. Try breaking the data into chunks, and then using multi-processing (ugly in Python) to leverage a more powerful cloud virtual machine, sticking with matplotlib

> 2. Write a very small native custom library to do the math we want, using threads

There are many more ways you could try to go about this, off the top of my head: numba, pypy (or another alternative Python), jax/torch/tensorflow, multiprocessing, joblib.

By @tc4v - 6 months
so a 5x speedup from using 8x threads? I am not convinced this couldn't be achieved with numpy+multiprocessing (or maybe even threading, since numpy releases the GIL I think)
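The arithmetic behind this comment can be made explicit with a short sketch. The 30-hour, 6-hour, 29.8-second, and 2.9-second figures come from the summary above; the thread count of 8 is the commenter's figure and is treated here as an assumption. The helper names are hypothetical.

```python
def speedup(before: float, after: float) -> float:
    """How many times faster the new run is than the old one."""
    return before / after


def parallel_efficiency(observed_speedup: float, workers: int) -> float:
    """Fraction of ideal linear scaling achieved with the given worker count."""
    return observed_speedup / workers


# Overall task: 30 hours down to 6 hours.
overall = speedup(30.0, 6.0)

# Benchmark quoted in the summary: 29.8 seconds down to 2.9 seconds.
benchmark = speedup(29.8, 2.9)

# Assumption: 8 threads, as stated by the commenter.
efficiency = parallel_efficiency(overall, 8)

print(f"overall speedup:   {overall:.1f}x")
print(f"benchmark speedup: {benchmark:.1f}x")
print(f"efficiency on 8 threads: {efficiency:.1%}")
```

Under that assumption, the overall gain is below what perfectly linear scaling across 8 threads would give, which is the basis for the commenter's doubt.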
By @nisa - 6 months
Wouldn't it be easier to port the code to Julia? Had the chance to use it for some control theory problems and it feels much more modern and sane than Python - especially under the hood.

On the other hand multiple dispatch and the way imports are handled left me quite confused as a beginner.

But it's powerful and also very fast, especially for plots.

By @sn9 - 6 months
I really want to know what the optimizations might have looked like had they used a profiler like scalene [0] to find where the unnecessary copying was happening.

[0] https://github.com/plasma-umass/scalene

By @selimnairb - 6 months
I would have tried Numpy and GEOS before jumping to Rust.
By @CraigJPerry - 6 months
Is there a code link I’m missing?

My immediate reaction was to try just annotating ctypes and see what performance delta exists

By @rini17 - 6 months
Have you tried pypy? Might be interesting comparison.
By @bdjsiqoocwk - 6 months
OT: I heard recently that Rust isn't actually used in Firefox, which was the original use case? What's the story there?
By @pjmlp - 6 months
The amount of compute cycles and development time that could have been saved, for every single time teams have to deal with scrapping scripting code for something that actually performs.

Several options are available with JIT and AOT toolchains, alongside REPL programming environments.