September 9th, 2024

Rust for the small things? but what about Python?

The article compares Rust and Python for data engineering, highlighting Python's integration with LLMs and tools like Polars, while noting Rust's speed and safety but greater complexity.

The article discusses the potential of using Rust for small-scale data engineering tasks, contrasting it with Python. The author argues that despite the perceived decline of Python, its integration with large language models (LLMs) and tools like Polars suggests a strong future for Python in data engineering. The author expresses a desire to explore Rust for everyday tasks, highlighting its advantages in speed and safety due to strict compiler rules. However, the trade-offs include Rust's verbosity and complexity compared to Python, which often requires less code for similar tasks. The author provides examples of using Rust for operations like counting files in an S3 bucket and processing flat files, noting that while Rust can be efficient, it may not be practical for all data engineering tasks. Ultimately, the author concludes that while Rust offers benefits for reliability and performance, Python remains the more accessible choice for many engineers, especially for simpler tasks.

- The rise of LLMs and tools like Polars supports Python's continued relevance in data engineering.

- Rust offers advantages in speed and safety but is often more verbose than Python.

- The author tested Rust for small data engineering tasks, noting both its efficiency and complexity.

- Despite Rust's benefits, Python is likely to remain the preferred choice for simpler tasks due to its ease of use.

- The article emphasizes that both languages have their place, depending on the specific requirements of the task.

17 comments
By @hnthrowaway6543 - 2 months
> I’ve come to realize now that the demise of Python has been greatly exaggerated

Has it? Because this is literally the first I've heard anyone claim (or claim that others have claimed) Python is on a downward trajectory. If anything it's become the de facto standard language for anyone doing anything, other than low-level hardware programming; from data science glue code to web applications to one-off scripts to backend pipelines to command line tools, it seems like "Python" is the default answer these days.

By @daft_pink - 2 months
To me, a clear benefit of rust over python (and I love python) is that once you write something in rust, you have an executable that you can give to someone who doesn’t know how to program.

Python is great for something I will use myself, but not so great for when I want someone else to use my code.

By @alkh - 2 months
The main criterion for me would be the frequency of execution. Say I have a model that will be retrained every quarter via CI/CD or an Airflow DAG. Does it make a huge difference for me to have parsing done in 410.71 ms in Rust vs 740 ms in Python (a convoluted example, of course)? Probably not. Would 400.71 ms vs 5 min, for example, still make a difference? I don't think so either.

It would be a different matter entirely if that piece of code is executed more frequently and is also taking a lot of time as I could save both computing resources and money
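
To put rough numbers on that trade-off, here is a minimal sketch of the arithmetic. The 740 ms and 410.71 ms figures come from the comment above; the function name and the "once a minute" schedule are hypothetical, chosen only for illustration.

```python
def yearly_seconds_saved(slow_ms: float, fast_ms: float, runs_per_year: int) -> float:
    """Total wall-clock seconds saved per year by the faster implementation."""
    return (slow_ms - fast_ms) * runs_per_year / 1000.0


# Figures from the comment: 740 ms (Python) vs 410.71 ms (Rust).
quarterly = yearly_seconds_saved(740, 410.71, runs_per_year=4)

# Hypothetical contrast: the same job triggered once a minute, all year.
per_minute = yearly_seconds_saved(740, 410.71, runs_per_year=60 * 24 * 365)

print(f"quarterly job:    {quarterly:.2f} s saved per year")
print(f"every-minute job: {per_minute / 3600:.1f} h saved per year")
```

Under these assumptions the quarterly job saves a little over a second per year, while the every-minute job saves roughly two days of compute, which is where the rewrite might start to pay for itself.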

By @wrenky - 2 months
Man, I get that python is easy to write but maintaining deployed python code is one of the worst experiences in modern software development.

Less Code != less buggy or more stable code, it just means more implicit code. I contend you spend way more time after release debugging runtime issues or patching random edge cases that are just completely eliminated in typed languages, or deploy/env issues that are eliminated in languages that produce a single binary.

Developer efficiency should include your support time after writing the code.

By @lawn - 2 months
I'd say that writing small stuff in Rust has two major advantages over Python:

1. Dependency management is a godsend compared to Python. With Rust I'm confident that I'll be able to pull the code on a new machine and just do `cargo build` and it will work. I'd like to use a lot of curses to describe Python here.

2. Python works well if you can fit everything in your head. But 5 years from now it's scary to make even small changes in a Python codebase. With Rust you'll get much more support from the compiler, whether it be refactoring, squashing bugs, or adding features.

So in the long run I prefer Rust.

By @markus_zhang - 2 months
Good things about Python:

- Very good, fast language for PoC, even for low level programming projects such as compilers;

- Very easy to set up in a new VM - no weird bash scripts, no complex package download, no need to change a hidden configuration file according to a 10 year old reply to a 20 year old SO post;

- Very easy to run - again no need to touch anything, just python something;

- Very good integration in VSCode;

- Virtual env is a blessing for multiple PoCs and is very easy to spin up even for people like me who don't work in terminals very often;

As someone who just wants to write some code without understanding a million tools, this is a blessing.

By @Ukv - 2 months
I've found pathlib (in Python's standard library) very convenient for tasks like these. I think the entire second example could just be:

    from pathlib import Path

    lines = Path("in.txt").read_text().splitlines()
    trimmed = "\n".join(lines[1:-1])
    Path("out.txt").write_text(trimmed)

Granted, the example in the article has advantages (like not loading the full file at once) if you want something more permanent, as opposed to a quick script run once or occasionally.

By @N_A_T_E - 2 months
Codebases tend to grow over time. I am not a fan of python for more complex projects. Some of the patterns I have seen lead to codebases that are hard to follow.

By @melling - 2 months
* Python requires less code

* Does speed and safety matter in every application (probably not)

* Developer efficiency matters

———-

Why can’t we have everything in a safe and fast language?

By @joshka - 2 months
In the last example, the call to `OpenOptions` can be replaced with the simpler `File::create()` [1], which opens the file in write mode, creates it if it doesn't exist, and truncates it if it does.

The weird iteration can be replaced with the Itertools `with_position` method [2], filtering on `Position::Middle`. The python code counts the total lines by reading the file once and then iterates over the file again using the count. This would be possible in the Rust approach too and would look mostly the same.

[1]: https://doc.rust-lang.org/stable/std/fs/struct.File.html#met...

[2]: https://docs.rs/itertools/0.11.0/itertools/trait.Itertools.h...

As with any endeavor, knowing your tools helps most tasks. This is what the example looks like with full error handling and a fairly succinct yet fast approach.

    use std::{
        fs::File,
        io::{BufRead, BufReader, BufWriter, Write},
        time::Instant
    };

    use color_eyre::{eyre::WrapErr, Result};
    use itertools::{Itertools, Position};

    fn main() -> Result<()> {
        color_eyre::install()?;
        let start = Instant::now();

        let path = "foo/bar/baz.txt";
        let tempfile = format!("{path}.tmp");

        let input = File::open(path).wrap_err(format!("Failed to open file: {path}"))?;
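        // Borrow `tempfile` so it can still be named in the error message, and
        // build that message with format! so {tempfile} is actually interpolated.
        let output = File::create(&tempfile)
            .wrap_err_with(|| format!("Failed to create output file: {tempfile}"))?;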

        let reader = BufReader::new(input);
        let mut writer = BufWriter::new(output);
        for (_, line) in reader
            .lines()
            .with_position()
            .filter(|(position, _)| *position == Position::Middle)
        {
            let line = line.wrap_err("Failed to read line")?;
            writeln!(writer, "{line}").wrap_err("Failed to write line")?;
        }

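        // Flush explicitly: errors during BufWriter's implicit flush on drop are silently ignored.
        writer.flush().wrap_err("Failed to flush output")?;

        println!("Elapsed: {:?}", start.elapsed());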
        Ok(())
    }

As a Rust dev, I find that marginally easier to grok than the python code, probably due mainly to familiarity. But I can see why the python code would be easier for a python dev.

My rust specific dev experience is ~18 months. I presume the OP's experience in Python is probably equal to or greater than this.

By @greener_grass - 2 months
Article compares Rust to Python claiming that whilst Python is indeed slower, it is more productive for developers.

But there are many languages in the world, and there are some that are as productive as Python, yet execute much faster.

By @0cf8612b2e1e - 2 months
Not the point of the article, but the Python version reads the file twice. Once to get the total line count and again to write out the chosen lines. The Rust version proceeds with a single read of the source file.
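
For comparison, a single-pass version is possible in Python as well. The sketch below is not the article's code; the function name `drop_first_and_last` and the file paths are hypothetical. It holds each line back by one step, so the last line is never written and the source file is read only once.

```python
from pathlib import Path


def drop_first_and_last(src: Path, dst: Path) -> int:
    """Copy src to dst without its first and last line, reading src once.

    Returns the number of lines written.
    """
    written = 0
    with src.open() as reader, dst.open("w") as writer:
        next(reader, None)  # skip the first line (no-op on an empty file)
        pending = None      # hold each line back until we know it is not the last
        for line in reader:
            if pending is not None:
                writer.write(pending)
                written += 1
            pending = line
        # Whatever is left in `pending` is the final line, so it is dropped.
    return written
```

Only one line is buffered at a time, so memory use stays flat regardless of file size, at the cost of a little more bookkeeping than the `splitlines()` one-liner.
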
By @O5vYtytb - 2 months
To me the main benefit of Python is the standard library. You can do a lot with it without any additional packages. Things get tricky when you have a lot of external packages in rust or Python.

By @poulpy123 - 2 months
I just wish I could provide a binary of my python code and call it a day

By @sergeykish - 2 months
aws s3 ls s3://bucket/prefix/ --recursive | wc -l

sed 1d

By @simonw - 2 months
> I find it’s not overly verbose or hard for even one of those lowly Python coders to follow what’s happening

Unnecessarily rude.