Rust for the small things? but what about Python?
The article compares Rust and Python for data engineering, highlighting Python's integration with LLMs and tools like Polars, while noting Rust's speed and safety but greater complexity.
The article discusses the potential of using Rust for small-scale data engineering tasks, contrasting it with Python. The author argues that despite the perceived decline of Python, its integration with large language models (LLMs) and tools like Polars suggests a strong future for Python in data engineering. The author expresses a desire to explore Rust for everyday tasks, highlighting its advantages in speed and safety due to strict compiler rules. However, the trade-offs include Rust's verbosity and complexity compared to Python, which often requires less code for similar tasks. The author provides examples of using Rust for operations like counting files in an S3 bucket and processing flat files, noting that while Rust can be efficient, it may not be practical for all data engineering tasks. Ultimately, the author concludes that while Rust offers benefits for reliability and performance, Python remains the more accessible choice for many engineers, especially for simpler tasks.
- The rise of LLMs and tools like Polars supports Python's continued relevance in data engineering.
- Rust offers advantages in speed and safety but is often more verbose than Python.
- The author tested Rust for small data engineering tasks, noting both its efficiency and complexity.
- Despite Rust's benefits, Python is likely to remain the preferred choice for simpler tasks due to its ease of use.
- The article emphasizes that both languages have their place, depending on the specific requirements of the task.
Related
The Python linter Ruff is a win for open source – and Rust
The Python linter Ruff is praised for its role in open source and Rust programming. The article emphasizes data transparency in AI projects, expert contributions, and the tech industry's evolution towards open source, AI, and data management.
I Hope Rust Does Not Oxidize Everything
The author expresses concerns about Rust's widespread adoption in programming, citing issues with syntax, async features, complexity, and long compile times. They advocate for language diversity to prevent monoculture, contrasting Rust with their language Yao.
Investing in Rust
Investing in Rust programming language can enhance cybersecurity by preventing memory-related vulnerabilities. Challenges in adoption include integration issues and skill set mismatches, suggesting U.S. policy interventions for promotion.
Programming Languages 2024
IEEE Spectrum's 2024 programming language rankings highlight Python's dominance, SQL's employer preference, and the rise of Typescript and Rust, while Apex and Solidity emerge as new contenders in technology.
From Julia to Rust
The article outlines the author's transition from Julia to Rust, highlighting Rust's memory safety features, design philosophies, and providing resources for learning, while comparing code examples to illustrate syntax differences.
Has it? Because this is literally the first I've heard anyone claim (or claim that others have claimed) Python is on a downward trajectory. If anything it's become the de facto standard language for anyone doing anything, other than low-level hardware programming; from data science glue code to web applications to one-off scripts to backend pipelines to command line tools, it seems like "Python" is the default answer these days.
Python is great for something I will use myself, but not so great for when I want someone else to use my code.
It would be a different matter entirely if that piece of code were executed more frequently and also took a lot of time, as I could then save both computing resources and money.
Less Code != less buggy or more stable code, it just means more implicit code. I contend you spend way more time after release debugging runtime issues or patching random edge cases that are just completely eliminated in typed languages, or deploy/env issues that are eliminated in languages that produce a single binary.
Developer efficiency should include your support time after writing the code.
1. Dependency management is a godsend compared to Python. With Rust I'm confident that I'll be able to pull the code on a new machine and just do `cargo build` and it will work. I'd like to use a lot of curses to describe Python here.
2. Python works well if you can fit everything in your head. But 5 years from now it's scary to make even small changes in a Python codebase. With Rust you'll get much more support from the compiler, whether it be refactoring, squashing bugs, or adding features.
So in the long run I prefer Rust.
- Very good, fast language for PoC, even for low level programming projects such as compilers;
- Very easy to set up in a new VM - no weird bash scripts, no complex package download, no need to change a hidden configuration file according to a 10 year old reply to a 20 year old SO post;
- Very easy to run - again no need to touch anything, just python something;
- Very good integration in VSCode;
- Virtual env is a blessing for multiple PoCs and is very easy to spin up even for people like me who don't work in terminals very often;
As someone who just wants to write some code without understanding a million tools, this is a blessing.
from pathlib import Path
lines = Path("in.txt").read_text().splitlines()
trimmed = "\n".join(lines[1:-1])
Path("out.txt").write_text(trimmed)
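For comparison, a rough Rust counterpart of the snippet above might look like the following. This is only a sketch using the standard library; the function name `trim_in_memory` is made up for illustration, and the file names are the ones from the Python version. Like the Python code, it loads the whole file into memory and joins the middle lines without a trailing newline.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Writes `src` to `dst` without its first and last lines.
/// Hypothetical helper: reads the whole file into memory, like the Python snippet.
fn trim_in_memory(src: &Path, dst: &Path) -> io::Result<()> {
    let text = fs::read_to_string(src)?;
    let lines: Vec<&str> = text.lines().collect();
    // Equivalent of Python's lines[1:-1]; empty when there are fewer than three lines.
    let middle: &[&str] = lines
        .get(1..lines.len().saturating_sub(1))
        .unwrap_or(&[]);
    fs::write(dst, middle.join("\n"))
}

fn main() {
    // File names taken from the Python example; adjust as needed.
    match trim_in_memory(Path::new("in.txt"), Path::new("out.txt")) {
        Ok(()) => println!("done"),
        Err(e) => eprintln!("error: {e}"),
    }
}
```

The extra lines compared to Python are mostly the explicit error type and the bounds handling that slicing does implicitly in Python.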
Granted the example in the article has advantages (like not loading the full file at once) if you want something more permanent, as opposed to a quick script run once/occasionally.
* Does speed and safety matter in every application (probably not)
* Developer efficiency matters
———-
Why can’t we have everything in a safe and fast language?
The weird iteration can be replaced with Itertools with_position method and filtering on Position::Middle. The python code counts the total lines by reading the file once and then iterates the file again using the count. This would be possible in the Rust approach too and would look mostly the same.
[1]: https://doc.rust-lang.org/stable/std/fs/struct.File.html#met...
[2]: https://docs.rs/itertools/0.11.0/itertools/trait.Itertools.h...
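The two-pass idea described above (count the lines first, then skip the first and last by index) could be sketched in Rust roughly as follows. This uses only the standard library; the function name `trim_first_and_last` and the file names in `main` are assumptions for illustration, not taken from the article.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, BufWriter, Write};
use std::path::Path;

/// Copies `src` to `dst` without its first and last lines, reading `src` twice.
/// Returns the number of lines written. Hypothetical helper for illustration.
fn trim_first_and_last(src: &Path, dst: &Path) -> io::Result<usize> {
    // Pass 1: count the lines so the index of the last one is known.
    let mut total = 0usize;
    for line in BufReader::new(File::open(src)?).lines() {
        line?; // propagate read errors instead of counting them
        total += 1;
    }

    // Pass 2: stream the file again and skip index 0 and index total - 1.
    let mut writer = BufWriter::new(File::create(dst)?);
    let mut written = 0usize;
    for (index, line) in BufReader::new(File::open(src)?).lines().enumerate() {
        let line = line?;
        if index == 0 || index + 1 == total {
            continue;
        }
        writeln!(writer, "{line}")?;
        written += 1;
    }
    // Flush explicitly so write errors are reported rather than lost on drop.
    writer.flush()?;
    Ok(written)
}

fn main() {
    // Placeholder file names; adjust as needed.
    match trim_first_and_last(Path::new("in.txt"), Path::new("out.txt")) {
        Ok(n) => println!("wrote {n} lines"),
        Err(e) => eprintln!("error: {e}"),
    }
}
```

Only one line is held in memory at a time, at the cost of reading the input twice.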
As with any endeavor, knowing your tools helps most tasks. This is what the example looks like with full error handling and a fairly succinct yet fast approach.
use std::{
    fs::File,
    io::{BufRead, BufReader, BufWriter, Write},
    time::Instant,
};

use color_eyre::{eyre::WrapErr, Result};
use itertools::{Itertools, Position};

fn main() -> Result<()> {
    color_eyre::install()?;
    let start = Instant::now();

    let path = "foo/bar/baz.txt";
    let tempfile = format!("{path}.tmp");

    let input = File::open(path).wrap_err(format!("Failed to open file: {path}"))?;
    // Borrow `tempfile` so it can still be used in the error message.
    let output = File::create(&tempfile)
        .wrap_err(format!("Failed to create output file: {tempfile}"))?;
    let reader = BufReader::new(input);
    let mut writer = BufWriter::new(output);

    // Position::Middle excludes the first and last lines (and a lone single line).
    for (_, line) in reader
        .lines()
        .with_position()
        .filter(|(position, _)| *position == Position::Middle)
    {
        let line = line.wrap_err("Failed to read line")?;
        writeln!(writer, "{line}").wrap_err("Failed to write line")?;
    }
    // Flush explicitly so a failed final write is reported instead of ignored on drop.
    writer.flush().wrap_err("Failed to flush output")?;

    println!("Elapsed: {:?}", start.elapsed());
    Ok(())
}
As a Rust dev, I find that marginally easier to grok than the Python code, probably due mainly to familiarity. But I can see why the Python code would be easier for a Python dev.
My Rust-specific dev experience is ~18 months. I presume the OP's experience in Python is probably equal to or more than this.
But there are many languages in the world, and there are some that are as productive as Python, yet execute much faster.
sed 1d
Unnecessarily rude.