August 22nd, 2024

I sped up serde_json strings by 20%

The author improved the performance of serde_json, the JSON crate for Rust's serde serialization framework. They sped up its error handling with the memchr crate and implemented a single-pass string-parsing algorithm, and the resulting pull request was accepted.


In a recent blog post, the author describes optimizing serde_json, the widely used JSON crate built on Rust's serde serialization framework. They noticed that serde_json's error path was much slower than its success path, which prompted a performance review. Profiling showed that the bottleneck was a function that converts an index into the input into a line/column position for error messages. The author replaced it with an implementation based on the memchr crate, which uses SIMD (Single Instruction, Multiple Data) instructions to search for and count bytes quickly. This change noticeably improved error-path performance.

The author then turned to string parsing and learned that a two-pass algorithm was slower than a single-pass one. They devised a way to check for escape characters and control codes in one pass using bitwise operations that emulate SIMD behavior. The goal was to keep the code simple while improving performance.

The contributions were well received: the pull request was merged quickly, marking the author's first successful contribution to serde_json.
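The line/column conversion mentioned above can be sketched as follows. This is a minimal illustration, not serde_json's actual code. The function name `position_of_index` and its column convention (bytes since the start of the line) are assumptions. Plain std iterators stand in for the memchr crate's SIMD-accelerated `memrchr` and `memchr_iter`, so the example runs without dependencies.

```rust
/// Hypothetical helper (not serde_json's actual code): map a byte offset in
/// `input` to a 1-based line number and a column counted as the number of
/// bytes between the start of that line and `index`.
fn position_of_index(input: &[u8], index: usize) -> (usize, usize) {
    let index = index.min(input.len()); // clamp out-of-range offsets
    let prefix = &input[..index];
    // Backward scan for the last newline before `index`
    // (memchr::memrchr performs this search with SIMD).
    let start_of_line = match prefix.iter().rposition(|&b| b == b'\n') {
        Some(pos) => pos + 1,
        None => 0,
    };
    // Count the newlines before the current line
    // (memchr::memchr_iter(..).count() is the SIMD counterpart).
    let line = 1 + prefix[..start_of_line].iter().filter(|&&b| b == b'\n').count();
    (line, index - start_of_line)
}

fn main() {
    let json = b"{\n  \"a\": 1,\n  \"b\": x\n}";
    // Byte offset of the invalid token `x`.
    let bad = json.iter().position(|&b| b == b'x').unwrap();
    let (line, column) = position_of_index(json, bad);
    println!("line {}, column {}", line, column); // prints "line 3, column 7"
}
```

The function first scans backward for the last newline, so the forward count only has to cover bytes before the current line. Both scans are plain byte searches, which is the kind of work memchr's vectorized routines are built for.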

- The author optimized serde_json's error handling and string parsing; per the title, string handling became 20% faster.

- Profiling showed that the error path was much slower than the success path.

- The memchr crate's SIMD-accelerated routines were used to speed up searching for and counting bytes.

- A single-pass algorithm was found to be more effective than a two-pass approach for string parsing.

- The author's pull request for optimizations was accepted, highlighting the collaborative nature of open-source development.
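The single-pass check for escape characters and control codes can be illustrated with a SWAR ("SIMD within a register") sketch. Eight bytes are loaded into a `u64`, and bit tricks test all of them at once for a quote, a backslash, or a byte below 0x20. This is a hedged illustration of the general technique using the classic "has zero byte" and "has byte less than n" bit hacks. The helper names are hypothetical, and the author's actual code may differ.

```rust
// Hypothetical helpers illustrating SWAR; not serde_json's code.
const ONES: u64 = 0x0101_0101_0101_0101; // 0x01 in every byte
const HIGHS: u64 = 0x8080_8080_8080_8080; // 0x80 in every byte

/// Nonzero exactly when some byte of `x` is less than `n` (valid for n <= 128).
/// Individual bits above the first match may be spurious due to borrows,
/// so only the zero/nonzero result is meaningful.
fn less_than_mask(x: u64, n: u8) -> u64 {
    x.wrapping_sub(ONES * n as u64) & !x & HIGHS
}

/// Nonzero exactly when some byte of `x` equals `b`: XOR turns matching bytes into 0.
fn equal_mask(x: u64, b: u8) -> u64 {
    less_than_mask(x ^ (ONES * b as u64), 1)
}

/// Scalar definition of a byte that ends the fast path of string parsing.
fn is_special(b: u8) -> bool {
    b == b'"' || b == b'\\' || b < 0x20
}

/// Index of the first quote, backslash, or control byte in `s`, or `s.len()` if none.
fn find_special(s: &[u8]) -> usize {
    let mut i = 0;
    // Fast path: test 8 bytes per iteration with a single combined mask.
    while i + 8 <= s.len() {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(&s[i..i + 8]);
        // Byte order is irrelevant: we only ask whether *any* byte matches.
        let chunk = u64::from_ne_bytes(buf);
        let mask = equal_mask(chunk, b'"')
            | equal_mask(chunk, b'\\')
            | less_than_mask(chunk, 0x20);
        if mask != 0 {
            break; // a special byte is in this chunk; locate it below
        }
        i += 8;
    }
    // Slow path: find the exact position in the flagged chunk or the tail.
    while i < s.len() && !is_special(s[i]) {
        i += 1;
    }
    i
}

fn main() {
    let inputs: [&[u8]; 3] = [b"no specials here", b"abcdefghij\"kl", b"0123456789ab\ncdef"];
    for &s in inputs.iter() {
        // Prints 16, 10 and 12 respectively.
        println!("{:?} -> {}", String::from_utf8_lossy(s), find_special(s));
    }
}
```

When the word-level test fires, a byte-by-byte loop finds the exact position. Correctness therefore never depends on the bit tricks; only speed does. A chunk made of ordinary ASCII and bytes of 0x80 and above (the bytes of multi-byte UTF-8 sequences) passes the test, so non-ASCII text stays on the fast path.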

Spending too much time optimizing for loops

Researcher Octave Larose discussed optimizing Rust interpreters for the SOM language. Improving loop handling through bytecode and primitives produced significant speedups despite challenges such as Rust's limitations and complex designs, though trade-offs between efficiency and code elegance remain.

Debugging a rustc segfault on Illumos

The author debugged a segmentation fault in the Rust compiler on illumos while compiling `cranelift-codegen`, using various tools and collaborative sessions to analyze the issue within the parser.

Reflection-based JSON in C++ at Gigabytes per Second

Daniel Lemire's blog discusses the challenges of JSON processing in C++. The reflection capabilities coming in the C++26 standard will automate serialization, improve performance, and make C++ more competitive with other languages. Benchmarks show significant speedups, with performance reaching 1900 MB/s.

7 comments

By @Sytten - 9 months

The UTF-8 tricks make me very nervous, since I have seen too many attacks using parser confusion. I go with serde for correctness, not speed. I hope this was fuzzed all the way with a bunch of invalid UTF-8 strings.

By @spense - 9 months

awesome that serde moves so quickly. i just ran across simdutf8 and realized the PR for SIMD-enabled UTF-8 parsing is coming up on 5 years:

https://github.com/rust-lang/rust/issues/68455

By @s_Hogg - 9 months

Very strong jart feel about this person's blog, that was a nice read

> We would need to reinvent the wheel, but this is quite neat if you think about it.

Is this real or ironic, though? I read it and started laughing at the writer, but the rest of the page seems quite heavy on self-deprecation.

By @zorked - 9 months

> Teaching to _think_ is just as important as teaching to code, but this is seldom done

Oh, the arrogance of thinking that the other person doesn't think.

By @zadokshi - 9 months

Serde JSON has 3 GB of dependencies once you do a build for debug and a build for release. Use serde on a few active projects and you run out of disk space. I don't know why JSON parsing needs 3 GB of dependencies.

I'm all for code reuse, but Serde for JSON is a bit of a dog's breakfast when it comes to dependencies. All you need is an exploit in one of those dependencies and half of the Rust ecosystem is vulnerable.

Rust should have JSON built in.
