Small Strings in Rust: smolstr vs. smartstring
The article explores Rust's small string libraries `smolstr` and `smartstring`, demonstrating JSON parsing, a custom memory allocator, and a reporting subcommand for analyzing memory usage and allocations.
The article discusses the implementation of small strings in Rust, focusing on two libraries: `smolstr` and `smartstring`. It begins with setting up a Rust project and using the `argh` library for command-line argument parsing. The author demonstrates how to parse a JSON dataset of the 1000 largest US cities using `serde` and `serde_json`, extracting only the city and state names. The article then shifts to profiling memory allocations by creating a custom tracing allocator that logs memory allocation and deallocation events. The allocator is designed to avoid recursive calls that could lead to stack overflow. The author also introduces a mechanism to activate and deactivate the allocator for specific workloads. Finally, a new subcommand is added to generate reports on memory usage, total allocations, and deallocations, utilizing additional libraries for formatting and graphing the data. The article emphasizes the importance of careful memory management in Rust and provides practical examples of how to implement and analyze memory usage in applications.
- The article compares two Rust libraries for small strings: `smolstr` and `smartstring`.
- It demonstrates parsing a JSON dataset of US cities using `serde` and `serde_json`.
- A custom tracing allocator is implemented to log memory allocation events.
- The allocator includes an activation mechanism to control when logging occurs.
- A reporting subcommand is added to analyze memory usage and allocation statistics.
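The tracing allocator with an activation switch can be sketched roughly as follows. This is not the article's code: the names `CountingAlloc`, `ALLOCATOR`, `set_active`, and `counts` are assumptions made for illustration, and the sketch swaps event logging for plain atomic counters. Counters never allocate, which is one simple way to avoid the allocator calling back into itself.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Hypothetical counting allocator: forwards every request to the system
/// allocator and records events only while `active` is set.
pub struct CountingAlloc {
    active: AtomicBool,
    allocs: AtomicUsize,
    deallocs: AtomicUsize,
    bytes: AtomicUsize,
}

impl CountingAlloc {
    pub const fn new() -> Self {
        Self {
            active: AtomicBool::new(false),
            allocs: AtomicUsize::new(0),
            deallocs: AtomicUsize::new(0),
            bytes: AtomicUsize::new(0),
        }
    }

    /// Turn recording on or off around the workload of interest.
    pub fn set_active(&self, on: bool) {
        self.active.store(on, Ordering::SeqCst);
    }

    /// Returns (allocations, deallocations, bytes requested) recorded so far.
    pub fn counts(&self) -> (usize, usize, usize) {
        (
            self.allocs.load(Ordering::SeqCst),
            self.deallocs.load(Ordering::SeqCst),
            self.bytes.load(Ordering::SeqCst),
        )
    }
}

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        if self.active.load(Ordering::Relaxed) {
            // Atomic updates do not allocate, so recording cannot recurse.
            self.allocs.fetch_add(1, Ordering::Relaxed);
            self.bytes.fetch_add(layout.size(), Ordering::Relaxed);
        }
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        if self.active.load(Ordering::Relaxed) {
            self.deallocs.fetch_add(1, Ordering::Relaxed);
        }
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
pub static ALLOCATOR: CountingAlloc = CountingAlloc::new();
```

A workload would be wrapped in `ALLOCATOR.set_active(true)` and `ALLOCATOR.set_active(false)`, with `ALLOCATOR.counts()` read afterwards to feed a report.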
Related
Malloc() and free() are a bad API (2022)
The post delves into malloc() and free() limitations in C, proposing a new interface with allocate(), deallocate(), and try_expand(). It discusses C++ improvements and emphasizes the significance of a robust API.
Crafting Interpreters with Rust: On Garbage Collection
Tung Le Vo discusses implementing a garbage collector for the Lox programming language using Rust, addressing memory leaks, the mark-and-sweep algorithm, and challenges posed by Rust's ownership model.
Phantom Menance: memory leak that wasn't there
The author's investigation into a perceived memory leak in a Rust application revealed it was a misunderstanding of misleading Grafana metrics, emphasizing the importance of accurate metric calculation in debugging.
Mimalloc Cigarette: Losing one week of my life catching a memory leak (Rust)
The article details a memory leak issue in a pricing engine using mimalloc, revealing that its internal bookkeeping caused memory retention. Restructuring to a single-threaded approach improved memory management.
I sped up serde_json strings by 20%
The author improved the performance of the Rust serialization framework serde_json by optimizing error handling, utilizing the memchr crate, and implementing a single-pass algorithm, leading to a successful contribution.
- Several users mention newer libraries such as compact_str and byteyarn, and compare their features and capabilities with those of smolstr and smartstring.
- There is a discussion on the implementation details of smolstr, particularly its use of enums for optimization without unsafe code.
- Users share personal experiences and experiments with string allocations, noting performance differences and challenges.
- Some comments address the evolution of small string types and their trade-offs in terms of performance and memory usage.
- Some commenters ask whether Rust has a common "string-like interface" that would make it easier to swap one string implementation for another.
There's a nice explanation on their readme[2]. Love tricks like this.
[1]: https://github.com/ParkMyCar/compact_str
[2]: https://github.com/ParkMyCar/compact_str?tab=readme-ov-file#...
[0]: https://github.com/rust-analyzer/smol_str/blob/fde86a5c0cb8f...
[1] https://m.youtube.com/watch?time_continue=2658&v=tLX_nvWD738...
- compact_str, for example, relies on the string being valid UTF-8, and in turn can hold longer short strings inline.
- smol_str, for example, is an enum over `[u8; CAP] | &'static str | Arc<str>`. This means it avoids any allocation for static strings and has very fast clones, giving it performance characteristics similar to string interning in some use cases (such as the ones it was designed for). The cost is that it is immutable, and its heap allocation is slightly larger because of the `Arc` reference counts.
Another interesting difference is how shrinking a mutable string is handled: do you re-inline it or not? Which is better is highly use-case dependent.
In the end there are many design decisions with no clear winner; it comes down to trade-offs and use-case-specific preferences.
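The three-variant layout described above for smol_str can be sketched without any `unsafe` code. This is only an illustration of the idea, not the crate's actual implementation: the type name `Smol`, the capacity of 22 bytes, and the helper methods are all assumptions.

```rust
use std::sync::Arc;

/// Hypothetical inline capacity, chosen for illustration only.
const CAP: usize = 22;

/// Sketch of an immutable small string: inline bytes, a static slice,
/// or a reference-counted heap string.
#[derive(Clone)]
pub enum Smol {
    Inline { len: u8, buf: [u8; CAP] },
    Static(&'static str),
    Heap(Arc<str>),
}

impl Smol {
    /// Stores short strings inline and falls back to `Arc<str>` otherwise.
    pub fn new(s: &str) -> Self {
        if s.len() <= CAP {
            let mut buf = [0u8; CAP];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            Smol::Inline { len: s.len() as u8, buf }
        } else {
            Smol::Heap(Arc::from(s))
        }
    }

    /// Wraps a static string without copying or allocating.
    pub const fn from_static(s: &'static str) -> Self {
        Smol::Static(s)
    }

    pub fn as_str(&self) -> &str {
        match self {
            // The bytes were copied from a whole `&str`, so they are valid UTF-8.
            Smol::Inline { len, buf } => std::str::from_utf8(&buf[..*len as usize])
                .expect("inline bytes come from a valid &str"),
            Smol::Static(s) => *s,
            Smol::Heap(s) => &**s,
        }
    }

    /// True when the contents live in a heap allocation.
    pub fn is_heap(&self) -> bool {
        matches!(self, Smol::Heap(_))
    }
}
```

Cloning the `Heap` variant only bumps a reference count, and cloning the other two variants is a plain copy, which is where the cheap clones come from. The price is that there is no way to mutate the contents in place.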
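use std::ptr::NonNull;

// Copy the string's bytes into a heap-allocated boxed slice and return a pointer to the first byte.
// The length is not stored, so the caller must track it in order to free the allocation later.
fn alloc_string(s: &str) -> NonNull<u8> {
    let boxed_slice = s.as_bytes().to_owned().into_boxed_slice();
    NonNull::new(Box::into_raw(boxed_slice) as *mut u8).unwrap()
}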
Roughly, where the stdlib took 1µs, my code took about 14-15µs for large strings. Profiling did not help either. Does anyone have any guesses? Here is the full code: https://github.com/avinassh/string-alloc
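As a side note on the snippet above, a pointer obtained this way can only be released by rebuilding the boxed slice with its original length. The following is a minimal sketch of that round trip. `free_string` and `round_trip` are hypothetical helpers written for this illustration, and the sketch does not attempt to explain the timing difference the commenter observed.

```rust
use std::ptr::{self, NonNull};

// Same shape as the function in the comment: copy the bytes to the heap
// and hand back a thin pointer to the first byte.
fn alloc_string(s: &str) -> NonNull<u8> {
    let boxed_slice: Box<[u8]> = s.as_bytes().into();
    NonNull::new(Box::into_raw(boxed_slice) as *mut u8).unwrap()
}

/// Hypothetical counterpart that frees the allocation.
///
/// Safety: `ptr` must come from `alloc_string`, `len` must be the byte
/// length of the original string, and the pointer must not be used again.
unsafe fn free_string(ptr: NonNull<u8>, len: usize) {
    // Rebuild the fat pointer so `Box` knows how many bytes to release.
    let slice = ptr::slice_from_raw_parts_mut(ptr.as_ptr(), len);
    drop(unsafe { Box::from_raw(slice) });
}

/// Allocates, reads the bytes back, and frees the allocation.
fn round_trip(s: &str) -> String {
    let ptr = alloc_string(s);
    // The allocation holds exactly `s.len()` initialized bytes.
    let bytes = unsafe { std::slice::from_raw_parts(ptr.as_ptr(), s.len()) };
    let copy = String::from_utf8(bytes.to_vec()).expect("bytes come from a valid &str");
    unsafe { free_string(ptr, s.len()) };
    copy
}
```

Because the length travels separately from the pointer, any wrapper type built on this would need to store it alongside (or inside) the allocation.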