August 24th, 2024

Small Strings in Rust: smolstr vs. smartstring

The article explores Rust's small string libraries `smolstr` and `smartstring`, demonstrating JSON parsing, a custom memory allocator, and a reporting subcommand for analyzing memory usage and allocations.

The article discusses the implementation of small strings in Rust, focusing on two libraries: `smolstr` and `smartstring`. It begins with setting up a Rust project and using the `argh` library for command-line argument parsing. The author demonstrates how to parse a JSON dataset of the 1000 largest US cities using `serde` and `serde_json`, extracting only the city and state names. The article then shifts to profiling memory allocations by creating a custom tracing allocator that logs memory allocation and deallocation events. The allocator is designed to avoid recursive calls that could lead to stack overflow. The author also introduces a mechanism to activate and deactivate the allocator for specific workloads. Finally, a new subcommand is added to generate reports on memory usage, total allocations, and deallocations, utilizing additional libraries for formatting and graphing the data. The article emphasizes the importance of careful memory management in Rust and provides practical examples of how to implement and analyze memory usage in applications.

- The article compares two Rust libraries for small strings: `smolstr` and `smartstring`.

- It demonstrates parsing a JSON dataset of US cities using `serde` and `serde_json`.

- A custom tracing allocator is implemented to log memory allocation events.

- The allocator includes an activation mechanism to control when logging occurs.

- A reporting subcommand is added to analyze memory usage and allocation statistics.
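
The allocator and its activation switch can be sketched roughly as follows. This is a simplified illustration, not the article's code: the name `CountingAllocator`, its fields, and the `snapshot` method are assumptions made here, and it keeps counters rather than logging individual events. The point it shows is that the bookkeeping path touches only atomics, so the allocator never allocates while recording and cannot recurse into itself.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Hypothetical wrapper around the system allocator that records
/// events only while `active` is set.
pub struct CountingAllocator {
    active: AtomicBool,
    allocs: AtomicUsize,
    deallocs: AtomicUsize,
    bytes: AtomicUsize,
}

impl CountingAllocator {
    pub const fn new() -> Self {
        Self {
            active: AtomicBool::new(false),
            allocs: AtomicUsize::new(0),
            deallocs: AtomicUsize::new(0),
            bytes: AtomicUsize::new(0),
        }
    }

    /// Turns recording on or off around a workload of interest.
    pub fn set_active(&self, on: bool) {
        self.active.store(on, Ordering::SeqCst);
    }

    /// Returns (allocations, deallocations, bytes requested) seen so far.
    pub fn snapshot(&self) -> (usize, usize, usize) {
        (
            self.allocs.load(Ordering::SeqCst),
            self.deallocs.load(Ordering::SeqCst),
            self.bytes.load(Ordering::SeqCst),
        )
    }
}

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        if self.active.load(Ordering::SeqCst) {
            // Only atomic counters are updated here; nothing on this
            // path allocates, so there is no re-entrant call.
            self.allocs.fetch_add(1, Ordering::SeqCst);
            self.bytes.fetch_add(layout.size(), Ordering::SeqCst);
        }
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        if self.active.load(Ordering::SeqCst) {
            self.deallocs.fetch_add(1, Ordering::SeqCst);
        }
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Registers the wrapper as the allocator for the whole program.
#[global_allocator]
static ALLOCATOR: CountingAllocator = CountingAllocator::new();
```

A workload would be wrapped in `ALLOCATOR.set_active(true)` and `ALLOCATOR.set_active(false)`, with `ALLOCATOR.snapshot()` read afterwards to feed a report.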

AI: What people are saying
The comments on the article about Rust's small string libraries reflect a variety of insights and discussions related to string optimization in Rust.
  • Several users mention new libraries like CompactStr and byteyarn, highlighting their unique features and capabilities compared to smolstr and smartstring.
  • There is a discussion on the implementation details of smolstr, particularly its use of enums for optimization without unsafe code.
  • Users share personal experiences and experiments with string allocations, noting performance differences and challenges.
  • Some comments address the evolution of small string types and their trade-offs in terms of performance and memory usage.
  • There is curiosity about the existence of a "string-like interface" in Rust for easier implementation changes.
12 comments
By @unshavedyak - 8 months
On the note of small strings, Compact String[1] was, I believe, released after this article and has a nifty trick. Where Smol and Smart can fit 22 and 23 bytes, CompactStr can fit 24! Which is kinda nutty imo: that's the full size of the normal String on the stack, but packed with actual string data.

There's a nice explanation on their readme[2]. Love tricks like this.

[1]: https://github.com/ParkMyCar/compact_str

[2]: https://github.com/ParkMyCar/compact_str?tab=readme-ov-file#...
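
To make the byte counts above concrete, here is a minimal sketch of the arithmetic. It assumes a 64-bit target, where `String` is three machine words (pointer, capacity, length). The type `Inline23` and the function `footprints` are hypothetical names for illustration and are not taken from any of the crates mentioned. Spending one byte on the length leaves 23 bytes for data in the same 24-byte footprint; how CompactStr reclaims the last byte is described in its readme and is not reproduced here.

```rust
use std::mem::size_of;

/// Hypothetical inline-only string: one length byte plus 23 data bytes.
pub struct Inline23 {
    len: u8,
    buf: [u8; 23],
}

impl Inline23 {
    pub const CAPACITY: usize = 23;

    /// Returns `None` when `s` does not fit inline.
    pub fn try_new(s: &str) -> Option<Self> {
        if s.len() > Self::CAPACITY {
            return None;
        }
        let mut buf = [0u8; 23];
        buf[..s.len()].copy_from_slice(s.as_bytes());
        Some(Self { len: s.len() as u8, buf })
    }

    pub fn as_str(&self) -> &str {
        // The bytes were copied from a valid `&str`, so this cannot fail.
        std::str::from_utf8(&self.buf[..self.len as usize]).expect("valid UTF-8")
    }
}

/// Stack footprints of `String` and of the sketch type, in bytes.
pub fn footprints() -> (usize, usize) {
    (size_of::<String>(), size_of::<Inline23>())
}
```

On a 64-bit target both values returned by `footprints()` should be 24, which is why a 23-byte inline capacity costs nothing extra on the stack compared with a plain `String`.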

By @mastax - 8 months
It’s nice that rustc’s niche optimization lets smolstr be implemented with a simple enum, rather than having to do some unsafe union bit packing[0]. The only concession that had to be made to the compiler is using an enum for the InlineSize value to show that the last 3 bits of that aren’t used.

[0]: https://github.com/rust-analyzer/smol_str/blob/fde86a5c0cb8f...
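
The niche optimization mentioned above can be observed with a small experiment. This is only an illustration of the compiler behavior, not smol_str's code: `TinyLen` and `option_sizes` are hypothetical names. A `u8` uses all 256 bit patterns, so `Option<u8>` needs an extra byte for its discriminant. An enum with only a few variants leaves unused bit patterns (niches) that rustc can reuse to encode `None`.

```rust
use std::mem::size_of;

/// Hypothetical length tag that can only hold the values 0..=3,
/// leaving the other 252 byte patterns unused.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(u8)]
pub enum TinyLen {
    L0 = 0,
    L1 = 1,
    L2 = 2,
    L3 = 3,
}

/// Returns (size of `Option<u8>`, size of `Option<TinyLen>`) in bytes.
pub fn option_sizes() -> (usize, usize) {
    (size_of::<Option<u8>>(), size_of::<Option<TinyLen>>())
}
```

With current rustc, `option_sizes()` returns `(2, 1)`. Enum layout is not a documented language guarantee, so a crate relying on it would typically assert the size in a test.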

By @conaclos - 8 months
Readers who like this article may also like a more recent one [0]. It designs a compact string with extra capabilities. The crate was released under the name byteyarn [1].

[0] https://mcyoung.xyz/2023/08/09/yarns/

[1] https://docs.rs/byteyarn/latest/byteyarn/

By @masklinn - 8 months
[2020]. I wouldn't be surprised if the field had changed a fair bit since.
By @abhorrence - 8 months
Sadly it seems like some of the images have broken since it was originally posted. :(
By @weinzierl - 8 months
Here [1] is a nice talk that discusses various options and trade-offs for small string and small vector optimization in Rust.

[1] https://m.youtube.com/watch?time_continue=2658&v=tLX_nvWD738...

By @dathinab - 8 months
One interesting thing to realize is that some small string types go beyond just the basic small-length storage optimization:

- compact_str e.g. depends on the string being valid utf-8, and in turn has larger short strings

- smol_str e.g. is an enum over `[u8; CAP] | &'static str | Arc<str>`, which means it avoids any allocations for static strings and has very fast clones, leading to perf. characteristics similar to string interning in some use cases (like the use cases it was designed for). But this comes at the cost of it being immutable only and the heap allocation being slightly larger for the Arc.

Another interesting difference is the handling of shrinking mutable strings: do you re-inline them or not? What is better here is highly use-case dependent.

In the end there are many design decisions where there is no clear winner; it's a question of trade-offs with use-case-specific preferences.
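
The three-way enum described in this comment can be sketched as below. This is an illustrative reconstruction from the comment's description, not smol_str's actual source: the name `SketchStr`, the method names, and the capacity of 22 are assumptions. It shows why clones are cheap (the heap variant only bumps a reference count) and why static strings never allocate.

```rust
use std::sync::Arc;

/// Inline capacity assumed for this sketch.
const CAP: usize = 22;

/// Hypothetical immutable string with three storage strategies.
#[derive(Clone)]
pub enum SketchStr {
    Inline { len: u8, buf: [u8; CAP] },
    Static(&'static str),
    Heap(Arc<str>),
}

impl SketchStr {
    /// Stores short strings inline and longer ones behind an `Arc`.
    pub fn new(s: &str) -> Self {
        if s.len() <= CAP {
            let mut buf = [0u8; CAP];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            SketchStr::Inline { len: s.len() as u8, buf }
        } else {
            SketchStr::Heap(Arc::from(s))
        }
    }

    /// Wraps a string literal without copying or allocating.
    pub fn from_static(s: &'static str) -> Self {
        SketchStr::Static(s)
    }

    pub fn as_str(&self) -> &str {
        match self {
            SketchStr::Inline { len, buf } => {
                std::str::from_utf8(&buf[..*len as usize]).expect("copied from a valid &str")
            }
            SketchStr::Static(s) => *s,
            SketchStr::Heap(s) => &**s,
        }
    }

    pub fn is_heap(&self) -> bool {
        matches!(self, SketchStr::Heap(_))
    }
}
```

Cloning a `Heap` value copies a pointer and increments the shared count, so both clones point at the same bytes. The trade-off named in the comment is visible too: there is no method that mutates the contents in place.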

By @codedokode - 8 months
I wonder, is there a "string-like interface" in Rust, or does one have to rewrite all the code when changing the string implementation? Also, if you want to change the implementation in half of the code, is there automatic conversion between implementations?
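
The thread does not answer this question. One common approach, sketched here with standard library types only, is to write functions against the `AsRef<str>` and `Into<String>` traits rather than a concrete string type. The function names below are hypothetical, and whether a given small-string crate implements these traits should be checked in its documentation.

```rust
use std::borrow::Cow;
use std::sync::Arc;

/// Counts words in anything that can be viewed as a `&str`.
pub fn word_count<S: AsRef<str>>(text: S) -> usize {
    text.as_ref().split_whitespace().count()
}

/// Accepts anything that converts into an owned `String`.
pub fn greeting<S: Into<String>>(name: S) -> String {
    let mut out: String = name.into();
    out.insert_str(0, "Hello, ");
    out
}

/// Calls `word_count` with four different string representations.
pub fn demo() -> [usize; 4] {
    [
        word_count("New York City"),
        word_count(String::from("Los Angeles")),
        word_count(Arc::<str>::from("San Antonio")),
        word_count(Cow::Borrowed("Austin")),
    ]
}
```

Code written this way does not need to change when the caller switches string types, as long as the new type implements the same traits. Conversions between types remain explicit, usually through `From` and `Into`.
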
By @jtrueb - 8 months
I made humanize-bytes[1] for that formatting reason (1000 vs 1024). Coincidentally, it uses smartstring to avoid allocations.

[1] https://crates.io/crates/humanize-bytes

By @avinassh - 8 months
Slightly related: earlier I was experimenting with Rust string allocations. Even though my code did almost the same as the standard library, the heap allocations were taking 10x the time. Relevant code:

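  use std::ptr::NonNull;

  // Copies `s` to the heap and leaks it; the caller needs the length to free it.
  fn alloc_string(s: &str) -> NonNull<u8> {
      let boxed_slice = s.as_bytes().to_owned().into_boxed_slice();
      // `Box::into_raw` yields `*mut [u8]`; the cast drops the length.
      NonNull::new(Box::into_raw(boxed_slice) as *mut u8).unwrap()
  }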

Approximately, if stdlib was taking 1µs, my code was taking about 14-15µs for large strings. Profiling also did not help. Anyone have any guesses? Here is the full code: https://github.com/avinassh/string-alloc
By @nextaccountic - 8 months
(2020)