September 10th, 2024

Strtod Is Wild

The strtod function in C converts decimal strings to floating-point numbers, facing challenges in accuracy, precision, and memory management. David M. Gay's contributions are significant in its implementation.

Read original article

The article discusses the complexities of the strtod function in the C standard library, which converts decimal strings to floating-point binary numbers. While it appears straightforward, achieving 100% accuracy in this conversion is challenging due to the potential for arbitrarily long input strings and the need to handle various floating-point representations. The strtod function must account for numerous tricky cases, including different ways to express the same number, precision issues, and rounding rules dictated by IEEE standards. The implementation of strtod often relies on arbitrary-precision arithmetic, which can lead to significant memory allocation challenges. The article highlights the contributions of David M. Gay, a key figure in developing modern strtod implementations, and notes that there are alternative implementations with varying accuracy and memory usage. The author reflects on the relevance of these concepts in their own work on arbitrary-precision arithmetic for a game that requires precise tracking of positions during infinite zooming. The discussion emphasizes the importance of understanding the underlying complexities of seemingly simple functions in programming.

- The strtod function converts decimal strings to floating-point binary numbers but is complex to implement accurately.

- It must handle arbitrarily long input and various floating-point representations, leading to potential precision issues.

- David M. Gay's work is foundational to modern strtod implementations, focusing on accuracy and memory management.

- Different implementations of strtod exist, with varying levels of accuracy and memory usage.

- The author relates these concepts to their own work on arbitrary-precision arithmetic in game development.

The Byte Order Fiasco

Handling endianness in C/C++ programming poses challenges, emphasizing correct integer deserialization to prevent undefined behavior. Adherence to the C standard is crucial to avoid unexpected compiler optimizations. Code examples demonstrate proper deserialization techniques using masking and shifting for system compatibility. Mastery of these concepts is vital for robust C code, despite available APIs for byte swapping.

Floating Point Math

Floating point math in computing can cause inaccuracies in decimal calculations due to binary representation limitations. Different programming languages manage this with varying precision, affecting results like 0.1 + 0.2.

Neo Geo Dev: Fixed Point Numbers

The article explains fixed point numbers for the Neo Geo, enabling decimal-like calculations using integers. It discusses their advantages, drawbacks, and practical coding examples for game development, emphasizing precision management.

Strlcpy and how CPUs can defy common sense

The article compares the performance of `strlcpy` in OpenBSD and glibc, revealing glibc's faster execution despite double traversal, emphasizing instruction-level parallelism and advocating for sized strings for efficiency.

Creating invariant floating-point accumulators

The blog addresses challenges in creating invariant floating-point accumulators for the astcenc codec, emphasizing the need for consistency across SIMD instruction sets and the importance of adhering to IEEE754 rules.

4 comments

By @lifthrasiir - 7 months

To be clear, David Gay's dtoa is not a modern implementation of decimal-to-float conversion. There had been several much simpler and performant alternatives, including:

- Google's double-conversion [1], which is best known for introducing the Grisu family of new float-to-decimal algorithms but also has a much less documented float-to-decimal algorithm via successive approximations AFAIK.

- The Eisel-Lemire algorithm [2], which is a Grisu3-like algorithm and returns either correct digits or a much rare fallback signal and currently in the standard libraries of Go and Rust.

- I believe Microsoft's own C Runtime (msvcrt, later ucrt) also has a completely separate code which algorithm is roughly similar to one of above.

These implementations also clearly demonstrate that such conversion only needs a bigint support of the bounded size (~3 KB) and can be done in much smaller code than dtoa.

[1] https://github.com/google/double-conversion

[1] https://lemire.me/blog/2020/03/10/fast-float-parsing-in-prac...

By @Neywiny - 7 months

I think recently I had to make my own because strtod was too large for my embedded system. I just wanted simple xx.yy type strings, so it was as easy as parsing 2 integers, convert, divide, add, done. I often wish these functions had more variants for stuff like that.

By @fph - 7 months

I hate the anti-responsive design of this page: I zoom in (ctrl-plus in Firefox), and the text gets smaller.

By @jancsika - 7 months

Would be cool to see a histogram of all of the web's textual representation of floats based on character length.

In other words-- what percentage of outstanding publicly-accessible data sets require an implementation of strod which can allocate memory on the heap?

Even this article, which talks about millions of digits, could be parsed just fine with a strod that's limited to 64 characters.

Strtod Is Wild