October 15th, 2024

Perspectives on Floating Point

The article explains floating point arithmetic's role in approximating real numbers, highlighting round-off errors, the significance of relative error, and the IEEE 754 standard, while noting subtraction's potential for numerical instability.


The article discusses how computers represent real numbers, focusing on floating point arithmetic. Because there are infinitely many reals and computer memory is finite, numbers must be approximated by rounding, which introduces round-off error. When rounding at a fixed decimal place, the maximum round-off error is determined by that place and is independent of the number's magnitude.

The article then distinguishes absolute from relative error, emphasizing that relative error is often the more relevant measure in practice. Floating point representation can be understood as fixed point in logarithmic space, which yields a consistent relative error across widely varying scales. The IEEE 754 standard governs the specifics of the representation, including the machine precision of each bit format.

Finally, the article examines the relative conditioning of arithmetic operations: addition, multiplication, and division are well-conditioned, but subtraction can suffer catastrophic cancellation, dramatically increasing relative error. This is a common source of numerical instability in algorithms, particularly when nearly equal floating point numbers are subtracted. The article concludes with key insights into the implications of floating point arithmetic in computational contexts.
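As a minimal sketch of the "fixed point in logarithmic space" picture (assuming IEEE 754 binary64 doubles, which CPython uses, and Python 3.9+ for math.nextafter): the absolute gap between adjacent floats grows with magnitude, but the gap relative to the number itself stays near machine epsilon.

```python
import math
import sys

# Machine epsilon for IEEE 754 double precision (binary64).
eps = sys.float_info.epsilon
print(eps)  # 2.220446049250313e-16

# The absolute spacing between adjacent floats grows with magnitude,
# but the spacing *relative* to the number itself stays near eps.
for x in (1.0, 1e6, 1e12):
    gap = math.nextafter(x, math.inf) - x  # distance to the next float above x
    print(x, gap, gap / x)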

- Floating point representation approximates real numbers using rounding, leading to round-off errors.

- Relative error is often more significant than absolute error in practical applications.

- Floating point is effectively fixed point in logarithmic space, allowing for consistent relative error.

- Subtraction can lead to catastrophic cancellation, increasing numerical instability in computations (illustrated in the sketch after this list).

- The IEEE 754 standard defines the specifics of floating point representation and machine precision.
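
Here is the cancellation sketch referenced above, in Python (values assume IEEE 754 doubles). Note that the subtraction itself is exact; it merely exposes and amplifies the rounding error already incurred by the addition.

```python
# (1 + 1e-15) - 1 should be exactly 1e-15, but the addition rounds
# 1 + 1e-15 to the nearest double, and the subtraction then strips
# away the leading digits that agreed, leaving that rounding error
# as a large fraction of the tiny result.
a = 1.0 + 1e-15
diff = a - 1.0
print(diff)  # 1.1102230246251565e-15, not 1e-15

rel_err = abs(diff - 1e-15) / 1e-15
print(rel_err)  # ~0.11: roughly 11% relative error from a single subtraction
```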

5 comments
By @yxhuvud - 6 months
One thing I think would be nice for floating point numbers is two separate types: one where NaN and the two infinities are allowed, and one where they are not allowed but instead emit an error. The former would be used by a few mathematicians and the like, and the rest of us could use the latter. The upside would be better error handling close to the source of the issue, and better optimizations, since the non-normal values throw a wrench into optimizing math.
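
A rough sketch of what the commenter describes, in Python (StrictFloat is a hypothetical name, not a real library type): every arithmetic result is checked, so a NaN or infinity raises at its source instead of propagating silently.

```python
import math

class StrictFloat(float):
    """Hypothetical float type that rejects NaN and the infinities."""

    def _check(self, value):
        if not math.isfinite(value):
            raise ArithmeticError(f"non-finite result: {value!r}")
        return StrictFloat(value)

    def __add__(self, other): return self._check(float(self) + float(other))
    def __sub__(self, other): return self._check(float(self) - float(other))
    def __mul__(self, other): return self._check(float(self) * float(other))
    def __truediv__(self, other): return self._check(float(self) / float(other))

x = StrictFloat(1e308)
x + x  # raises ArithmeticError: non-finite result: inf (overflow)
```

A real implementation would presumably be a hardware trap or compiler mode rather than a wrapper class, but the wrapper shows the error surfacing at the offending operation.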
By @andrepd - 6 months
Posits (https://posithub.org/docs/Posits4.pdf) offer an excellent alternative perspective to IEEE floats.
By @kolbusa - 6 months
Not sure why the article does not reference the following paper, which is a must-read for anyone working with floating point: https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.h... (original: https://www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf).
By @cosignal - 6 months
Very nice graphics in this.
By @rhythane - 6 months
A really interesting review. The idea of relative error makes sense in most cases, but when we need to do subtraction and the difference itself matters, absolute error may actually be the better measure.
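
Concretely (again assuming IEEE doubles, and reusing the cancellation example from the summary above): the subtraction leaves the absolute error untouched while the relative error explodes, which is the commenter's point.

```python
x = 1.0 + 1e-15   # rounding here introduces an absolute error of ~1.1e-16
d = x - 1.0       # the subtraction is exact, so the absolute error is unchanged
print(abs(d - 1e-15))           # ~1.1e-16: absolute error still tiny
print(abs(d - 1e-15) / 1e-15)   # ~0.11: relative error now enormous
```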