October 15th, 2024

Perspectives on Floating Point

The article explains floating point arithmetic's role in approximating real numbers, highlighting round-off errors, the significance of relative error, and the IEEE 754 standard, while noting subtraction's potential for numerical instability.


The article discusses how computers represent real numbers, focusing on floating point arithmetic. Because there are infinitely many reals and computer memory is finite, numbers must be approximated by rounding, which introduces round-off error. When rounding at a fixed decimal place, the maximum round-off error is determined by that place and is independent of the number's magnitude.

The article then distinguishes absolute from relative error, emphasizing that relative error is often the more relevant measure in practice. Floating point representation can be understood as fixed point in logarithmic space, which yields a consistent relative error across widely varying scales. The IEEE 754 standard governs the specifics of the representation, including the machine precision of each bit format.

Finally, the article examines the relative conditioning of arithmetic operations: addition, multiplication, and division are well-conditioned, but subtraction can suffer catastrophic cancellation, dramatically increasing relative error. This is a common source of numerical instability in algorithms, particularly when nearly equal floating point numbers are subtracted. The article concludes with key insights into the implications of floating point arithmetic in computational contexts.
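As a minimal sketch of the "fixed point in logarithmic space" picture (assuming IEEE 754 binary64 doubles, which CPython uses, and Python 3.9+ for math.nextafter): the absolute gap between adjacent floats grows with magnitude, but the gap relative to the number itself stays near machine epsilon.

```python
import math
import sys

# Machine epsilon for IEEE 754 double precision (binary64).
eps = sys.float_info.epsilon
print(eps)  # 2.220446049250313e-16

# The absolute spacing between adjacent floats grows with magnitude,
# but the spacing *relative* to the number itself stays near eps.
for x in (1.0, 1e6, 1e12):
    gap = math.nextafter(x, math.inf) - x  # distance to the next float above x
    print(x, gap, gap / x)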

- Floating point representation approximates real numbers using rounding, leading to round-off errors.

- Relative error is often more significant than absolute error in practical applications.

- Floating point is effectively fixed point in logarithmic space, allowing for consistent relative error.

- Subtraction can lead to catastrophic cancellation, increasing numerical instability in computations (illustrated in the sketch after this list).

- The IEEE 754 standard defines the specifics of floating point representation and machine precision.
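
Here is the cancellation sketch referenced above, in Python (values assume IEEE 754 doubles). Note that the subtraction itself is exact; it merely exposes and amplifies the rounding error already incurred by the addition.

```python
# (1 + 1e-15) - 1 should be exactly 1e-15, but the addition rounds
# 1 + 1e-15 to the nearest double, and the subtraction then strips
# away the leading digits that agreed, leaving that rounding error
# as a large fraction of the tiny result.
a = 1.0 + 1e-15
diff = a - 1.0
print(diff)  # 1.1102230246251565e-15, not 1e-15

rel_err = abs(diff - 1e-15) / 1e-15
print(rel_err)  # ~0.11: roughly 11% relative error from a single subtraction
```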

5 comments
By @yxhuvud - 6 months
One thing I think would be nice for floating point numbers is two separate types: one where NaN and the two infinities are allowed, and one where they are not allowed but instead emit an error. The former would be used by a few mathematicians and the like, and the rest of us could use the latter. The upside would be better error handling close to the source of the issue, and better optimizations, since the non-normal values throw a wrench into optimizing math.
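
A rough sketch of what the commenter describes, in Python (StrictFloat is a hypothetical name, not a real library type): every arithmetic result is checked, so a NaN or infinity raises at its source instead of propagating silently.

```python
import math

class StrictFloat(float):
    """Hypothetical float type that rejects NaN and the infinities."""

    def _check(self, value):
        if not math.isfinite(value):
            raise ArithmeticError(f"non-finite result: {value!r}")
        return StrictFloat(value)

    def __add__(self, other): return self._check(float(self) + float(other))
    def __sub__(self, other): return self._check(float(self) - float(other))
    def __mul__(self, other): return self._check(float(self) * float(other))
    def __truediv__(self, other): return self._check(float(self) / float(other))

x = StrictFloat(1e308)
x + x  # raises ArithmeticError: non-finite result: inf (overflow)
```

A real implementation would presumably be a hardware trap or compiler mode rather than a wrapper class, but the wrapper shows the error surfacing at the offending operation.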
By @andrepd - 6 months
Posits (https://posithub.org/docs/Posits4.pdf) offer an excellent alternative perspective to IEEE floats.
By @kolbusa - 6 months
Not sure why the article does not reference the following paper, which is a must-read for anyone working with floating point: https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.h... (original: https://www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf).
By @cosignal - 6 months
Very nice graphics in this.
By @rhythane - 6 months
A really interesting review. The idea of relative error makes sense in most cases, but when we need to do subtraction and the difference itself matters, absolute error may actually be the better measure.
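
Concretely (again assuming IEEE doubles, and reusing the cancellation example from the summary above): the subtraction leaves the absolute error untouched while the relative error explodes, which is the commenter's point.

```python
x = 1.0 + 1e-15   # rounding here introduces an absolute error of ~1.1e-16
d = x - 1.0       # the subtraction is exact, so the absolute error is unchanged
print(abs(d - 1e-15))           # ~1.1e-16: absolute error still tiny
print(abs(d - 1e-15) / 1e-15)   # ~0.11: relative error now enormous
```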