The Byte Order Fallacy
The article argues that native byte order is irrelevant for programmers, advocating for code that handles data streams independently, avoiding complexity and bugs from practices like byte swapping.
The article discusses the misconception surrounding byte order in programming, particularly in relation to data processing. It argues that the native byte order of a computer is largely irrelevant for most programmers, as the focus should be on the byte order of the data being processed. The author emphasizes that code should be written to handle data streams independently of the machine's byte order, using simple extraction methods that are portable across different architectures. The article critiques common practices that involve byte swapping and conditional compilation based on byte order, highlighting that such approaches often lead to unnecessary complexity and bugs. The author cites examples, including issues with Adobe Photoshop, to illustrate how byte order mismanagement can complicate software functionality. Ultimately, the piece advocates for a clearer understanding of byte order, suggesting that programmers should avoid overcomplicating their code with unnecessary byte order checks.
- The native byte order of a computer is generally irrelevant for data processing.
- Code should be designed to handle data streams independently of machine byte order.
- Common practices like byte swapping can introduce complexity and bugs.
- Proper handling of byte order can lead to simpler, more portable code.
- Mismanagement of byte order can cause significant software issues, as illustrated by real-world examples.
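The "simple extraction" idea summarized above can be sketched roughly as follows. This is a minimal illustration in Rust, not the article's own code; the function names `read_u32_le` and `read_u32_be` are hypothetical. The point it tries to show is that the decoder depends only on the byte order of the data, so there is no byte swapping and no conditional compilation on the host's byte order.

```rust
/// Decode a 32-bit value stored little-endian in the first four bytes of `data`.
/// The result is the same on any host, whatever its native byte order.
/// (Hypothetical helper for illustration; panics if `data` has fewer than 4 bytes.)
fn read_u32_le(data: &[u8]) -> u32 {
    (data[0] as u32)
        | ((data[1] as u32) << 8)
        | ((data[2] as u32) << 16)
        | ((data[3] as u32) << 24)
}

/// Decode a 32-bit value stored big-endian in the first four bytes of `data`.
/// Only the shift amounts differ from the little-endian version.
fn read_u32_be(data: &[u8]) -> u32 {
    ((data[0] as u32) << 24)
        | ((data[1] as u32) << 16)
        | ((data[2] as u32) << 8)
        | (data[3] as u32)
}
```

Under these assumptions, `read_u32_le(&[0x78, 0x56, 0x34, 0x12])` and `read_u32_be(&[0x12, 0x34, 0x56, 0x78])` both yield `0x12345678`, on little-endian and big-endian hosts alike.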
Related
The Byte Order Fiasco
Handling endianness in C/C++ programming poses challenges, emphasizing correct integer deserialization to prevent undefined behavior. Adherence to the C standard is crucial to avoid unexpected compiler optimizations. Code examples demonstrate proper deserialization techniques using masking and shifting for system compatibility. Mastery of these concepts is vital for robust C code, despite available APIs for byte swapping.
Beyond Clean Code
The article explores software optimization and "clean code," emphasizing readability versus performance. It critiques the belief that clean code equals bad code, highlighting the balance needed in software development.
Good programmers worry about data structures and their relationships
Good programmers prioritize data structures over code, as they enhance maintainability and reliability. Starting with data design simplifies complexity, aligning with Unix philosophy and aiding senior engineers in system documentation.
Do low-level optimizations matter? Faster quicksort with cmov (2020)
The article emphasizes the importance of low-level optimizations in sorting algorithms, highlighting how modern CPU features and minimizing conditional branches can enhance performance, particularly with the new `swap_if` primitive.
Byte Ordering: On Holy Wars and a Plea for Peace (1980)
The document explains floating-point number storage, emphasizing consistent bit order's importance. It discusses Little-Endian and Big-Endian systems, their implications for data processing, and advocates for unified data representation to reduce compatibility issues.
- mmap, io_uring, drivers, and other "zero-copy" code paths do require thinking about byte order.
- Filesystems, databases, and network applications can be high-throughput and will certainly benefit from being zero-copy (with performance gains anywhere from +1% to +2000%).
This is absolutely not "premature optimization." If you're a C/C++ engineer, you should know off the top of your head how many cycles syscalls & memcpys cost. (Spoiler: They're slow.) You should evaluate your performance requirements and decide if you need to eliminate that overhead. For certain applications, if you do not meet the performance requirements, you cannot ship.
"htonl, htons, ntohl, ntohs - convert values between host and network byte order"
The cheapest big-endian modern device is a Raspberry Pi running a NetBSD "eb" release, for those who want to test their code.
He even has an example where he just pushes the problem off to someone else: "if the people at Adobe wrote proper code to encode and decode their files". Yeah, hope they weren't ignoring byte order issues.
Original thread w/104 comments:
And do not define any data format to be big endian anymore. Define it as little endian (do not leave it undefined) and everyone will be happy.
Given a reader (file, network, buffers can all be turned into readers), you can call readInt. It takes the type you want and the endianness of the encoding. It's easy to write, self-documenting, and highly efficient.
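A reader-based call of this kind might look like the sketch below. The comment does not say which language or library its `readInt` comes from, so this is only a rough Rust analogue: the `Endian` enum and the `read_u32` function are hypothetical names, and the sketch is fixed to `u32` rather than generic over the integer type. It relies only on `std::io::Read::read_exact` and the standard `from_be_bytes` / `from_le_bytes` constructors.

```rust
use std::io::{self, Read};

/// Byte order of the encoded data (not of the host). Hypothetical type.
#[derive(Clone, Copy)]
enum Endian {
    Big,
    Little,
}

/// Read one u32 from `reader`, interpreting its four bytes in `order`.
/// Returns an error if fewer than four bytes are available.
fn read_u32<R: Read>(reader: &mut R, order: Endian) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    reader.read_exact(&mut buf)?;
    Ok(match order {
        Endian::Big => u32::from_be_bytes(buf),
        Endian::Little => u32::from_le_bytes(buf),
    })
}
```

Because `&[u8]`, `std::fs::File`, and `std::net::TcpStream` all implement `Read`, the same call would work for buffers, files, and sockets, and the byte order of the format is spelled out at every call site.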
Rust, for example, has from_be_bytes(), from_le_bytes() and from_ne_bytes() methods for the number primitives u16, i16, u32, and so on. They all take a byte array of the correct length, interpret it as big-, little-, or native-endian respectively, and convert it to the number.
The first two methods work fine on all architectures, and that's what this article is about.
The third method, however, is architecture-dependent and should not be used for network data, because it would behave differently from one architecture to another, which is exactly what you don't want. In fact, let me cite this part from the documentation. It's very polite but true.
> As the target platform’s native endianness is used, portable code likely wants to use from_be_bytes or from_le_bytes, as appropriate instead.
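To make the difference concrete, here is a small sketch that decodes the same four bytes with all three methods. The wrapper function `decode_all` is a hypothetical name for illustration; the three `u32` constructors are the standard library methods mentioned above.

```rust
/// Decode the same four bytes as big-endian, little-endian, and native-endian.
/// (Hypothetical helper for illustration.)
fn decode_all(bytes: [u8; 4]) -> (u32, u32, u32) {
    let be = u32::from_be_bytes(bytes); // same result on every architecture
    let le = u32::from_le_bytes(bytes); // same result on every architecture
    let ne = u32::from_ne_bytes(bytes); // depends on the host's byte order
    (be, le, ne)
}
```

For the input `[0x12, 0x34, 0x56, 0x78]`, the first value is always `0x12345678` and the second is always `0x78563412`. The third matches one or the other depending on the machine running the code, which is why it is unsuitable for decoding network or file data.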
As a non-SWE, whenever I see checkboxes to enable options that maximize compatibility, I often assume there’s an implicit trade-off, so if it isn’t checked by default, I don’t enable such things unless strictly necessary. I don’t have any solid reason for this, it’s just my intuition. After all, if there were no good reasons not to enable Mac compatibility, why wouldn’t it be the default?
Edit: spelling error with “implicit”
Also, a lot of comments in this thread have nothing to do with the article and appear to be responses to some invisible strawman.