June 30th, 2024

The Byte Order Fiasco

Handling endianness in C/C++ is error-prone, and integers must be deserialized carefully to avoid undefined behavior. Sticking closely to the C standard prevents surprising compiler optimizations. Code examples show portable deserialization by masking and shifting. Mastering these techniques remains essential for robust C code, even though byte-swapping APIs exist.


The article discusses the challenges of handling endianness in C/C++ programming, focusing on deserializing integers correctly. It highlights the risks of undefined behavior and the importance of adhering to the C standard to avoid unexpected compiler optimizations. The piece provides code examples illustrating the evolution of a proper method for deserialization, emphasizing the use of masking and shifting to ensure compatibility across different systems. It also touches on the serialization of integers and the intricacies of byte swapping. The author suggests that mastering these concepts is essential for writing robust C code, despite the existence of various APIs for byte swapping. Overall, the article serves as a comprehensive guide to navigating the complexities of endianness in C programming, offering insights into best practices and pitfalls to avoid.

How GCC and Clang handle statically known undefined behaviour

Discussion on compilers handling statically known undefined behavior (UB) in C code reveals insights into optimizations. Compilers like gcc and clang optimize based on undefined language semantics, potentially crashing programs or ignoring problematic code. UB avoidance is crucial for program predictability and security. Compilers differ in handling UB, with gcc and clang showing variations in crash behavior and warnings. LLVM's 'poison' values allow optimizations despite UB, reflecting diverse compiler approaches. Compiler responses to UB are subjective, influenced by developers and user requirements.

Y292B Bug

The Y292B bug is a potential timekeeping issue in Unix systems: a signed 64-bit time_t rolls over in the year 292,277,026,596. Proposed workarounds include using dynamic languages or the GNU Multiple Precision Arithmetic Library in C, but the article stresses that a real fix is needed at the kernel level.

The C Standard charter was updated, now with security principles as well

The ISO/IEC JTC1/SC22/WG14 committee oversees C Standard development, focusing on portability, efficiency, and stability. Collaboration with the C++ committee ensures compatibility. Principles guide feature integration, code efficiency, security, and adaptability.

Learning C++ Memory Model from a Distributed System's Perspective (2021)

The article explains the C++ memory model from a distributed-systems perspective, emphasizing std::memory_order for synchronization. It covers happens-before relationships, release-acquire ordering, and memory_order_seq_cst, which provides a single total order of operations across threads.

Weekend projects: getting silly with C

The C programming language's simplicity and expressiveness, despite quirks, influence other languages. Unconventional code structures showcase creativity and flexibility, promoting unique coding practices. Subscription for related content is encouraged.

15 comments

By @o11c - 10 months

When I dealt with this, there were a couple major gotchas:

* Compilers seem to reliably detect byteswap, but are(were) very hit-or-miss with the shift patterns for reading/writing to memory directly, so you still need(ed) an ifdef. I know compilers have improved but there are so many patterns that I'm still paranoid.

* There are a lot of "optimized" headers that actually pessimize by inserting inline assembly that the compiler can't optimize through (in particular, the compiler can't inline constants and can't choose `movbe` instead of `bswap`), so do not trust any standard API; write your own with memcpy + ifdef'd C-only swapping.

* For speaking wire protocols, generating (struct-based?) code is far better than writing code that mentions offsets directly, which in turn is far better than the `mempcpy`-like code which the link suggests.
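
A minimal sketch of the memcpy + ifdef approach o11c describes, assuming GCC or Clang (which predefine `__BYTE_ORDER__`, `__ORDER_LITTLE_ENDIAN__` and `__ORDER_BIG_ENDIAN__`). The names `swap32` and `load_be32` are hypothetical, not taken from any library; the swap is plain C so the compiler can still constant-fold it or choose `bswap` or `movbe`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reverse the byte order of a 32-bit value in plain C,
   with no inline assembly for the optimizer to trip over. */
static inline uint32_t swap32(uint32_t v)
{
    return (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         | (v << 24);
}

/* Hypothetical helper: read a big-endian 32-bit value from possibly
   unaligned memory. memcpy avoids alignment and strict-aliasing problems. */
static inline uint32_t load_be32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = swap32(v);   /* little-endian host: reverse the bytes */
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    /* big-endian host: the bytes are already in the right order */
#else
#error "unknown byte order; this sketch assumes GCC or Clang"
#endif
    return v;
}
```

Reading from `buf + 1` in a usage like `load_be32(buf + 1)` is fine here, because memcpy has no alignment requirement.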

By @trealira - 10 months

> Mask, and then shift.

> Repeat it like a mantra. You mask first, to define away potential concerns about signedness. Then you cast if needed. And finally, you can shift. Now we can create the fool-proof version for machines with at least 32-bit ints:

  #define READ32BE(p) \
    (uint32_t)(255 & p[0]) << 24 | (255 & p[1]) << 16 | (255 & p[2]) << 8 | (255 & p[3])

I don't think this works just because she's masking it. I'm pretty sure it's working because she cast (255 & p[0]) to uint32_t, and so all the other operands get promoted to uint32_t as well. I have this working with just casting to unsigned char first:

  #include <stdio.h>
  #include <stdint.h>

  char b[4] = {0x02,0x03,0x04,0x80};
  #define UC(x) ((unsigned char) (x))
  #define READ32BE(p) (uint32_t)UC(p[3]) | UC(p[2]) << 8 | UC(p[1]) << 16 | UC(p[0]) << 24
  int main(void) {
      printf("%08x\n", READ32BE(b));
  }

  $ cc -fsanitize=undefined -g -Os -o /tmp/o endian.c && /tmp/o
  02030480

Edit: actually, it works for me even without casting to uint32_t, without UBSan causing a runtime error, like in the article, so I don't know what's going on.
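
A likely explanation, worth checking: in this test vector the only byte with its high bit set, 0x80, sits in the least significant position, so `UC(p[0]) << 24` only ever shifts 0x02. With 0x80 or more in `p[0]`, `UC(p[0])` is promoted to a (typically 32-bit) signed `int`, and `0x80 << 24` is not representable in it; that is undefined behavior in C, which UBSan's shift checks report. The sketch below (hypothetical name `read32be`) converts every byte to `unsigned char` and then to `uint32_t` before shifting, so no byte position can overflow.

```c
#include <stdint.h>

/* Hypothetical helper: read a big-endian 32-bit value from plain char,
   whose signedness is implementation-defined. Each byte becomes
   unsigned char (0..255), then uint32_t, so every shift is unsigned. */
static inline uint32_t read32be(const char *p)
{
    return (uint32_t)(unsigned char)p[0] << 24
         | (uint32_t)(unsigned char)p[1] << 16
         | (uint32_t)(unsigned char)p[2] <<  8
         | (uint32_t)(unsigned char)p[3];
}
```

Testing with `{(char)0x80, 0, 0, 1}` as well as the original bytes exercises the case the original test vector missed.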

By @throw0101b - 10 months

> Clang and GCC are reaching for any optimization they can get. Undefined Behavior may be hostile and dangerous, but it's legal. So don't let your code become a casualty.

Perhaps we need a -Wundefined-behaviour so that compilers print out messages when they use these kinds of 'tricks'. If you see them, you can then choose to adjust your code so that it follows the defined path(s) of the standard(s) in question.

By @IshKebab - 10 months

These days I think the sane option is to just add a static assert that the machine is little endian and move on with your life. Unless you're writing glibc or something do you really care about supporting ancient IBM mainframes?

Also recommending octal is sadistic!
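
IshKebab's suggestion can be enforced at compile time. A minimal sketch, assuming GCC or Clang (which predefine `__BYTE_ORDER__` and `__ORDER_LITTLE_ENDIAN__`) and C11 `_Static_assert`; `load_le32_native` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Refuse to build on non-little-endian targets. On compilers that do not
   predefine these GCC/Clang macros, this line fails to compile as well,
   which is the conservative outcome. */
_Static_assert(__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__,
               "this code assumes a little-endian target");

/* Hypothetical helper: with the assumption pinned down, a little-endian
   load is just an (alignment-safe) memcpy. */
static inline uint32_t load_le32_native(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

The trade-off is explicit: big-endian ports fail loudly at build time instead of silently reading wrong values.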

By @el_pollo_diablo - 10 months

Look, it's not that difficult:

  static inline uint32_t load_le32(void const *p)
  {
    unsigned char const *p_ = p;
    _Static_assert(sizeof(uint32_t) == 4, "uint32_t is not 4 bytes long");
    return
      (uint32_t)p_[0] <<  0 |
      (uint32_t)p_[1] <<  8 |
      (uint32_t)p_[2] << 16 |
      (uint32_t)p_[3] << 24;
  }
  
  static inline void store_le32(void *p, uint32_t v)
  {
    unsigned char *p_ = p;
    _Static_assert(sizeof(uint32_t) == 4, "uint32_t is not 4 bytes long");
    p_[0] = (unsigned char)(v >>  0);
    p_[1] = (unsigned char)(v >>  8);
    p_[2] = (unsigned char)(v >> 16);
    p_[3] = (unsigned char)(v >> 24);
  }

All that nonsense about masking with an int must stop. You want a result in uint32_t? Convert to that type before shifting. Done.

Now, the C standard could still get in the way and assign a lower conversion rank to uint32_t than to int, so uint32_t operands would get promoted to int (or unsigned int) before shifting, but those left shifts would still be defined as the result would be representable in the promoted type (as promotion only makes sense if it preserves values, which implies that all values of the type before promotion are representable in the type after promotion).
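
The promotion argument can be seen concretely one size down. On a typical platform with 32-bit `int`, `uint16_t` has lower rank than `int`, so each `(uint16_t)` operand below is promoted to `int` before the shift; the shift is still defined because a byte shifted left by 8 is at most 0xFF00, which `int` can represent. A sketch mirroring the `load_le32`/`store_le32` pair, with hypothetical names `load_le16`/`store_le16`.

```c
#include <stdint.h>

/* Hypothetical 16-bit counterpart to load_le32. With 32-bit int, the
   uint16_t operands are promoted to int, but every intermediate value
   fits, so no shift overflows. The outer cast converts back to uint16_t. */
static inline uint16_t load_le16(void const *p)
{
    unsigned char const *p_ = p;
    return (uint16_t)((uint16_t)p_[0] << 0 |
                      (uint16_t)p_[1] << 8);
}

/* Hypothetical counterpart to store_le32. */
static inline void store_le16(void *p, uint16_t v)
{
    unsigned char *p_ = p;
    p_[0] = (unsigned char)(v >> 0);
    p_[1] = (unsigned char)(v >> 8);
}
```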

By @dang - 10 months

Discussed at the time:

The Byte Order Fiasco - https://news.ycombinator.com/item?id=27085952 - May 2021 (364 comments)

By @matheusmoreira - 10 months

The Linux user space API byte swap functions:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...

https://github.com/torvalds/linux/blob/master/include/uapi/l...

They are quite similar if not equivalent to the code presented in this article. I assume people far smarter than me have run these things under sanitizers and found no issues. I mean, it's Linux.

  #include <asm/byteorder.h>

By @zzo38computer - 10 months

I always just write the code with shifts in any program that cares about endianness (e.g. when reading/writing files), except when dealing with the Internet, which uses big-endian byte order; in that case I use the functions made specifically for network byte order.

However, in addition to big-endian and little-endian, PDP-endian is sometimes used, for example in the Hamster archive file format, which some of my programs use. (Shifts can be used the same way as above.)

MMIX is big-endian, and has MOR and MXOR instructions, either of which can be used to deal with endianness (including PDP-endian; these instructions have other uses too). (However, my opinion is that little-endian is better than big-endian, but either one will work.)
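
A sketch of the shift approach for PDP-endian data, assuming the PDP-11 layout for 32-bit values: two 16-bit words, most significant word first, each word stored little-endian, so 0x0A0B0C0D is stored as the bytes 0B 0A 0D 0C. Whether a given format such as the Hamster archive uses exactly this layout should be checked against its specification; `read32pdp` and `write32pdp` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical helper: read a 32-bit PDP-endian value.
   Layout for 0x0A0B0C0D: p[0]=0x0B p[1]=0x0A p[2]=0x0D p[3]=0x0C. */
static inline uint32_t read32pdp(const unsigned char *p)
{
    return (uint32_t)p[1] << 24
         | (uint32_t)p[0] << 16
         | (uint32_t)p[3] <<  8
         | (uint32_t)p[2];
}

/* Hypothetical inverse: write a 32-bit value in the same layout. */
static inline void write32pdp(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 16);
    p[1] = (unsigned char)(v >> 24);
    p[2] = (unsigned char)(v);
    p[3] = (unsigned char)(v >>  8);
}
```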

By @jowea - 10 months

Can't we just declare __alignof__(uint32_t) on the 'char b[4]' and then freely use bswap without all this madness? Or memcpy the chars into a uint32 array?

https://gcc.gnu.org/onlinedocs/gcc/Alignment.html
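
On the first question: under C's effective-type (strict-aliasing) rules, aligning `b` only fixes alignment; reading a `char` array through a `uint32_t *` is still undefined by the standard, whatever a particular compiler does in practice. The memcpy option is defined. A minimal sketch, assuming GCC or Clang for `__builtin_bswap32` and the `__BYTE_ORDER__` macros; `load_be32_bswap` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the bytes into a real uint32_t object (no
   aliasing or alignment issue), then byte-swap on little-endian hosts.
   Optimizing compilers typically turn the memcpy into a single load. */
static inline uint32_t load_be32_bswap(const unsigned char b[4])
{
    uint32_t v;
    memcpy(&v, b, sizeof v);
#if !defined(__BYTE_ORDER__)
#error "__BYTE_ORDER__ not defined; this sketch assumes GCC or Clang"
#elif __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap32(v);
#endif
    return v;
}
```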

By @jwrallie - 10 months

I never thought about doing bit shifts using octal numbers.

I definitely will be doing 010, 020, 030, ... instead of 8, 16, 24, ... from now on!

By @1vuio0pswjnm7 - 10 months

Was the author allowed to fix the Tensorflow example?

By @pizlonator - 10 months

> Modern compiler policies don't even permit code that looks like that anymore. Your compiler might see that and emit assembly that formats your hard drive with btrfs.

This is total FUD. Some sanitizers might be unhappy with that code, but that's just sanitizers creating problems where there need not be any.

The llvm policy here is that must alias trumps TBAA, so clang will reliably compile the cast of char* to uint32_t* and do what systems programmers expect. If it didn't then too much stuff would break.

By @kazinator - 10 months

Please, do not write shift amounts in octal. Or anything else in octal.

Mask after shifting, so you can use the same mask.

It's less verbose and probably easier to optimize, based on the simple pattern that an expression of the form:

  EXPR & MOD

is being converted to an unsigned type whose values run from 0 to MOD, i.e. one with MOD + 1 distinct values (8 bits wide when MOD is 0xFF).

From this:

  #define WRITE64BE(P, V)                        \
    ((P)[0] = (0xFF00000000000000 & (V)) >> 070, \
     (P)[1] = (0x00FF000000000000 & (V)) >> 060, \
     (P)[2] = (0x0000FF0000000000 & (V)) >> 050, \
     (P)[3] = (0x000000FF00000000 & (V)) >> 040, \
     (P)[4] = (0x00000000FF000000 & (V)) >> 030, \
     (P)[5] = (0x0000000000FF0000 & (V)) >> 020, \
     (P)[6] = (0x000000000000FF00 & (V)) >> 010, \
     (P)[7] = (0x00000000000000FF & (V)) >> 000, (P) + 8)

To this:

  #define WRITE64BE(P, V)         \
    ((P)[0] = ((V) >> 56) & 0xFF, \
     (P)[1] = ((V) >> 48) & 0xFF, \
     (P)[2] = ((V) >> 40) & 0xFF, \
     (P)[3] = ((V) >> 32) & 0xFF, \
     (P)[4] = ((V) >> 24) & 0xFF, \
     (P)[5] = ((V) >> 16) & 0xFF, \
     (P)[6] = ((V) >>  8) & 0xFF, \
     (P)[7] =  (V)        & 0xFF, (P) + 8)

It doesn't help anyone that the multiples of 8 look nice in octal, because these decimals are burned into hackers' brains. And are you gonna use base 13 for when multiples of 13 look good?

Save the octal for chmod 0777 public-dir.

Anyone who doesn't know this leading zero stupidity perpetrated by C reads 070 as "seventy".

Even 7*8, 6*8, 5*8, 4*8, ... would be better, and also take 3 characters:

  #define WRITE64BE(P, V)          \
    ((P)[0] = ((V) >> 7*8) & 0xFF, \
     (P)[1] = ((V) >> 6*8) & 0xFF, \
     (P)[2] = ((V) >> 5*8) & 0xFF, \
     (P)[3] = ((V) >> 4*8) & 0xFF, \
     (P)[4] = ((V) >> 3*8) & 0xFF, \
     (P)[5] = ((V) >> 2*8) & 0xFF, \
     (P)[6] = ((V) >>   8) & 0xFF, \
     (P)[7] =  (V)         & 0xFF, (P) + 8)

though we are relying on knowing that * has higher precedence than >>.

Supplementary note: when the 0xFF mask is in this position and P points to unsigned char, we can remove it. The conversion to unsigned char already keeps only the low 8 bits, so the & 0xFF does nothing, except on machines where chars are wider than 8 bits. These fall into two categories: historic museum hardware your code will never run on, and DSP chips. On DSP chips, bytes wider than 8 bits don't mean 9 or 10, but 16 or more. Simply doing & 0xFF may not make the code work; you may have to pack the data into 16-bit bytes in the right byte order. (Been there and done that. For example, I worked on an ARM platform that had a TeakLite III DSP, where that sort of thing was done when communicating data between the host and the DSP.)

So we are actually okay with:

  #define WRITE64BE(P, V) \
    ((P)[0] = (V) >> 56,  \
     (P)[1] = (V) >> 48,  \
     (P)[2] = (V) >> 40,  \
     (P)[3] = (V) >> 32,  \
     (P)[4] = (V) >> 24,  \
     (P)[5] = (V) >> 16,  \
     (P)[6] = (V) >>  8,  \
     (P)[7] = (V),        (P) + 8)
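
The same shift-then-store pattern can also be written as inline functions rather than a macro, which evaluates `V` only once and fixes its type as `uint64_t`. A minimal sketch; `write64be` and `read64be` are hypothetical names, and the conversion to `unsigned char` plays the role of `& 0xFF` on targets with 8-bit bytes.

```c
#include <stdint.h>

/* Hypothetical helper: store v big-endian into p[0..7] and return p + 8,
   like the WRITE64BE macro. Converting to unsigned char keeps the low
   8 bits when CHAR_BIT == 8. */
static inline unsigned char *write64be(unsigned char *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (unsigned char)(v >> (56 - 8 * i));
    return p + 8;
}

/* Hypothetical inverse: accumulate in uint64_t so no shift happens on int. */
static inline uint64_t read64be(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = v << 8 | p[i];
    return v;
}
```

Compilers generally unroll such fixed-count loops, so the function form need not cost anything over the macro.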

By @akira2501 - 10 months

> Now you don't need to use those APIs because you know the secret.

Was that a desired outcome? The endian.3 and byteorder.3 manual pages make it easy.
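
For comparison, a sketch using the APIs those manual pages document, assuming a glibc-based Linux system: `be32toh` from `<endian.h>` (endian(3); present in glibc and the BSDs but not ISO C) and `htonl` from `<arpa/inet.h>` (byteorder(3), POSIX). memcpy still handles unaligned buffers; `read_be32_api` and `write_be32_api` are hypothetical names.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1  /* glibc: expose be32toh and friends */
#endif
#include <arpa/inet.h>
#include <endian.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a big-endian 32-bit value with endian(3). */
static inline uint32_t read_be32_api(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return be32toh(v);
}

/* Hypothetical helper: write a 32-bit value in network (big-endian)
   byte order with byteorder(3). */
static inline void write_be32_api(void *p, uint32_t v)
{
    uint32_t n = htonl(v);
    memcpy(p, &n, sizeof n);
}
```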
