October 24th, 2024

Zero or Sign Extend

The blog discusses challenges in handling bit-packed formats, critiques bit-shifting methods for sign extension, and proposes a refined function using bitwise operations for efficient handling of signed and unsigned values.

The blog post discusses the challenges of handling bit-packed formats with varying integer sizes and signedness. It highlights the complexity of sign-extending narrow types, particularly when using two's complement representation. The author critiques common methods that rely on bit-shifting, which can be problematic depending on the programming language's standards. Instead, a more elegant solution is proposed that avoids shifting altogether by manipulating the place values of bits directly. The author presents a function for sign-extending that uses bitwise operations to convert unsigned values to signed without assumptions about integer width or two's complement representation. A further refined version of the function is introduced, which simplifies the process even more by using XOR operations. This approach allows for a unified method to handle both signed and unsigned values without explicit checks, making the code cleaner and more efficient.

- The blog addresses the complexities of sign-extending narrow integer types in bit-packed formats.

- It critiques traditional methods that rely on bit-shifting for sign extension.

- A new method is proposed that uses bitwise operations to avoid shifting.

- The refined function allows for handling both signed and unsigned values seamlessly.

- The approach enhances code clarity and efficiency in dealing with varying bit widths.
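
As a rough illustration of the two ideas summarized above, here is a minimal C sketch. The function names, the `uint32_t` input width, and the `int64_t` return type are assumptions made for this example and are not taken from the original post. The first function subtracts the place value of the sign bit from the value of the lower bits; the second uses XOR and takes the sign bit mask as a parameter, so passing a mask of 0 leaves the value zero-extended.

```c
#include <stdint.h>

/* Hypothetical helper: interpret the low `bits` bits of `val` as a signed
 * value by giving the top bit a negative place value.
 * Assumes 1 <= bits <= 32 and that val has no bits set above bit (bits-1). */
static int64_t sign_extend_place_value(uint32_t val, unsigned bits)
{
    uint32_t sign_bit = (uint32_t)1 << (bits - 1);
    int64_t low  = (int64_t)(val & (sign_bit - 1)); /* value of the lower bits */
    int64_t high = (int64_t)(val & sign_bit);       /* 0 or 2^(bits-1) */
    return low - high;
}

/* Hypothetical helper: XOR flips the sign bit, the subtraction removes the
 * resulting bias. With sign_bit == 0 the value is returned unchanged, so the
 * same call covers unsigned fields.
 * Assumes val has no bits set above the sign bit. */
static int64_t zero_or_sign_extend(uint32_t val, uint32_t sign_bit)
{
    return (int64_t)(val ^ sign_bit) - (int64_t)sign_bit;
}
```

The arithmetic is done in a wider signed type so that neither function depends on shifting into a sign bit or on an out-of-range conversion to a signed type.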

The Byte Order Fiasco

Handling endianness in C/C++ is error-prone, and the article emphasizes deserializing integers correctly to avoid undefined behavior. Adhering to the C standard matters because compilers may optimize code with undefined behavior in unexpected ways. Code examples demonstrate deserialization with masking and shifting that works regardless of the host system's byte order. The article argues that understanding these techniques is vital for robust C code, even though byte-swapping APIs are available.
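
A minimal sketch of the masking-and-shifting style of deserialization described above, assuming a 4-byte input buffer. The function names are hypothetical and the code is not copied from the linked article. Each byte is masked and widened to `uint32_t` before shifting, so no shift operates on a promoted signed `int`.

```c
#include <stdint.h>

/* Hypothetical helper: read a 32-bit little-endian value from p[0..3]. */
static uint32_t read_u32_le(const unsigned char *p)
{
    return  (uint32_t)(p[0] & 0xFF)
         | ((uint32_t)(p[1] & 0xFF) << 8)
         | ((uint32_t)(p[2] & 0xFF) << 16)
         | ((uint32_t)(p[3] & 0xFF) << 24);
}

/* Hypothetical helper: read a 32-bit big-endian value from p[0..3]. */
static uint32_t read_u32_be(const unsigned char *p)
{
    return ((uint32_t)(p[0] & 0xFF) << 24)
         | ((uint32_t)(p[1] & 0xFF) << 16)
         | ((uint32_t)(p[2] & 0xFF) << 8)
         |  (uint32_t)(p[3] & 0xFF);
}
```

Because the result is built from individual bytes, the same code gives the same answer on little-endian and big-endian hosts.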

Another variable-length integer encoding

The article presents two encoding schemes for small integers in binary formats: metric varint and imperial varint, highlighting their efficiency, advantages, and the use of zig-zag encoding for signed integers.
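
For readers unfamiliar with zig-zag encoding, here is a minimal C sketch of the conventional mapping (0, -1, 1, -2, ... become 0, 1, 2, 3, ...), which keeps small negative numbers small after encoding. The function names and the 32-bit width are assumptions for this example; the linked article's varint formats themselves are not reproduced here.

```c
#include <stdint.h>

/* Hypothetical helper: map a signed value to an unsigned one so that values
 * of small magnitude, positive or negative, get small codes. */
static uint32_t zigzag_encode(int32_t n)
{
    /* Shift in unsigned arithmetic, then flip all bits for negative inputs. */
    return ((uint32_t)n << 1) ^ (n < 0 ? UINT32_MAX : 0u);
}

/* Hypothetical helper: inverse of zigzag_encode. The low bit selects the
 * sign; z >> 1 always fits in int32_t, so no out-of-range conversion occurs. */
static int32_t zigzag_decode(uint32_t z)
{
    int32_t half = (int32_t)(z >> 1);
    return (z & 1u) ? -half - 1 : half;
}
```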

Packed structs in Zig make bit/flag sets trivial

Zig's packed structs efficiently manage flag sets using single bits for booleans, ensuring compile-time validation and reducing errors. The upcoming Zig 0.10 will further simplify their usage with explicit backing integers.

Needed-Bits Optimizations in Guile

The article details optimizations in the Guile programming language, focusing on needed-bits analysis for numeric operations, enhancing performance through unboxing, and fixing a bug in variable definition tracking.

C Until It Is No Longer C

Artyom Bologov's blog post addresses the complexities of C programming, proposing enhancements like booleans, custom type aliases, and type inference to improve readability and usability for programmers.

6 comments

By @jchw - 6 months

Of course doing the undefined thing works on almost any platform except DS9k, but that last formulation is quite elegant. It's a bit like byteswapping in that it's fairly simple to do but it's even simpler to not do by just never relying on the machine endianness.

By @eqvinox - 6 months

> ... this explicitly relies on shifting something into the sign bit, which depending on the exact flavor of language standard you’re using is either not allowed or at best fairly recently ..

An unsigned has no sign bit, so the left shift just needs to be unsigned to make it "technically correct".

(Remember to not use smaller than int types though, due to integer promotion issues)

By @notepad0x90 - 6 months

There are explicit instructions on x64 and aarch64 for sign and zero extensions. There are also common patterns like 'xor r64, r64' to clear out a reg (and the zero-register arm variant). Why are there no higher level language abstractions for these types of patterns? Or maybe there are and I'm just unaware.

I would really like to see single operand operators, similar to 'i++'. '!!i' to do 'i~=i', '<<i' to do 'i=i<<1'.

The 'rep' instructions are also nice: https://www.felixcloutier.com/x86/rep:repe:repz:repne:repnz

Imagine doing something like this in C or Rust: 'int i=10;rep i printf("called %u times", i);' where rep would store the value of i in rcx, sets it to zero and stores in rax and jmp's to whatever function or codeblock you specified (could be inline code, or lambda expression), rcx (i) times, passing 'i''s current value optionally to the target code block. It would essentially be a shorthand form of 'for(int i=0;i<10;i++){printf("called %u times",i);}' except it's easier to use for simpler constructs like 'rep 8 <<i;' (just an example, you can just do 'i = i << 8;') if you combine it with my earlier proposed left shift operator.

By @Neywiny - 6 months

This is the perfect spot to use a bitfield. You can tell it signed or unsigned, and the compiler will deal with it all and optimize. No bit ops to get wrong or maintain. Very readable and scalable.

By @IshKebab - 6 months

Eh the author's suggestions only seem better because C++ is insane.

The last one is definitely nice though!
