October 24th, 2024

Zero or Sign Extend

The blog post discusses the challenges of handling bit-packed formats, critiques shift-based methods for sign extension, and proposes a refined function that uses bitwise operations to handle both signed and unsigned values.

The blog post discusses the challenges of handling bit-packed formats whose integer fields vary in size and signedness. It highlights the complexity of sign-extending narrow types, particularly when they use two's complement representation. The author critiques common methods that rely on shifting a value into the sign bit, which, depending on the language standard in use, is either not allowed or only recently well defined. Instead, the author proposes a solution that avoids shifting altogether by manipulating the place values of the bits directly. The first function presented sign-extends with bitwise operations, converting an unsigned value to a signed one without assumptions about integer width or two's complement representation. A further refined version simplifies this to an XOR followed by a subtraction. That version handles signed and unsigned fields through a single code path with no explicit signedness check, which makes the code shorter and branch-free.

- The blog addresses the complexities of sign-extending narrow integer types in bit-packed formats.
- It critiques traditional methods that rely on bit-shifting for sign extension.
- A new method is proposed that uses bitwise operations to avoid shifting.
- The refined function allows for handling both signed and unsigned values seamlessly.
- The approach enhances code clarity and efficiency in dealing with varying bit widths.
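
The two formulations summarized above can be sketched in C roughly as follows. This is a minimal sketch, not the post's own code: the function names, the `uint32_t` input, and the `int64_t` return type are assumptions, chosen so that every intermediate value is representable. It also assumes that `x` holds the field in its low bits with all higher bits zero, and that `width` is between 1 and 32.

```c
#include <stdint.h>

/* Hypothetical helper: interpret the low `width` bits of x (1 <= width <= 32)
 * as a two's complement number. Assumes all bits above the field are zero. */
static int64_t sign_extend_place_value(uint32_t x, unsigned width)
{
    uint32_t sign_bit = (uint32_t)1 << (width - 1);
    /* Bits below the top keep their usual positive weights; the top bit
     * of the field carries weight -2^(width-1), so its value is subtracted. */
    return (int64_t)(x & (sign_bit - 1)) - (int64_t)(x & sign_bit);
}

/* Hypothetical unified helper: pass sign_bit = 1 << (width - 1) for a signed
 * field, or sign_bit = 0 to leave an unsigned field unchanged. */
static int64_t zero_or_sign_extend(uint32_t x, uint32_t sign_bit)
{
    /* XOR flips the field's top bit; subtracting sign_bit afterwards maps
     * values whose top bit was originally set to their negative counterparts. */
    return (int64_t)(x ^ sign_bit) - (int64_t)sign_bit;
}
```

For a 3-bit field, the bit pattern `101` decodes to -3 with `zero_or_sign_extend(5, 4)` and to 5 with `zero_or_sign_extend(5, 0)`, so the caller selects signedness through the `sign_bit` argument alone.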

6 comments
By @jchw - 6 months
Of course doing the undefined thing works on almost any platform except DS9k, but that last formulation is quite elegant. It's a bit like byteswapping in that it's fairly simple to do but it's even simpler to not do by just never relying on the machine endianness.
By @eqvinox - 6 months
> ... this explicitly relies on shifting something into the sign bit, which depending on the exact flavor of language standard you’re using is either not allowed or at best fairly recently ..

An unsigned has no sign bit, so the left shift just needs to be unsigned to make it "technically correct".

(Remember to not use smaller than int types though, due to integer promotion issues)
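
As an illustration of the point above (this is not the commenter's code), a shift-based version might look like the sketch below. The function name and the fixed 32-bit types are assumptions, and the sketch assumes `int` is no wider than 32 bits so that `uint32_t` is not promoted. The left shift is performed on an unsigned value, so nothing is shifted into a sign bit. The conversion to `int32_t` and the right shift of a negative value remain implementation-defined in standard C, although common compilers such as GCC document modular conversion and an arithmetic shift.

```c
#include <stdint.h>

/* Hypothetical helper: sign-extend the low `width` bits of x (1 <= width <= 32).
 * Assumes int is at most 32 bits wide, so x is not promoted to a signed type. */
static int32_t sign_extend_shift(uint32_t x, unsigned width)
{
    unsigned shift = 32u - width;
    /* Unsigned left shift: well defined, excess high bits are discarded. */
    uint32_t moved = x << shift;
    /* Implementation-defined steps: the conversion of an out-of-range value
     * to int32_t, and the right shift of a negative signed value. */
    return (int32_t)moved >> shift;
}
```

If `x` were declared as `uint16_t` instead, it would be promoted to `int` before the shift on platforms with a wider `int`, which is the promotion issue the comment warns about.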

By @notepad0x90 - 6 months
There are explicit instructions on x64 and aarch64 for sign and zero extensions. There are also common patterns like 'xor r64, r64' to clear out a reg (and the zero-register arm variant). Why are there no higher level language abstractions for these types of patterns? Or maybe there are and I'm just unaware.

I would really like to see single operand operators, similar to 'i++'. '!!i' to do 'i~=i', '<<i' to do 'i=i<<1'.

The 'rep' instructions are also nice: https://www.felixcloutier.com/x86/rep:repe:repz:repne:repnz

Imagine doing something like this in C or Rust: 'int i=10;rep i printf("called %u times", i);' where rep would store the value of i in rcx, sets it to zero and stores in rax and jmp's to whatever function or codeblock you specified (could be inline code, or lambda expression), rcx (i) times, passing 'i''s current value optionally to the target code block. It would essentially be a shorthand form of 'for(int i=0;i<10;i++){printf("called %u times",i);}' except it's easier to use for simpler constructs like 'rep 8 <<i;' (just an example, you can just do 'i = i << 8;') if you combine it with my earlier proposed left shift operator.

By @Neywiny - 6 months
This is the perfect spot to use a bitfield. You can tell it signed or unsigned, and the compiler will deal with it all and optimize. No bit ops to get wrong or maintain. Very readable and scalable.
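
A minimal sketch of what the bitfield suggestion could look like in C, for a fixed 3-bit field (this is not the commenter's code). The struct and function names are hypothetical. Storing an out-of-range value into a signed bitfield is an implementation-defined conversion, so the sketch relies on the usual wrap-around behavior of mainstream compilers.

```c
/* Hypothetical 3-bit signed field. `signed int` is spelled out because the
 * signedness of a plain `int` bitfield is implementation-defined. */
struct packed3 {
    signed int value : 3;
};

/* Hypothetical helper: reinterpret the low 3 bits of raw as a signed value. */
static int sign_extend_3bit(unsigned raw)
{
    struct packed3 f;
    /* Values 4..7 do not fit in the field's range of -4..3; the result of
     * this store is implementation-defined (typically it wraps). */
    f.value = (int)(raw & 7u);
    return f.value;
}
```

One design limitation is that a bitfield's width is fixed at compile time, so this only applies when the format's field widths are known in advance.
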
By @IshKebab - 6 months
Eh the author's suggestions only seem better because C++ is insane.

The last one is definitely nice though!