September 23rd, 2024

Overview of cross-architecture portability problems

Michał Górny discusses cross-architecture portability challenges between 32-bit and 64-bit systems, highlighting issues with memory allocation, file size limitations, and the Y2K38 problem affecting multiple programming languages.

Michał Górny discusses the challenges of cross-architecture portability, particularly between 32-bit and 64-bit systems. He highlights that while many assume that differences in integer type sizes are the main issue, the reality is more complex. The primary difference lies in the size of the `long` type, which is 32-bit on 32-bit systems and 64-bit on 64-bit systems. This can lead to problems when casting pointers to integers. Address space limitations are another significant issue, as 32-bit architectures can only address up to 4 GiB of memory, which can cause allocation failures in memory-intensive applications. Additionally, 32-bit programs face limitations with file sizes and inode numbers due to the use of 32-bit types, which can prevent the opening of files larger than 2 GiB. The impending Y2K38 problem is also a concern, as 32-bit `time_t` types cannot represent dates beyond 2038. Górny notes that while modern libraries can address some of these issues, the transition can introduce compatibility risks. He concludes that these portability issues are not exclusive to C, as they can affect other programming languages like Python, particularly regarding memory allocation and timestamp limitations.
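
As a rough illustration of the pointer-casting pitfall (not code from the article; the names below are made up): storing a pointer in a `long` happens to work on LP64 systems such as 64-bit Linux, but truncates wherever `long` is narrower than a pointer, whereas `uintptr_t` avoids the assumption.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    int value = 42;
    int *p = &value;

    /* Risky: assumes a pointer fits in a long.  True on LP64 (64-bit
     * Linux/BSD/macOS), false on LLP64 (64-bit Windows), where the upper
     * 32 bits of the pointer are silently dropped. */
    long as_long = (long)p;

    /* Portable: uintptr_t (when provided) is wide enough to round-trip
     * any object pointer without loss. */
    uintptr_t as_uintptr = (uintptr_t)p;

    printf("sizeof(long) = %zu, sizeof(void *) = %zu\n",
           sizeof(long), sizeof(void *));
    printf("long: %#lx, uintptr_t: %#" PRIxPTR "\n",
           (unsigned long)as_long, as_uintptr);
    return 0;
}
```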

- Cross-architecture portability issues arise mainly between 32-bit and 64-bit systems.

- Address space limitations on 32-bit systems restrict memory allocation to 4 GiB.

- 32-bit programs cannot handle files larger than 2 GiB due to 32-bit file-related types.

- The Y2K38 problem affects 32-bit systems, limiting date representation to 2038.

- Portability issues are not exclusive to C and can impact other languages like Python.
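
Regarding the file-size and timestamp limits above, a minimal sketch of the opt-in mitigation on glibc (the file name is a placeholder, and enabling these macros carries the compatibility risk the article warns about):

```c
/* Build for a 32-bit target with, e.g.:
 *   gcc -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64 demo.c -o demo
 * _FILE_OFFSET_BITS=64 widens off_t (and stat, fseeko/ftello, ...) to
 * 64 bits; _TIME_BITS=64 (glibc >= 2.34) widens time_t and requires the
 * former to be set as well. */
#include <stdio.h>
#include <sys/types.h>
#include <time.h>

int main(void) {
    printf("sizeof(off_t) = %zu, sizeof(time_t) = %zu\n",
           sizeof(off_t), sizeof(time_t));

    /* With a 32-bit off_t, opening a file larger than 2 GiB fails with
     * EOVERFLOW; with a 64-bit off_t it succeeds.  "big.bin" is just a
     * placeholder name. */
    FILE *f = fopen("big.bin", "rb");
    if (f) {
        if (fseeko(f, 0, SEEK_END) == 0)
            printf("size: %lld bytes\n", (long long)ftello(f));
        fclose(f);
    }
    return 0;
}
```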

9 comments
By @denotational - 7 months
Missed my favourite one: differences in the memory consistency model.

If you’re using stdatomic and the like correctly, then the library and compiler writers have your back, but if you aren’t (e.g. using relaxed ordering when acquire/release is required), or if you’re rolling your own synchronisation primitives (don’t do this unless you know what you’re doing!), then you’re going to have a bad day.
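
For anyone unfamiliar, a minimal C11 sketch of the handoff being described (assuming C11 `<threads.h>` is available): a relaxed store of the flag can let the reader observe `ready` before `payload` on weakly ordered hardware such as Arm, even though it often appears to work on x86; release/acquire makes the ordering explicit.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <threads.h>

static int payload;                  /* plain data handed between threads */
static atomic_bool ready = false;    /* synchronisation flag */

static int producer(void *arg) {
    (void)arg;
    payload = 42;
    /* Release: everything written above becomes visible to a thread that
     * acquire-loads `ready` and sees true.  memory_order_relaxed would
     * give no such guarantee, even if it seems fine on x86. */
    atomic_store_explicit(&ready, true, memory_order_release);
    return 0;
}

static int consumer(void *arg) {
    (void)arg;
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;                            /* spin until the flag is published */
    printf("payload = %d\n", payload);   /* guaranteed to print 42 */
    return 0;
}

int main(void) {
    thrd_t p, c;
    thrd_create(&c, consumer, NULL);
    thrd_create(&p, producer, NULL);
    thrd_join(p, NULL);
    thrd_join(c, NULL);
    return 0;
}
```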

By @johnklos - 7 months
One of the larger problems is purely social. Some people unnecessarily resist the idea that code can run on something other than x86 (and perhaps now Arm).

Interestingly, some apologists say that maintaining portability in code is a hindrance that costs time and money, as though the profit of a company or the productivity of a programmer would be dramatically affected if they needed to program without making assumptions about the underlying architecture. In reality, writing without those assumptions usually makes code better, with fewer edge cases.

It'd be nice if people wouldn't go out of their way to fight against portability, but some people do :(

By @hanikesn - 7 months
Looks like non-4k page sizes are missing, which tripped up some software running on Asahi Linux.
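
A common form of that assumption, sketched here: hard-coding 4096 instead of asking the OS, which breaks on the 16 KiB pages used by Apple Silicon and Asahi Linux.

```c
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Hard-coding 4096 breaks mmap()/alignment arithmetic on 16 KiB-page
     * systems; query the page size at runtime instead. */
    long page = sysconf(_SC_PAGESIZE);
    printf("page size: %ld bytes\n", page);
    return 0;
}
```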
By @magicalhippo - 7 months
One I recall, working on a C++ program that we distributed to Windows, Linux and PowerPC OSX at the time, was how some platforms had memory zero-initialized by the OS memory manager, and some didn't.

Our code didn't mean to take advantage of this, but it sometimes meant buggy code would appear fine on one platform as pointers in structures would be zeroed out, but crash on others where they weren't.

As I recall, it was mostly that and the endianness that caused the most grief. Not that there were many issues at all, since we used Qt and Boost...
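
A hedged sketch of that class of bug: a pointer member left indeterminate by `malloc()` happens to look like NULL when the allocation comes straight from freshly zeroed OS pages, and is garbage when the memory is recycled; `calloc()` or explicit initialization removes the platform dependence.

```c
#include <stdio.h>
#include <stdlib.h>

struct node {
    int value;
    struct node *next;   /* buggy code implicitly relied on this being NULL */
};

int main(void) {
    /* malloc() returns indeterminate contents: fresh pages from the OS are
     * typically zeroed, recycled heap memory is not, so n->next may or may
     * not look like NULL depending on the platform and allocator history. */
    struct node *n = malloc(sizeof *n);

    /* Portable fix: request zeroed memory (on mainstream ABIs, all-bytes-zero
     * is a null pointer), or assign every field explicitly. */
    struct node *m = calloc(1, sizeof *m);

    if (n && m)
        printf("m->next = %p\n", (void *)m->next);
    free(n);
    free(m);
    return 0;
}
```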

By @Archit3ch - 7 months
The fun of discovering size_t is defined differently on Apple Silicon.
By @AStonesThrow - 7 months
In my misspent college years, I was coding game servers in C. A new iteration came about, and the project lead had coded it on a VAX/BSD system, where I had no access.

Under whatever C implementation on VAX/BSD, a NULL pointer dereference returned "0". Yep, you read that right. There was no segfault, no error condition, just a simple nul-for-null!

This was all fine and dandy until he handed it over for me to port to SunOS, Ultrix, NeXT Mach BSD, etc. (Interesting times!)

I honestly didn't find a second implementation or architecture where NULL-dereference was OK. Whatever the standards at the time, we were on the cusp of ANSI-compliance and everyone was adopting gcc. Segmentation faults were "handled" by immediately kicking off 1-255 players, and crashing the server. Not a good user experience!

So my first debugging task, and it went on a long time, was to find every pointer dereference (mostly of nul-terminated strings) and wrap it in a conditional "if (ptr != NULL) { ... }".

At the time, I had considered it crazy/lazy to omit those in the first place and code on such an assumption. But C was so cozy with the hardware, and we were just kids. And that was the genesis of the cynical expression "All The World's A VAX!"

By @benchloftbrunch - 7 months
Something not mentioned is that on Windows `long` is always 32 bits (same size as `int`), even on 64-bit architectures.
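
One way to catch that assumption at compile time, as a sketch: this assertion passes under LP64 (64-bit Linux/macOS) but fails under LLP64 (64-bit Windows), where only `long long` and pointers are 64-bit.

```c
#include <assert.h>

/* Fails to compile on LLP64 targets such as 64-bit Windows, where long
 * stays 32-bit even though pointers are 64-bit. */
static_assert(sizeof(long) == sizeof(void *),
              "this code assumes a pointer fits in a long");

int main(void) { return 0; }
```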
By @malkia - 7 months
Is it still the case that wasm is 32-bit?