August 3rd, 2024

The difference between undefined behavior and ill-formed C++ programs

The article explains the difference between undefined behavior and ill-formed programs in C++. It highlights the risks of programs that are ill-formed, no diagnostic required (IFNDR), and suggests tools for mitigation.

The article distinguishes undefined behavior (UB) from ill-formed programs in C++. Undefined behavior is a runtime concept: it arises when a program performs an action the language standard does not permit, after which the standard places no requirements on what the program does. The effects can be unpredictable and can even appear to invalidate operations that came before the offending action. A program that contains such code but never executes the offending path remains safe, for example when a function is only ever called with arguments that do not trigger UB.

An ill-formed program, by contrast, violates the language's syntactic or semantic rules, for example by attempting to modify a const variable. Ill-formed programs come in two kinds: those that require a diagnostic (the compiler must report an error) and those that do not (ill-formed, no diagnostic required, or IFNDR). An IFNDR program can behave unpredictably without any compiler warning, as when two translation units disagree on the definition of an inline function. The result can be erratic behavior, including memory corruption.

The article stresses that understanding these concepts helps avoid pitfalls in C++ programming, particularly IFNDR, which makes a program fundamentally invalid. It also mentions an unofficial Visual Studio command line option that helps identify certain classes of IFNDR, along with defensive coding practices that help detect and mitigate these issues.
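The three categories in the summary can be sketched in C++ as follows. This is a minimal illustration, not code from the article: the names `element_at`, `element_at_or`, `modify_constant`, `scale`, and `demo`, and the file names `a.cpp` and `b.cpp`, are hypothetical. The ill-formed and IFNDR cases appear only as comments so that the block still compiles.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: indexing past the end is undefined behavior,
// but only for calls that actually pass an out-of-range index.
int element_at(const std::array<int, 4>& values, std::size_t index) {
    return values[index];  // UB if index >= 4, well-defined otherwise
}

// Guarded variant: every call stays on a defined code path.
int element_at_or(const std::array<int, 4>& values, std::size_t index, int fallback) {
    return index < values.size() ? values[index] : fallback;
}

// Ill-formed, diagnostic required: uncommenting this function must
// produce a compile error because it assigns to a const object.
// void modify_constant() {
//     const int limit = 10;
//     limit = 20;
// }

// IFNDR, shown as comments because it needs two translation units:
//   a.cpp:  inline int scale(int x) { return x * 2; }
//   b.cpp:  inline int scale(int x) { return x * 3; }
// The definitions differ, and the toolchain is not required to report it.

// Usage: both calls avoid the out-of-range path.
void demo() {
    const std::array<int, 4> values{10, 20, 30, 40};
    assert(element_at(values, 2) == 30);
    assert(element_at_or(values, 9, -1) == -1);
}
```

Only the first case can be avoided by the caller at runtime. The second is rejected at compile time, and the third may build without any warning.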

Related

How GCC and Clang handle statically known undefined behaviour

Discussion on compilers handling statically known undefined behavior (UB) in C code reveals insights into optimizations. Compilers like gcc and clang optimize based on undefined language semantics, potentially crashing programs or ignoring problematic code. UB avoidance is crucial for program predictability and security. Compilers differ in handling UB, with gcc and clang showing variations in crash behavior and warnings. LLVM's 'poison' values allow optimizations despite UB, reflecting diverse compiler approaches. Compiler responses to UB are subjective, influenced by developers and user requirements.

Weekend projects: getting silly with C

The C programming language's simplicity and expressiveness, despite quirks, influence other languages. Unconventional code structures showcase creativity and flexibility, promoting unique coding practices. Subscription for related content is encouraged.

The Byte Order Fiasco

Handling endianness in C/C++ programming poses challenges, emphasizing correct integer deserialization to prevent undefined behavior. Adherence to the C standard is crucial to avoid unexpected compiler optimizations. Code examples demonstrate proper deserialization techniques using masking and shifting for system compatibility. Mastery of these concepts is vital for robust C code, despite available APIs for byte swapping.

Don't use null objects for error handling

The article critiques using null objects for error handling in programming, arguing it misleads users and propagates errors. It advocates for immediate error handling and context-based strategies instead.

C Isn't a Programming Language Anymore (2022)

The article examines the shift in perception of C from a programming language to a protocol, highlighting challenges it poses for interoperability with modern languages like Rust and Swift.

4 comments
By @rileymat2 - 6 months
> Avoiding code paths with undefined behavior is something you do all the time.

> // Check the pointer before using it

> if (p != nullptr) p->DoSomething();

I love this example.

By @layer8 - 6 months
> Undefined behavior (commonly abbreviated UB) is a runtime concept. If a program does something which the language specified as “a program isn’t allowed to do that”, then the behavior at runtime is undefined

Importantly, UB is a runtime condition in the general case, in the sense that it cannot be statically determined (again, in the general case). It may depend on input data, or detecting it may amount to solving the halting problem.

A consequence of that is that UB cannot be caught by static tools in the general case without changing the language so that a large class of previously valid programs (i.e. not containing UB) become invalid.

By @TillE - 6 months
> Visual Studio has an unofficial command line option to help identify certain classes of IFNDR

I'm sure there's a good reason why this is hard, but I'm a little surprised that this isn't caught by static analysis. Sure enough, I can't get MSVC Code Analysis to complain about the example with different inline functions.

By @Bjartr - 6 months
> However, if your program avoids the code paths which trigger undefined behavior, then you are safe.

This seems incorrect, as demonstrated by the other undefined behavior story I read on HN today [1]. The tl;dr, as I understood it, is that since UB is not allowed, the compiler can elide checks that would protect against UB for the sake of optimization: a correct program wouldn't have caused the UB in the first place, and the compiler doesn't have to respect the semantics of incorrect programs.

[1] https://news.ycombinator.com/item?id=41146860
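A minimal sketch of the kind of check elision described above, assuming an optimizing compiler. The function names are hypothetical and the code is not taken from either article. In `read_then_check` the pointer is dereferenced before it is tested, so a compiler may assume it is non-null and drop the test. In `check_then_read` the test comes first, so the null case never reaches the dereference.

```cpp
#include <cassert>

// Hypothetical example: the dereference comes before the null check.
int read_then_check(int* p) {
    int value = *p;                // UB if p is null
    if (p == nullptr) return -1;   // an optimizer may delete this check
    return value;
}

// The check comes first, so a null pointer is never dereferenced.
int check_then_read(int* p) {
    if (p == nullptr) return -1;
    return *p;
}

// Usage: only defined code paths are exercised here.
void demo_checks() {
    int x = 42;
    assert(read_then_check(&x) == 42);
    assert(check_then_read(&x) == 42);
    assert(check_then_read(nullptr) == -1);
}
```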