September 26th, 2024

Lesser known tricks, quirks and features of C

The article explores lesser-known C programming features, including the comma operator, designated initializers, compound literals, and advanced topics like volatile qualifiers and flexible array members, highlighting their potential pitfalls.

The article discusses various lesser-known features, quirks, and tricks in the C programming language that can confuse even experienced developers. It highlights several unique aspects, such as the use of the comma operator, digraphs and trigraphs for portability, designated initializers for flexible initialization of structures, and compound literals that can act as lvalues. The text also covers advanced topics like bit fields, the volatile and restrict type qualifiers, and flexible array members. Additionally, it mentions format specifiers like %n and %.* for dynamic output formatting. The article emphasizes that while these features can be useful, they may also lead to unexpected behavior if not understood properly. The author provides examples to illustrate these concepts, aiming to enhance the reader's understanding of C's capabilities and potential pitfalls.

- The comma operator allows multiple expressions to be evaluated, with only the last expression's value being used.

- Designated initializers enable non-sequential initialization of structure members.

- Compound literals can be used as lvalues, allowing for direct manipulation of temporary objects.

- The volatile qualifier prevents the compiler from optimizing away accesses to variables that may change unexpectedly.

- Flexible array members let a structure end with an array whose length is chosen at allocation time, so a header and its variable-length payload can share a single allocation.
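
The points above can be sketched in a few lines of C. This is an illustrative sketch only: the type and function names (`point`, `message`, `message_new`, and the `*_demo` helpers) are hypothetical and do not come from the article.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration; not taken from the article. */
struct point { int x, y, z; };

struct message {
    size_t len;
    char   data[];              /* flexible array member: must be the last member */
};

/* Comma operator: both operands are evaluated, the result is the last one. */
static int comma_demo(void)
{
    int a = 0;
    int b = (a = 5, a + 1);     /* a becomes 5, b becomes 6 */
    return b;
}

/* Designated initializer: members named in any order, the rest are zeroed. */
static struct point designated_demo(void)
{
    struct point p = { .z = 3, .x = 1 };
    return p;
}

/* Compound literals are unnamed objects and lvalues: they can be
 * modified and have their address taken. Inside a function they live
 * until the enclosing block ends. */
static int compound_literal_demo(void)
{
    int *p = (int[]){ 10, 20, 30 };
    p[1] = 99;
    struct point *q = &(struct point){ .x = 1 };
    q->y = 2;
    return p[0] + p[1] + q->x + q->y;   /* 10 + 99 + 1 + 2 */
}

/* Flexible array member: one allocation holds header and payload.
 * The caller releases the result with free(). */
static struct message *message_new(const char *text)
{
    assert(text != NULL);
    size_t len = strlen(text);
    struct message *m = malloc(sizeof *m + len + 1);
    if (m == NULL)
        return NULL;
    m->len = len;
    memcpy(m->data, text, len + 1);     /* copy including the terminator */
    return m;
}
```

Note that `sizeof(struct message)` does not include the trailing array, which is why the payload length is added explicitly in the `malloc` call.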

Related

Weekend projects: getting silly with C

The C programming language's simplicity and expressiveness, despite quirks, influence other languages. Unconventional code structures showcase creativity and flexibility, promoting unique coding practices. Subscription for related content is encouraged.

I _____ hate arrays in C++

The article explores challenges in using arrays in C++, focusing on array-to-pointer conversion pitfalls, differences from pointers, and practical examples of errors. Caution and awareness are advised for C++ developers.

Initialization in C++ is Seriously Bonkers Just Start With C

Variable initialization in C++ poses challenges for beginners compared to C. C requires explicit initialization to prevent bugs, while C++ offers default constructors and aggregate initialization. Evolution from pre-C++11 to C++17 introduces list initialization for uniformity. Explicit initialization is recommended for clarity and expected behavior.

Some Tricks from the Scrapscript Compiler

The Scrapscript compiler implements optimization tricks like immediate objects, small strings, and variants for better performance. It introduces immediate variants and const heap to enhance efficiency without complexity, seeking suggestions for future improvements.

Undefined behavior in C is a reading error (2021)

Misinterpretations of undefined behavior in C have caused significant semantic issues, risking program stability and making C unsuitable for critical applications. A clearer standard interpretation is needed for reliable programming.

AI: What people are saying
The comments reflect a deep engagement with the nuances of C programming, showcasing both appreciation and critique of its features.
  • Several commenters highlight specific C features, such as flexible array members, designated initializers, and the comma operator, noting their usefulness and potential pitfalls.
  • There is a discussion about the quirks of C syntax and behavior, with some expressing frustration over certain aspects like array pointer decay and multi-character constants.
  • Some users share clever programming tricks and techniques, emphasizing the creativity involved in using C effectively.
  • Concerns are raised about security implications, particularly regarding the %n format specifier in printf.
  • Overall, there is a desire for more comprehensive education on these lesser-known features in C programming.
14 comments
By @fuhsnn - 20 days
My recent favorite is glibc's hack to implement _Static_assert under C99: https://codebrowser.dev/glibc/glibc/misc/sys/cdefs.h.html#56...

It uses the constant expression to create a bit-field of width -1 when the assertion fails, and leaves it to the compiler to reject that as the intended error. The actual statement is an extern declaration of a function returning a pointer to an array whose size is derived from the sizeof of the aforementioned bit-field struct.

Another one encountered in Toybox is (0 || "foo") being a constant expression that evaluates to 1. Apparently the string literal is known to be placed in the data section, so its address is safely assumed to be non-zero.
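
A simplified sketch of the negative bit-field trick described above, for readers who want to see its shape. The macro and function names here (`MY_STATIC_ASSERT`, `my_static_assert_fn`, `int_width_bits`) are hypothetical, and this is not glibc's exact definition; the C11 `static_assert` line is included only for comparison.

```c
#include <assert.h>   /* C11: provides the static_assert macro */
#include <limits.h>

/* If cond is false, the bit-field gets width -1 and the compiler must
 * reject the declaration. If cond is true, this merely redeclares an
 * extern function returning a pointer to int[1], which is harmless and
 * can be repeated any number of times. */
#define MY_STATIC_ASSERT(cond) \
    extern int (*my_static_assert_fn(void)) \
        [!!sizeof(struct { int fails_if_negative : (cond) ? 2 : -1; })]

MY_STATIC_ASSERT(CHAR_BIT >= 8);
MY_STATIC_ASSERT(sizeof(int) * CHAR_BIT >= 16);
/* MY_STATIC_ASSERT(sizeof(char) == 2);  would not compile: negative width */

/* The C11 spelling of the second check, for comparison. */
static_assert(sizeof(int) * CHAR_BIT >= 16, "int has at least 16 bits");

/* Runtime mirror of the property that was checked at compile time. */
static int int_width_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}
```

The `!!sizeof(...)` keeps the array size at 1 in every expansion, so repeated uses declare the same type and do not conflict.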

By @wolfspaw - 20 days
Really liked the trick of defining the struct in the return part of the function.

Array pointers: Array to pointer decay is extremely annoying, if it was implemented as Array to "slice" decay it would be great.

Static array indices in function parameter declarations: awesome, a shame that C++ (and Tiny C) do not support it >/

flexible array member: extremely useful, and now there are good compiler flags for ensuring correct flexible array member usage

X-Macro: nice, no-overhead enum to string name. Didn't know the trick

Combining default, named and positional arguments: Named-arguments/default-arg, C version xD. It would be cool if it was added to C language as a native feature, instead of having to do the struct hiding macro.

Comma operator: really useful, especially in macros

Digraphs, trigraphs and alternative tokens: di/trigraphs are rarely useful, but the alternative spellings from iso646.h are awesome, love using and/or instead of &&/||

Designated initializer: super awesome, could not use if you wanted C++ portability. Now C++ supports some part of it.

Compound literals: fantastic, but in C++ it will explode due to stack deallocation in the same line. C++ should fix this and allow the C idiom >/

Bit fields: nice for more control of structs layout

constant string concat: "MultiLine" String, C version xD

Ad hoc struct declaration in the return type of a function: didn't know this trick, "multi value" return, C version xD

Cosmopolitan-libc: incredible project. Already knew of it, it's awesome to offer a binary that runs on all OSes at the same time.

Evaluate sizeof at compile time by causing duplicate case error: ha, nice trick for debugging the size of anything.
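
For the X-macro item mentioned in this comment, a minimal sketch of the enum-to-string idea could look like the following. All names (`COLOR_LIST`, `AS_ENUM`, `AS_STRING`, `color_name`) are hypothetical and chosen for illustration.

```c
#include <assert.h>

/* The list is written once; each use supplies a different X. */
#define COLOR_LIST(X) X(RED) X(GREEN) X(BLUE)

#define AS_ENUM(name)   COLOR_##name,
#define AS_STRING(name) #name,

/* Expands to: COLOR_RED, COLOR_GREEN, COLOR_BLUE, COLOR_COUNT */
enum color { COLOR_LIST(AS_ENUM) COLOR_COUNT };

/* Expands to: "RED", "GREEN", "BLUE", in the same order as the enum. */
static const char *const color_names[] = { COLOR_LIST(AS_STRING) };

static const char *color_name(enum color c)
{
    assert((unsigned)c < COLOR_COUNT);
    return color_names[c];
}
```

Adding a new entry to `COLOR_LIST` updates both the enum and the name table, so the two cannot drift apart.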

By @saagarjha - 20 days
Mentioning %n without explaining that it is overwhelmingly used for exploits is a little reckless IMO.
By @coreyp_1 - 20 days
That's a nice list!

I've been digging into cross-platform (Windows and Linux) C for a while, and it has been fascinating. On top of that, I've been writing a JIT-ted scripting (templating) language, and the ABI differences (not just fastcall vs stdcall vs cdecl) are often not easy to find documentation about.

I've decided that if I ever get to teach a university class on C again, I want to cover some of these things that I feel are often left out, and this list is a helpful reference! Thanks!

By @lifthrasiir - 20 days
I hate that I know all of them...

> Backslash line splicing

One reason trigraphs were removed is that `??/`, the trigraph spelling of `\`, also acted like `\` in this context.

> Using `&&` and `||` as conditionals

Not only is this uncommon, but chaining them is not always correct, because `a && b || c` isn't equivalent to `a ? b : c` when `b` evaluates to false.
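
A small sketch of that difference, using hypothetical helper functions (`do_b`, `do_c`, and the counters are made up for illustration). Here `do_b` runs but yields 0, so the chained form also falls through to `do_c`.

```c
#include <assert.h>

static int calls_b, calls_c;

static int do_b(void) { calls_b++; return 0; }  /* runs, but yields "false" */
static int do_c(void) { calls_c++; return 1; }

static void reset_counts(void) { calls_b = calls_c = 0; }

/* Chained form: do_c also runs whenever do_b returns 0. */
static void run_chained(int a) { (void)((a && do_b()) || do_c()); }

/* Ternary form: exactly one of the two branches runs. */
static void run_ternary(int a) { (void)(a ? do_b() : do_c()); }

/* Self-check documenting the behavior for a != 0. */
static void check_difference(void)
{
    reset_counts();
    run_chained(1);
    assert(calls_b == 1 && calls_c == 1);   /* both ran */

    reset_counts();
    run_ternary(1);
    assert(calls_b == 1 && calls_c == 0);   /* only do_b ran */
}
```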

> Compile time assumption checking using `enum`s

Please use `static_assert` already.

> Matching character classes with `sscanf()`

This can be combined with `*` to ignore certain characters. For example, `%*[ \t]` will skip only horizontal whitespace, unlike a plain whitespace character in the format string, which will also skip newlines.
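
A hedged sketch of the contrast, with two hypothetical helpers (`parse_strict` and `parse_loose` are illustrative names). Both try to read two tokens; only the strict one refuses to cross a newline between them.

```c
#include <assert.h>
#include <stdio.h>

/* Reads "key<spaces or tabs>value". The %*[ \t] directive consumes one
 * or more spaces/tabs without storing them, and fails on a newline. */
static int parse_strict(const char *line, char key[32], char value[32])
{
    assert(line != NULL);
    return sscanf(line, "%31[^ \t\n]%*[ \t]%31[^ \t\n]", key, value);
}

/* Same, but with a plain space in the format, which skips any
 * whitespace, including newlines. */
static int parse_loose(const char *line, char key[32], char value[32])
{
    assert(line != NULL);
    return sscanf(line, "%31[^ \t\n] %31[^ \t\n]", key, value);
}
```

For the input `"name\nalice"`, `parse_strict` stops after the first token and returns 1, while `parse_loose` steps over the newline and returns 2.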

> Detecting constant expressions

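This ultimately comes from C's weird definition of a null pointer constant: an integer constant expression with the value 0, or such an expression cast to `void *`. So a non-constant expression can be distinguished by multiplying it by a known zero constant and casting the product to `void *`: the result is a null pointer constant only if the original expression was itself constant.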
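
One way to sketch this detection in standard C11 is with `_Generic` on the type of a conditional expression. The macro name `IS_ICE` and the helper `check_is_ice` are hypothetical, and this variant is an assumption about how the idea can be expressed rather than the article's exact code.

```c
#include <assert.h>

/* If x is an integer constant expression, (void *)((long)(x) * 0L) is a
 * null pointer constant, so the conditional expression has the type of
 * its other operand, int *. Otherwise it is an ordinary void * value
 * and the conditional expression has type void *. */
#define IS_ICE(x) \
    _Generic((1 ? (void *)((long)(x) * 0L) : (int *)0), \
             int *:  1,                                   \
             void *: 0)

enum { COMPILE_TIME_VALUE = 42 };

/* Self-check: the parameter is a runtime value, the enumerator is not. */
static void check_is_ice(int runtime_value)
{
    assert(IS_ICE(COMPILE_TIME_VALUE) == 1);
    assert(IS_ICE(sizeof(int)) == 1);
    assert(IS_ICE(runtime_value) == 0);
}
```

Neither operand of the conditional is dereferenced, and `x` is only multiplied by zero, so the macro is used purely for the type it produces.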

By @guerrilla - 19 days
These are great. Most posts I read with titles similar to this are just the authors revealing that they don't know C very well, but this one included some interesting things. I didn't know compound literals were lvalues, but if you think about executable formats, it makes a lot of sense.
By @winocm - 19 days
There’s also the use of typedef to help make function declarations.

Such as:

  typedef void fptr_t(int);
  fptr_t foo;
That would effectively declare a function with the prototype `void foo(int)`. This pattern is used quite a bit in BSD kernels.
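
A slightly fuller sketch of the same pattern, with hypothetical names (`handler_fn`, `twice`, `negate`, `apply`). The typedef names a function type, so it can declare functions and, with a `*`, pointers to them; the definitions themselves still have to be written out in full.

```c
#include <assert.h>
#include <stddef.h>

typedef int handler_fn(int);    /* a function type, not a pointer type */

handler_fn twice;               /* declares: int twice(int);  */
handler_fn negate;              /* declares: int negate(int); */

/* A function definition cannot get its type from the typedef alone. */
int twice(int x)  { return 2 * x; }
int negate(int x) { return -x; }

/* Pointers to the function type are spelled with an explicit '*'. */
static handler_fn *const handlers[] = { twice, negate };

static int apply(size_t index, int x)
{
    assert(index < sizeof handlers / sizeof handlers[0]);
    return handlers[index](x);
}
```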
By @jonathrg - 20 days
Multi-character constants are one of the many things in C that would be nice to use if the language would just choose some well-defined behaviour for them. It doesn't really matter which.
By @golergka - 20 days

    switch (n % 2) {
        case 0:
            do {
                ++i;
        case 1:
                ++i;
            } while (--n > 0);

    }
Someone really ought to record a "WAT" video about C.
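
To make the fragment above runnable, it can be wrapped in a function. The name `interleaved` and the choice to return `i` are assumptions added for illustration; the `switch` body is the commenter's code.

```c
#include <assert.h>

/* A case label may sit inside a loop body that starts under another
 * label, so the switch can jump into the middle of the do-while. */
static int interleaved(int n)
{
    assert(n >= 0);
    int i = 0;
    switch (n % 2) {
        case 0:
            do {
                ++i;
        case 1:                 /* odd n enters the loop here */
                ++i;
            } while (--n > 0);
    }
    return i;
}
```

For odd `n` the first pass through the loop body skips the first increment, so `interleaved(3)` yields 5, while `interleaved(2)` yields 4.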
By @johnklos - 20 days
Not sure what happened:

404

File not found

The site configured at this address does not contain the requested file.

By @38 - 19 days

    > int (*ap1)[10] = &arr;
Wow that's garbage syntax. With Go it would be

    var ap1 *[10]int = &arr
By @ranger_danger - 20 days
> quirks and features

Someone is a fan of Doug DeMuro.

By @o11c - 20 days
Bah, those are all well-known.

What value does the following program return?

    int main()
    {
        int *p = 0;

    loop:
        if (p)
            return *p;

        int v = 1;
        p = &v;
        v = 2;
        goto loop;
        return 3;
    }
Also, rather than doing `sizeof` via one error at a time, it's better to just emit them to a char array {'0' + sz/10, '0' + sz%10, '\0'}. Generalizing this to signed numbers of arbitrary size is left as an exercise for the reader.
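
A minimal sketch of that last suggestion, limited to sizes below 100. The struct and the names `SAMPLE_SIZE` and `sample_size_digits` are hypothetical; the assumption is that the resulting array is then read from the compiler's assembly output or the object file.

```c
#include <assert.h>   /* C11: provides the static_assert macro */

/* Hypothetical type whose size we want to see without running anything. */
struct sample { int id; double value; char tag[3]; };

enum { SAMPLE_SIZE = sizeof(struct sample) };

static_assert(SAMPLE_SIZE < 100, "this sketch only emits two digits");

/* The size, spelled as decimal digits, is computed entirely at compile
 * time and stored as a string in the object file. */
const char sample_size_digits[] = {
    '0' + SAMPLE_SIZE / 10,
    '0' + SAMPLE_SIZE % 10,
    '\0'
};
```

Unlike the duplicate-case trick, this reports many sizes in a single compilation, one array per type.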