October 29th, 2024

What's New in POSIX 2024

POSIX 2024 updates include error handling for newlines in filenames, C17 compliance, support for NUL separators in commands, and a focus on resource management and interoperability for improved script reliability.

POSIX 2024 introduces significant updates, particularly in the XCU volume, which covers the shell command language and standard utilities. One major change concerns newlines in filenames, a long-standing hazard for shell scripts: the new specification encourages utilities to report an error when creating or processing filenames that contain newlines, with the aim of making scripts more reliable and interoperable. The specification also adds NUL separators to `find` (`-print0`) and `xargs` (`-0`), although this only helps when every utility in a pipeline can parse NUL-separated data; the article points out that `xargs` cannot sort, for example. Another key update is the requirement for C17 compliance in place of the previous C89 requirement, which lets developers rely on more modern C features across platforms. The specification also revisits resource limits and cooperation between processes: `ulimit` is expanded, `renice` accepts relative values, and a `timeout` utility is added. Overall, POSIX 2024 is a substantial step toward modernizing the standard and improving usability for developers.

- POSIX 2024 introduces error handling for filenames containing newlines in shell scripts.

- The specification now requires C17 compliance, moving away from the outdated C89 standard.

- New support for NUL character separators in commands like `find` and `xargs` is included.

- The updates aim to enhance script reliability and interoperability across systems.

- Emphasis on limits and cooperation within operating systems is reinforced in the new specification.
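
To make the newline problem concrete, here is a minimal sketch of why line-based processing miscounts and how the NUL separators avoid it. The temporary directory and file names are made up for illustration, and `mktemp -d` is assumed to be available. On a system that already refuses to create names containing newlines, the second file creation would fail instead.

```shell
#!/bin/sh
# Sketch: count files when one name contains a newline.
# Directory and file names here are hypothetical.
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT

# Create two files; the second has a newline embedded in its name.
: > "$tmpdir/plain.txt"
: > "$tmpdir/odd
name.txt"

# Line-based counting sees three "names" because of the embedded newline.
naive_count=$(find "$tmpdir" -type f | wc -l | tr -d ' ')

# NUL-separated output keeps each pathname intact; xargs -0 hands each one
# to the inner sh as a single argument, which prints one line per argument.
safe_count=$(find "$tmpdir" -type f -print0 |
    xargs -0 sh -c 'for f do echo x; done' sh | wc -l | tr -d ' ')

echo "naive: $naive_count, NUL-separated: $safe_count"
# prints "naive: 3, NUL-separated: 2"
```

The sketch only covers `find` and `xargs`; any other utility placed between them would also need to understand NUL-separated input.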

AI: What people are saying
The comments on the POSIX 2024 updates reflect a mix of enthusiasm and skepticism regarding the changes.
  • Many commenters express excitement about the stricter error handling for newlines in filenames, viewing it as a positive step towards better script reliability.
  • There are concerns about the requirement for C17 compliance, with some questioning the practicality and implications of not having a concrete standard.
  • Several users highlight the challenges of internationalization in shell scripts, noting the added complexity it brings.
  • Some commenters appreciate that the updates make existing naive scripts correct, in contrast to changes that break old code by enforcing stricter rules.
  • There is a general curiosity about how these updates will be adopted across different systems and their impact on legacy POSIX systems.
25 comments
By @enriquto - 6 months
> We’ve established that, yes, pathnames can include newlines. We have not established why they can do that. After some deliberation, the Austin Group could not find a single use-case for newlines in pathnames besides breaking naive scripts. Wouldn’t it be nice if the naive scripts were just correct now? Ok, that might be a bit much all at once. We’re heading there though!

Oh my god. This makes me so happy. This is the most lovely thing I've read in the world of computing since the unix gods decided that newlines were to be a single character.

The philosophy underlying the sentence "Wouldn’t it be nice if the naive scripts were just correct now?" is incredibly positive. We are surrounded by arrogant jerks who break old code by aggressively enforcing stricter compliance with some stupid rules. But here come these posix heroes who do the exact opposite: make old code correct! There is hope in mankind after all.

By @chasil - 6 months

  - find(1p) now supports -print0
  - xargs(1p) now supports the -0 argument
  - newlines in filenames now should throw errors in many utilities
  - a compiler implementing the c17 standard is now required
  - ulimit is expanded
  - renice can use relative values
  - a timeout utility has been added
  - make adds support for $^ $+ ::= :::= != ?= +=
  - logger is improved
  - gettext is adopted
  - readlink and realpath are adopted
  - rm now supports -d to remove empty directories and -v for verbose
  - various improvements to printf, sed, test
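
Two of the items in this list are easy to try out. The following is a rough sketch, assuming an implementation that already ships `rm -d` and `timeout` (GNU coreutils does; other systems may not yet). The directory names are hypothetical and `mktemp -d` is assumed to be available.

```shell
#!/bin/sh
# Sketch: rm -d and timeout, with made-up directory names.
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT

mkdir "$tmpdir/empty" "$tmpdir/full"
: > "$tmpdir/full/file"

# rm -d removes an empty directory...
rm -d "$tmpdir/empty"
empty_status=$?

# ...but refuses a directory that still has entries.
rm -d "$tmpdir/full" 2>/dev/null
full_status=$?

# timeout stops a command that runs too long and reports a non-zero status
# (GNU coreutils uses 124 for a timed-out command).
timeout 1 sleep 3
timeout_status=$?

echo "empty: $empty_status, full: $full_status, timeout: $timeout_status"
```
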
By @imrejonk - 6 months
This adds `set -o pipefail` to POSIX sh, which causes a whole pipeline to fail (non-zero exit code) if one or more of the commands in the pipeline fail.
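
A small sketch of the difference, assuming a shell that already implements the option (bash does; some `sh` implementations may not yet, so the sketch probes for it first in a subshell):

```shell
#!/bin/sh
# Sketch: pipeline exit status with and without pipefail.

# Default: the pipeline's status is that of the last command (cat), so 0.
false | cat
default_status=$?

# Probe in a subshell so an unsupported option does not abort the script.
if (set -o pipefail) 2>/dev/null; then
    set -o pipefail
    # With pipefail, the failing `false` makes the whole pipeline non-zero.
    false | cat
    pipefail_status=$?
    set +o pipefail
    echo "default: $default_status, pipefail: $pipefail_status"
else
    echo "default: $default_status, pipefail: not supported by this shell"
fi
```

Without the option, a failure early in a pipeline is silently masked by a successful last command, which is why scripts that rely on `set -e` usually want both.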
By @relistan - 6 months
The history at the beginning of this is not correct. Two examples: the assertion that there was one compatible UNIX prior to United States vs AT&T, the statement that GNU and BSD started that same year. Very, very off.
By @pelorat - 6 months
TIL the POSIX standard is still updated. Does it still suffer from the issues that make Linux break POSIX compatibility in some areas because they consider it a flawed standard?
By @Flimm - 6 months
Yes! Finally! Let's treat filenames with new lines as errors! I'm so delighted with this decision.
By @quotemstr - 6 months
> We’ve established that, yes, pathnames can include newlines. We have not established why they can do that. After some deliberation, the Austin Group could not find a single use-case for newlines in pathnames besides breaking naive scripts. Wouldn’t it be nice if the naive scripts were just correct now?

Finally. Now let's do the rest: https://dwheeler.com/essays/fixing-unix-linux-filenames.html

Filenames should be boring printable normalized UTF-8. I have never, not once, seen a good reason that a filename should be able to contain random binary gobbledygook

By @oguz-ismail - 6 months
Nitpick re: https://blog.toast.cafe/posix2024-xcu#fn:6

    SRC != ls *.c
is fine in a makefile as far as POSIX is concerned, because:

> Applications shall select target names from the set of characters consisting solely of slashes, hyphens, periods, underscores, digits, and alphabetics from the portable character set

By @BobbyTables2 - 6 months
I really hate to say it, but the fretting about newlines used as delimiters after 50 years of misuse …

… makes PowerShell start to look damn good.

By @somat - 6 months
Hopefully nothing, posix is, or at least it should be, a descriptive standard. This is why posix is so terrible, and why posix is so great.

The way I feel posix, and other descriptive standards, work best is when they describe what everyone is already doing. This is opposed to prescriptive standards, which try to focus on the "correct" way to do something; prescriptive standards tend to be over-engineered and may or may not actually work.

see also: descriptive and prescriptive dictionaries. http://www.englishplus.com/news/news1100.htm

By @donatj - 6 months
To build an internationalized shell script I'll need to compile multiple .mo language files and distribute them along side the script itself.

For shell scripts part of a large system, that's probably fine. For small scripts, that's not very practical. You are not only adding a compilation step, you're also requiring distribution of multiple files. That's a pain.

It just kind of kills the convenience of a simple shell script. I would probably end up writing a makefile to manage all of this and at that point I am only a hop skip and jump away from using a compiled language instead of shell.

By @Netch - 6 months
Filename character set and its interpretation shall be controlled per directory or, at least, per FS. This pertains not only to the permitted set, like with or without LF, but to collation rules as well (including case insensitivity with cases like Turkish/Crimean/etc. I/ı and İ/i). Also this shall include workarounds for already existing problems: if a directory already contains files I1 and ı1, there shall be a technique to deal with them separately even with Turkish locale.

But restricting this at syscall level is definite insanity, among with excuses.

By @nh2 - 6 months
> future editions will not require c17, but will simply require whatever C specification version is the most modern and already implemented by major toolchains

Is this really good?

If you can't rely on anything concrete being guaranteed, and it is open to interpretation what "modern" or "major toolchains" are, why have a standard?

By @InfiniteRand - 6 months
I kind-of would like to see a POSIX-strict profile which incorporates commonsense things like no newline in file names (by commonsense I mean avoiding things that repeatedly over many years have tripped up programmers in frustrating ways). Operating systems (or distributions) could opt into this profile, and then someone programming on such an operating system could rely on the constraints of the profile, and additional facilities could be added on that might need to rely on those constraints. Hopefully, gradually the use of the profile would spread.
By @guerrilla - 6 months
Why was `isascii()` removed?

(Listed in the Sortix article linked in OP.)

By @rurban - 6 months
EILSEQ for \n finally, but why not for unicode confusables? Path names are identifiers, and as such need to be identifiable. Meaning stricter rules than just buffers (not talking about strings).
By @pabs3 - 6 months
Since old-POSIX systems will be in use for some time, I wonder how many things will be able to switch to using the new capabilities. And how many OSes already support all of the new changes.
By @snvzz - 6 months
This is a surprisingly greedy POSIX update.
By @ggm - 6 months
File names with / in them
By @cryptonector - 6 months
> The problem is that pathnames (as per section 3.254 of POSIX 2024) are just strings (meaning they can contain any bytes except the NUL character), [...]

Pathnames can neither contain NUL nor '/'.

Re: `find -print0` / `xargs -0`:

> Previous POSIX releases have considered -print0 before, but never ended up adopting it because using a null terminator meant that any utility that would need to process that output would need to have a new option to parse that type of output.

What nonsense. Just add the `-0` or similar options as needed.

> More precisely, this approach does not resolve our original problem. xargs(1p) can’t sort, and therefore we still have to handle that logic separately, unless sort(1p) also grows this support, even after read(1p). This problem continues with every other type of use-case. Importantly, it breaks the interoperability that POSIX was made to uphold.

More nonsense.

> A bunch of C functions are now encouraged to report EILSEQ if the last component of a pathname to a file they are to create contains a newline (put differently, they’re to error out instead of creating a filename that contains a newline).

Ok, that's tolerable. Ditto utilities (notice here they were able to make a list of utilities).

By @EdSchouten - 6 months
strlcpy()!
By @johnisgood - 6 months
> Anyway, POSIX 2024 now requires c17, and does not require c89

I wish it would have been c99. What does c17 add exactly, more C++-esque complexity or not? Why was it not c99 (or perhaps even c11) over c17? Genuine questions.