June 22nd, 2024

The End-of-Line Story (2004)

The ASCII standard defines separate Carriage Return (CR) and Line Feed (LF) control characters but no single end-of-line character, and early operating systems adopted different EOL conventions, which complicated network communication. To standardize, ARPAnet researchers mandated the CR LF sequence for transmitting ASCII text, and the convention carried over into protocols such as Telnet, FTP, and SMTP, whose specifications Jon Postel oversaw. Modern systems usually handle EOL conversion transparently, but problems still surface as stray Control-M characters or broken formatting. Because Windows' native CR LF matches the network convention, Windows text needs no translation in transfer, whereas Unix stores lines with a bare LF even though the RFCs require CR LF on the wire. FTP's binary mode preserves the source file's EOL characters, which is safe only between similar systems, and compressed RFC collections are published for both conventions: tar.Z for Unix and .zip for Windows. These details go largely unnoticed today, yet they remain crucial for smooth data exchange across diverse systems.
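To make the translation concrete, here is a minimal sketch in Python (the article itself shows no code; the function names are illustrative) of what an ASCII-mode transfer does between a local LF convention and the CR LF wire format:

    def to_network(text: str) -> bytes:
        # Collapse CR LF and bare CR to LF first so existing pairs are not
        # doubled, then expand every LF to the CR LF wire convention.
        normalized = text.replace("\r\n", "\n").replace("\r", "\n")
        return normalized.replace("\n", "\r\n").encode("ascii")

    def from_network(wire: bytes) -> str:
        # Convert received CR LF endings back to this system's bare LF.
        return wire.decode("ascii").replace("\r\n", "\n")

    assert to_network("a\nb\r\nc") == b"a\r\nb\r\nc"
    assert from_network(b"a\r\nb\r\n") == "a\nb\n"

Note the normalize-first step: naively replacing LF with CR LF on text that already contains CR LF pairs would produce CR CR LF, a classic conversion bug.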

4 comments
By @chasil - 5 months
This goes back further.

Teletype machines needed a delay to move the printing apparatus back to the beginning of a line. The two characters provided that delay.

I never used one of these; I was too young.

https://en.m.wikipedia.org/wiki/Teletype_Model_33

Edit: first google hit:

https://www.revk.uk/2022/02/crlf-has-long-history.html?m=1

By @drdec - 5 months
> This choice was designed to spread the pain equally among all operating systems of the day; each has to translate to and from the CR LF convention when text was transferred across the network.

On the one hand, this seems clever and fair. On the other hand, this is why we can't have nice things.

By @gary_0 - 5 months
Nowadays I see in-flight EOL normalization as Considered Harmful; instead I ensure that routines dealing with text buffers treat CRLF and LF the same (see the sketch after this comment). This has Just Worked without any issues in the software I've written with this approach. (Occasionally this requires wrapping/replacing library code that only uses the EOL convention of the host system.)

I have encountered numerous issues with EOLs being changed in transit or on disk, so I always make sure to open/transfer files in "binary mode", since both Windows and Linux builds will run the same EOL-agnostic code.
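A minimal sketch of the EOL-agnostic splitting this comment describes, assuming Python (the commenter names no language, and split_lines is an illustrative name):

    import re

    # One pattern that matches any of the three common EOL sequences, so the
    # same code runs unchanged on Windows and Linux builds.
    _EOL = re.compile(rb"\r\n|\r|\n")

    def split_lines(buf: bytes) -> list[bytes]:
        # Order matters: CR LF must be tried before bare CR, or each CR LF
        # pair would be counted as two line breaks.
        return _EOL.split(buf)

    assert split_lines(b"a\r\nb\nc\rd") == [b"a", b"b", b"c", b"d"]

Working on raw bytes pairs naturally with the "binary mode" transfers mentioned above, since nothing in the pipeline ever rewrites the EOL characters.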

By @ktpsns - 5 months
I haven't seen end-of-line conversion problems (or Unicode BOM issues) for decades. My guess is that software quality has improved since then. MS Notepad was a notable offender that long mishandled Unix-style line endings. I commonly used the dos2unix and unix2dos utilities in the early 2000s, in particular on dual-boot computers. That was also the time when UTF-8 was not yet so widespread, but that is another topic ;-)
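For readers who never met those utilities: dos2unix strips the CR from each CR LF pair and unix2dos adds it back. A rough Python equivalent of the dos2unix direction (the file name is hypothetical), working in binary mode so the platform's own EOL translation cannot interfere:

    from pathlib import Path

    def dos2unix(path: str) -> None:
        # Read and rewrite the file as raw bytes, replacing each CR LF
        # pair with a bare LF.
        p = Path(path)
        p.write_bytes(p.read_bytes().replace(b"\r\n", b"\n"))

    dos2unix("notes.txt")  # hypothetical file name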