June 28th, 2024

What actual purpose do accent characters in ISO-8859-1 and Windows 1252 serve?

Accent characters in ISO-8859-1 and Windows 1252 exist for compatibility with older 7-bit character sets, where they were used to add national characters and to build composed characters. Their placement at specific code points predates these standards, and their actual use is determined by application software.

The accent characters in ISO-8859-1 and Windows 1252 exist primarily for compatibility with earlier 7-bit character sets, preserving a 1:1 relation with those older sets. They were defined in earlier international 7-bit standards, originating in character sets like ECMA-6, to provide national characters and to build composed characters. While the spacing accents may seem useless on their own for modern purposes, they were essential in those historical encodings. Neither ISO-8859-1 nor Windows 1252 defines multibyte character composition; that task was handled by application software, typically by overstriking a base letter with the accent using backspace. Because the accents already sat at specific code points in the older sets, they were carried over into ISO-8859-1 and Windows 1252 for compatibility, with their usage determined by application software rather than the operating system itself.
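To make the overstrike mechanism concrete, here is a minimal, purely illustrative sketch (in Python) of the byte stream an application might have sent to a printer or overstrike-capable terminal. The code points are real ISO-8859-1 values; whether anything actually composes them is entirely up to the output device.

```python
# Historical overstrike trick: send a base letter, a backspace, and a
# spacing accent so the device stamps both glyphs in the same cell.
BS = b"\x08"          # BACKSPACE: move the print head back one position
ACUTE = b"\xb4"       # ISO-8859-1 0xB4, spacing acute accent
DIAERESIS = b"\xa8"   # ISO-8859-1 0xA8, spacing diaeresis

def overstrike(base: bytes, accent: bytes) -> bytes:
    """Base letter, back up one cell, strike the accent over it."""
    return base + BS + accent

line = b"caf" + overstrike(b"e", ACUTE) + b"\r\n"
print(line)  # b'cafe\x08\xb4\r\n' -- raw bytes; composition happens on the device
```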

Related

The End-of-Line Story (2004)

The ASCII standard lacks a unique end-of-line character, leading to varied EOL conventions in early systems. ARPAnet researchers mandated CR LF sequence for standardization across protocols like Telnet, FTP, and SMTP. Modern systems handle EOL translations, but issues like extra characters can still occur. Windows systems use CR LF for EOL, while Unix uses LF. RFCs specify CR LF for Internet transmission, and FTP can preserve EOL characters in binary mode. RFC Editor website adapts EOL conventions for different systems in compressed RFC collections.

PostScript and Interpress: A Comparison (1985)

Brian Reid compares PostScript and Interpress, detailing their history, development, and similarities in controlling laser printers. Both languages evolved from earlier systems, with PostScript by Adobe and Interpress by Xerox. Despite differences, they significantly advanced page description languages.

Interactive Comparator of Different National Layouts on a Computer Keyboard

The page provides a keyboard layout comparator emphasizing alphanumeric blocks and character assignment variations. It includes Unicode code points, key names, and references to keyboard resources. Tools like TMK, QMK, and Soarer's Converter are listed. Last updated 17/05/2023. Contact Miguel Farah for inquiries.

The C Standard charter was updated, now with security principles as well

The ISO/IEC JTC1/SC22/WG14 committee oversees C Standard development, focusing on portability, efficiency, and stability. Collaboration with the C++ committee ensures compatibility. Principles guide feature integration, code efficiency, security, and adaptability.

12 comments
By @swatcoder - 4 months
The purpose is more obvious if you understand what makes \r and \n different, and why both existed. But I guess that's becoming lost knowledge now.

Ultimately, many "character sets" combined printable characters, cursor/head control, and record boundaries into one serialized byte stream that could be used by terminals, printers, and programs for all sorts of purposes.
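(For anyone who hasn't run into the distinction, here is a quick illustration in Python of what \r and \n do separately on a typical terminal; the exact behaviour depends on the terminal, not the language.)

```python
import sys
import time

# '\r' (carriage return) moves the cursor back to column 0 without
# advancing the line; '\n' (line feed) advances the line. Rewriting a
# line in place with '\r' is why simple progress indicators still work.
for pct in range(0, 101, 25):
    sys.stdout.write(f"\rprogress: {pct:3d}%")  # '\r' alone: overwrite the same line
    sys.stdout.flush()
    time.sleep(0.2)
sys.stdout.write("\n")                          # '\n' finally moves to the next line
```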

By @PaulHoule - 4 months
It was a common technique back in the day with both dot matrix and "letter quality" printers to print a line and then go back and print it again to either get a bold effect by printing the same characters twice or to overlay one character on top of another. If the spacing was right you could have drawn accented characters that way.
By @bhaak - 4 months
Neither on SE nor here could I find a mention of dead keys.

https://en.wikipedia.org/wiki/Dead_key

I can't claim to know exactly how they relate to the accented characters being encoded in character sets, but they seem to have been at least a historical influence. Pressing a dead key, which doesn't advance the cursor, and then typing the base character over it is certainly faster than using backspace (and also cheaper, if you think about character pricing).

That the ECMA specs only talk about using BACKSPACE is surprising. At least the OSes I used only supported the dead-key approach, but of course that was decades after the specs were written.

By @hsdropout - 4 months
I sometimes use these in Windows when I expand characters with FormD[0] as part of username validation.

If the expanded count doesn't match, a diacritic might be present.

[0] https://learn.microsoft.com/en-us/dotnet/api/system.text.nor...
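(The linked API is .NET's Unicode normalization with FormD; as a rough sketch of the same idea in Python, where NFD is the equivalent decomposition, diacritic detection might look like the following. The function name and the length check are just illustrative, not the commenter's actual code.)

```python
import unicodedata

def has_diacritic(name: str) -> bool:
    """Decompose (NFD, the analogue of .NET's FormD) and look for combining marks."""
    decomposed = unicodedata.normalize("NFD", name)
    # If decomposition grew the string, or any combining mark is present,
    # the original contained a precomposed/accented character.
    return len(decomposed) != len(name) or any(
        unicodedata.combining(ch) for ch in decomposed
    )

print(has_diacritic("Jose"))  # False
print(has_diacritic("José"))  # True: é decomposes to 'e' + U+0301
```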

By @surfingdino - 4 months
They exist so that typists can insert random characters that look similar to what they actually meant to type. This has a nice income-generating effect for developers who know how to handle incorrectly encoded data. I make good money fixing data processing pipelines written to expect utf-8 only to be given something else.
By @Theodores - 4 months
My younger self would have used the umlaut character for 'ditto'.

To some extent the character set was still evolving. For example, the Euro sign was not around until decades later, and it would have needed to be bolted together with backspace characters or escape codes, maybe even downloaded characters, with the printer-specific manual (Epson) studied at great length.

In the DOS era (and before with home micros that were programmed in BASIC) it was quite normal to compose things for the printer that you had no expectation of seeing on screen, not that anyone read much on screen (as everyone had vast piles of paper on their desk).

Until quite recently some POS systems were very much tied to a specific printer; at least these character sets were a step forward from hard-coding a BASIC program for an exact make and model of printer.

By @BugsJustFindMe - 4 months
> they're all pretty much useless on their own for anything besides ASCII art.

The asker completely ignores that asking questions about accent marks, like they themselves are doing in that very post, would be a lot more annoying without being able to write said accent marks.

By @jfim - 4 months
I'm surprised it wasn't mentioned, but they were also used for text entry in some text editing applications.

For example, one could type ë by entering ¨ then following with e. The ¨ would be displayed at the position where the combined character would be, while waiting for the second character to be entered. Once the second character is entered, the display would be updated with the correct combined character.
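(Roughly, that flow is a tiny state machine: the dead key sets a pending accent, and the next key either composes with it or falls through. A toy Python sketch, with a compose table invented for illustration rather than taken from any real layout:)

```python
# Toy dead-key composition: the accent key emits nothing by itself;
# the next keystroke either composes or both characters fall through.
COMPOSE = {
    ("¨", "e"): "ë", ("¨", "o"): "ö", ("¨", "a"): "ä",
    ("´", "e"): "é", ("`", "a"): "à",
}
DEAD_KEYS = {"¨", "´", "`"}

def type_keys(keys):
    out, pending = [], None
    for key in keys:
        if pending is not None:
            out.append(COMPOSE.get((pending, key), pending + key))
            pending = None
        elif key in DEAD_KEYS:
            pending = key  # remember the accent; nothing is emitted yet
        else:
            out.append(key)
    return "".join(out)

print(type_keys(["n", "o", "¨", "e", "l"]))  # "noël"
```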

By @bhaak - 4 months
> Having an OS drawing characters on a bitmap display, a prerequisite to composing, is a very new development, way more recent than the character definitions leading to above encoding.

New? Some computers of the 1980s could already do this. At least the 16-bit home computers had bitmap-drawn characters on the screen.

Edit: Looks like somebody doesn't believe that computers in the 80s had such a thing.

> On the Amiga, rendering text is similar to rendering lines and shapes. The Amiga graphics library provides text functions based around the RastPort structure, which makes it easy to intermix graphics and text.

> In order to render text, the Amiga needs to have a graphical representation for each symbol or text character. These individual images are known as glyphs. The Amiga gets each glyph from a font in the system font list. At present, the fonts in the system list contain a bitmap of a specific point size for all the characters and symbols of the font.

https://wiki.amigaos.net/wiki/Graphics_Library_and_Text

By @timonoko - 4 months
In the ADM-3 (or some such) there was only one backspace+overstrike character and it was underscore. So Ä and Ö were marked thus.

Otherwise HYVÄÄ YÖTÄ was HYV{{ Y|T{, which was only a little miserable.

But if you changed the ROM to a Swedish ROM, {a|b} became äaöbå, which was basically unreadable.

By @cryptonector - 4 months
Yup, ASCII was a multi-byte character set, using overstrike with BS (backspace). Little-known fact, that. There's still a holdover of this in terminal apps, which use this for underscoring and bolding.
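(For reference, that terminal holdover is the nroff/man overstrike convention: a glyph struck twice with a backspace in between means bold, and underscore-backspace-glyph means underlined; filters like ul(1) and pagers like less render the effect. A small Python sketch that just builds the sequences:)

```python
# Build nroff-style overstrike sequences: "X\bX" for bold, "_\bX" for underline.
BS = "\b"

def bold(text: str) -> str:
    return "".join(ch + BS + ch for ch in text)

def underline(text: str) -> str:
    return "".join("_" + BS + ch for ch in text)

print(repr(bold("NAME")))       # 'N\x08NA\x08AM\x08ME\x08E'
print(repr(underline("file")))  # '_\x08f_\x08i_\x08l_\x08e'
```
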
By @scelerat - 4 months
TLDR, you combine/precede naked accent characters with the backspace character on your output device (probably a printer) to get accented characters.