July 22nd, 2024

The Elegance of the ASCII Table

The article explores the elegance and historical importance of the ASCII table in computing. It discusses design choices, historical context, compatibility with Unicode, practical applications, and enduring relevance in technology.

The article discusses the elegance and historical significance of the ASCII table, a fundamental encoding system used in computing. It highlights the beauty and logic behind the design choices made in ASCII, such as the arrangement of control codes, the positioning of characters like space and numbers, and the pattern followed by uppercase and lowercase letters. The article delves into the historical context of ASCII's development, including its origins in the 1960s and its compatibility with modern Unicode standards. It also touches on the practical implications of ASCII, such as facilitating sorting and providing a basis for understanding binary representations. Overall, the article emphasizes the intricate design and functionality of ASCII, shedding light on its enduring relevance in contemporary technology.

Related

What actual purpose do accent characters in ISO-8859-1 and Windows 1252 serve?

Accent characters in ISO-8859-1 and Windows 1252 ensure compatibility with older 7-bit character sets, adding national characters and composed characters. Their inclusion predates modern standards, impacting historical encoding variants. Application software determines their usage.

Apple II graphics: More than you wanted to know

The article explores Apple II graphics, emphasizing its historical importance and technical features like pixel-addressable graphics and sixteen colors. It contrasts with competitors and delves into synchronization challenges and hardware details.

Beyond monospace: the search for the perfect coding font

Designing coding fonts involves more than monospacing. Key considerations include hyphens resembling minus signs, aligning symbols, distinguishing zero from O, and ensuring clarity for developers and type designers. Testing with proofing strings is recommended.

Weekend projects: getting silly with C

The C programming language's simplicity and expressiveness, despite quirks, influence other languages. Unconventional code structures showcase creativity and flexibility, promoting unique coding practices. Subscription for related content is encouraged.

The absolute minimum you must know about Unicode and encodings

Joel Spolsky discusses the significance of Unicode for developers, debunking misconceptions and tracing its evolution from ASCII to accommodating diverse characters globally, urging developers to understand character encoding fundamentals.

AI: What people are saying
The comments on the article about the ASCII table cover various aspects of its design, history, and practical use.
  • Several users discuss the practical applications and tools for viewing the ASCII table, such as using the `man ascii` command on Linux.
  • There is a debate on the elegance and limitations of ASCII compared to other encoding systems like EBCDIC and Unicode.
  • Some comments highlight the historical context and design choices behind ASCII, including its relationship with typewriters and control characters.
  • Users share personal anecdotes and resources related to ASCII, including links to historical documents and websites.
  • There are mentions of the visual representation and educational aspects of the ASCII table, with suggestions for better illustrating its structure.
39 comments
By @augusto-moura - 6 months
Useful tip: on Linux (not sure about other *nixes) you can view the ASCII table by opening its manpage:

  man ascii
It's been useful to me more than once every year, mostly for looking up shell escape codes and for writing weird character ranges in regex and C.

It can be a bit confusing, but the gist is that two chars are shown on each line. I would prefer a view where you see the same char with shift and/or ctrl flags, but you can only ask for so much.

By @userbinator - 6 months
> You might be familiar with carriage return (0D) and line feed (10)

You mean 0D and 0A, or 13 and 10; that mix of bases really stood out to me in an otherwise good article. I'm one of numerous people who have memorised most of the base ASCII table, and quite a few of the symbols as well as extended ASCII (CP437), mainly because it comes in handy for reading programs without needing a disassembler. Those who do a lot of web development may find the sequence 3A 2F 2F familiar too, as well as 3Ds and 3Fs.

I can see the rationale for <=> being in that order, but [\] and {|} are less obvious, as well as why their position is 1 column to the left of <=>.

By @dwheeler - 6 months
The encodings we use today have a surprisingly deep and complex history. For more, see: "The Evolution of Character Codes, 1874-1968" https://ia800606.us.archive.org/17/items/enf-ascii/ascii.pdf
By @EvanAnderson - 6 months
I would be remiss not to post a link to the late Bob Bemer's[0] website.

https://web.archive.org/web/20150801005415/http://bobbemer.c...

He was considered the "father of ASCII". He wrote very well and gives clear explanations for the motivations behind the design of ASCII.

[0] https://en.m.wikipedia.org/wiki/Bob_Bemer

By @lucasoshiro - 6 months
Once I saw a case-insensitive switch in C using that pattern of letters:

  switch (my_char | 0x20) {
    case 'a': ...
      break;
    case 'b': ...
      break;
  }
By @WalterBright - 6 months
Too bad we now have Unicode, an elegant castle covered with ugly graffiti and ramshackle addons. For example:

1. normalization

2. backwards running text (hey, why not add spiral running text?)

3. fonts

4. invisible characters

5. multiple code points with the same glyph

6. glyphs defined by multiple code points (gee, I thought Unicode was to get away from that mess from code pages!)

7. made up languages (Elvish? Come on!)

8. you vote for my made-up emoticon, and I'll vote for yours!

By @thristian - 6 months
> That, I’m afraid, is because ASCII was based not on modern computer keyboards but on the shifted positions of a Remington No. 2 mechanical typewriter – whose shifted layout was the closest compromise we could find as a standard at the time, I imagine.

According to Wikipedia¹, American typewriters were pretty consistent with keyboard layout until the IBM Selectric electric typewriter. Apparently "small" characters (like apostrophe, double-quote, underscore, and hyphen) should be typed with less pressure to avoid damaging the platen, and IBM decided the Selectric could be simpler if those symbols were grouped on dedicated keys instead of sharing keys with "high pressure" symbols, so they shuffled the symbols around a bit, resulting in a layout that would look very familiar to a modern PC user.

Because IBM electric typewriters were so widely used (at least in English speaking countries), any computer company that wanted to sell to businesses wanted a Selectric-style layout, including the IBM PC.

Meanwhile, in other countries where typewriters in general weren't so popular or useful, the earliest computers had ASCII-style punctuation layout for simplicity, and later computers didn't have any pressing need to change, so they stuck with it. Japanese keyboards, for example, are still ASCII-style to this day.

¹: https://en.wikipedia.org/wiki/IBM_Selectric#Keyboard_layout

By @jolmg - 6 months
> So when you’re reading 7-bit ASCII, if it starts with 00, it’s a non-printing character. Otherwise it’s a printing character.

> The first printing character is space; it’s an invisible character, but it’s still one that has meaning to humans, so it’s not a control character (this sounds obvious today, but it was actually the source of some semantic argument when the ASCII standard was first being discussed).

Hmm.. Interesting that space is considered a printing character while horizontal tab and newline are control characters. They're all invisible and move the cursor, but I guess it makes sense. Space is uniquely very specific in how the cursor is moved one character space, so it's like an invisible character. Newline can either imply movement straight down, or down and to the left, depending on a configuration or platform (e.g. DOS vs UNIX line endings). Horizontal tab can also move you a configurable amount rightwards, and perhaps it might've been thought a bit differently, given there's also a vertical tab, which I've got no idea on how it was used. Maybe it's the newline-equivalent for tables, e.g. "id\tcolor\v1\tred\v2\tblue\v" or something like that.

Interesting also that BS is a control char while DEL is a printing(?) char. I guess that's because BS implies just movement leftwards over the text, while DEL is all ones like running a black sharpie through text. Guess that's what makes it printing. Wonder if there were DEL keys on typewriters that just stamped a black square, and on keypunchers that just punched 7 holes, so people would press "backspace" to go back then "delete" to overwrite.

I've used ASCII a lot, but even after so many years, I'm getting moments where it's like "oh this piece isn't just here, it needs to be here for a deep reason". It's like a jigsaw puzzle.

By @BobbyTables2 - 6 months
I always lament that since at least 1980s or so, it seems the vast majority of the control characters were never used for their intended purpose.

Instead, we crudely use commas and tabs as delimiters instead of something like RS (#30).

By @Dwedit - 6 months
Many old NES/SNES games had a simpler character encoding system, with 0-9 and A-Z at the beginning of the table. No conversion required to display hex.
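
With a layout like that, a nibble is already its own character index; a hypothetical sketch of the idea (the tile table here is assumed, not taken from any particular game):

```c
#include <stdio.h>

/* Hypothetical NES-style character table: tiles 0-9 hold the digits and
   tiles 10-15 hold A-F, so a hex nibble IS its tile index: no lookup needed. */
static unsigned hex_tile(unsigned nibble) {
    return nibble & 0x0F;   /* 0x0-0x9 -> digit tiles, 0xA-0xF -> letter tiles */
}

int main(void) {
    unsigned char b = 0xC5;
    printf("tiles %u and %u\n", hex_tile(b >> 4), hex_tile(b)); /* 12 and 5 */
    return 0;
}
```
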
By @bloak - 6 months
Vaguely related: Apart from £ and €, a typical GB keyboard has a couple of non-ASCII characters printed on it: ¬ and ¦. The key labelled ¦ is usually mapped to |, but the key labelled ¬ often gives you an actual ¬, though I can't remember many occasions on which I've wanted one of them. Apparently the characters ¬ and ¦ are in EBCDIC.
By @ggm - 6 months

  man ascii
is never far from my fingers. Combined with od -c and od -x it gets the job done. I don't think as fluently in octal as I used to; hex has become ubiquitous.
By @gumby - 6 months
I wish the author had included the full ascii chart in 4 bits across / 4 bits down. You can mask a single bit to change case and that is super obvious that way.

The charts that simply show you the assignments in hex and octal obscure the elegance of the design.

By @senkora - 6 months
Fun fact: sorting ASCII numerically puts all the uppercase letters first, followed by all the lowercase letters (ABC... abc...). A more typical dictionary ordering would be more like AaBbCc... (or to even consider A and a at the same sort level and only use them to break ties if the words are otherwise identical).

The order used by ASCII is sometimes called "ASCIIbetical", which I think is wonderful.

https://en.wiktionary.org/wiki/ASCIIbetical

By @red_admiral - 6 months
The "16 rows x 8 columns" version, with the lowercase letters added, seems the most elegant one to me because it makes the internal structure of the thing visible. For example, to lowercase a letter, you set bit 6; a decimal digit is the prefix 011 followed by the binary encoding of the digit etc.

It also makes clear why ESC can be entered as `^[` or ENTER (technically CR) as `^M` on some terminals (still works in my xterm), because the effect of the control key is to unset bits 6 and 7 in the original set-up.

Of course you can color in the fields too, if you want.

By @transfire - 6 months
One downside of ASCII is the lack of two extra “letters” (whatever they might be, e.g. perhaps German ß), as it makes it impossible to represent base 64 alphanumerically. So we ended up with many alternatives picking two arbitrary punctuation marks.
By @georgehotelling - 6 months
Dark grey #303030 text on slightly darker grey #1B1C21 background is really hard to read. Maybe I'm just getting old, but I also assume the audience for a blog post about the ASCII table was born in a year that starts with 19.
By @KingOfCoders - 6 months
For everyone who doesn't need ä, ü, ö, or software that never needs to take ä, ü, ö. For everyone else, UTF is a blessing.
By @PaulHoule - 6 months
Beats EBCDIC

https://en.wikipedia.org/wiki/EBCDIC

On the 4th floor of my building the computer systems lab has a glass front that has what looks like a punch card etched in frosted glass but if you look closer it was made by sticking stickers on the glass.

I made a "punchcard decoder" on a 4x6 card to help people decode the message on the wall

https://mastodon.social/@UP8/112836035703067309

The EBCDIC code was designed to be compatible with this encoding which has all sorts of weird features, for instance the "/" right between "R" and "Z"; letters don't form a consecutive block so testing to see if a char is a letter is more complex than in ASCII.

I am thinking of redoing that card to put the alphabet in order. A column in a punched card has between 0 and 3 punches: 0 punches is a space; 1 punch is a letter or a symbol in the first column; if one of the rows at the top is punched, you combine that with the number of the other punched row in the left 3x9 grid. If three holes are punched, one of them is an 8 (unless you've got one of the extended charsets) and you have one of the symbols in the right 3x6. Note the ¬ and ¢, which are not in ASCII but are in latin-1.

By @zokier - 6 months
I think that adopting ASCII as the general purpose text encoding was one of the great mistakes of early computing. It originated as control interface for teletypes and such, and that's arguably where it should have remained. For storing and processing (plain) text ASCII doesn't really fit that well, control characters are a hindrance and the code space would have been useful for additional characters. The ASCII set of printables was definitely a compromise formed by the limited code space.
By @DonHopkins - 6 months
The Apple ][ and TTYs and other old computers had "bit pairing keyboards", where the punctuation marks above the digits were aligned with the ASCII values of the corresponding digits, different by one bit.

    Typewriter: !@#$%^&*()
    Apple:      !"#$%&'()
    Digits:     1234567890
https://en.wikipedia.org/wiki/Bit-paired_keyboard

>A bit-paired keyboard is a keyboard where the layout of shifted keys corresponds to columns in the ASCII (1963) table, archetypally the Teletype Model 33 (1963) keyboard. This was later contrasted with a typewriter-paired keyboard, where the layout of shifted keys corresponds to electric typewriter layouts, notably the IBM Selectric (1961). The difference is most visible in the digits row (top row): compared with mechanical typewriters, bit-paired keyboards remove the _ character from 6 and shift the remaining &() from 7890 to 6789, while typewriter-paired keyboards replace 3 characters: ⇧ Shift+2 from " to @, ⇧ Shift+6 from _ to ^, and ⇧ Shift+8 from ' to *. An important subtlety is that ASCII was based on mechanical typewriters, but electric typewriters became popular during the same period that ASCII was adopted, and made their own changes to layout.[1] Thus differences between bit-paired and (electric) typewriter-paired keyboards are due to the differences of both of these from earlier mechanical typewriters.

>[...] Bit-paired keyboard layouts survive today only in the standard Japanese keyboard layout, which has all shifted values of digits in the bit-paired layout.

>[...] For this reason, among others (such as ease of collation), the ASCII standard strove to organize the code points so that shifting could be implemented by simply toggling a bit. This is most conspicuous in uppercase and lowercase characters: uppercase characters are in columns 4 (100) and 5 (101), while the corresponding lowercase characters are in columns 6 (110) and 7 (111), requiring only toggling the 6th bit (2nd high bit) to switch case; as there are only 26 letters, the remaining 6 points in each column were occupied by symbols or, in one case, a control character (DEL, in 127).

>[...] In the US, bit-paired keyboards continued to be used into the 1970s, including on electronic keyboards like the HP 2640 terminal (1975) and the first model Apple II computer (1977).

By @blahedo - 6 months
Another piece of elegance: by putting the uppercase letters in the block beginning at 0x40 (well, 0x41) it means that all the control codes at the start of the table line up with a letter (or one of a small set of other punctuation: @[\]^_), giving both a natural shorthand visual representation and a way to enter them with an early keyboard, by joining the pressing of the letter with... the Control key. Control-M (often written ^M) is carriage return because carriage return is 0x0D and M is 0x4D.
By @eviks - 6 months
> The first 32 “characters” (and, arguably, the final one) aren’t things that you can see, but commands sent between machines to provide additional instructions

Such a waste, and the lack of extensibility kills any claim to elegance for some shifted binary numbers; that's the wrong end to focus your optimization efforts on.

By @kragen - 6 months
unfortunately this page is based on mackenzie's book. mackenzie is the ibm guy who spent decades trying to kill ascii, promoting its brain-damaged ebcdic as a superior replacement (because it was more compatible, at least if you were already an ibm customer). he spends most of his fucking book trumpeting the virtues of ebcdic actually

bob bemer more or less invented ascii. he was also an ibm guy before mackenzie's crowd pushed him out of ibm for promoting it. he wrote a much better book about the history of ascii which is also freely available online, really more a pamphlet than a book, called "a story of ascii": https://archive.org/details/ascii-bemer/page/n1/mode/2up

tom jennings, who invented fido, also wrote a history of ascii, called 'an annotated history of some character codes or ascii: american standard code for information infiltration'; it's no longer online at his own site, but for the time being the archive has preserved it: https://web.archive.org/web/20100414012008/http://wps.com/pr...

jennings's history is animated by a palpable rage at mackenzie's self-serving account of the history of ascii, partly because bemer hadn't really told his own story publicly. so jennings goes so far as to write punchcard codes (and mackenzie) out of ascii's history entirely, deriving it purely from teletypewriter codes—from which it does undeniably draw many features, but after all, bemer was a punchcard guy, and ascii's many excellent virtues for collation show it

as dwheeler points out, the accomplished informatics archivist eric fischer has also written an excellent history of the evolution of ascii. though, unlike bemer, fischer wasn't actually at the standardization meetings that created ascii, he is more careful and digs deeper than either bemer or jennings, so it might be better to read him first: https://archive.org/details/enf-ascii/

it would be a mistake to credit ascii entirely to bemer; aside from the relatively minor changes in 01967 (including making lowercase official), the draft was extensively revised by the standards committees in the years leading up to 01963, including dramatic improvements in the control-character set

for the historical relationship between ascii character codes and keyboard layouts, see https://en.wikipedia.org/wiki/Bit-paired_keyboard

By @wduquette - 6 months
Regarding paper tape, our first home computer (this was in the mid-to-late 70's) had a paper tape reader and punch. I do not miss paper tape as a storage medium, but I have to say the little punched-out dots were fun to use as confetti at high school football games.
By @snvzz - 6 months
The ASCII table is defective; it is missing a dedicated code for newline.

CR and LF aren't dedicated, and have precise cursor movement meanings, rather than being a logical line ender.

There was a proposal in the 80s to reassign the (otherwise useless) VT (vertical tab) character for the purpose. Unfortunately it was unfruitful.

By @1vuio0pswjnm7 - 6 months
By @th0ma5 - 6 months
I heard someone describe the ASCII table as a state machine. Guess I could understand that as a state machine needed to parse it? This is surprisingly hard to search for but I was wondering if anyone knows what they were talking about.
By @yawl - 6 months
I also wrote a chat novel about ASCII: https://www.lostlanguageofthemachines.com/chapter2/chat
By @pixelbeat__ - 6 months
I wrote about ASCII and UTF-8 elegance at:

https://www.pixelbeat.org/docs/utf8_programming.html

By @renox - 6 months
I still think that they made a big mistake in not having the letters immediately following the numbers, this would have made printing numbers in hexadecimal much more efficient.
By @aronhegedus - 6 months
Was a really fun article to read/podcast to listen to.

Favorite fact is that 127 is the DEL because for hole punching it removes all the info. I love those little nuggets of history

By @johanneskanybal - 6 months
Kind of hard to read something where the author considers every non-English language no more worthy than emojis. It was good in the 50's but stayed important like 4-5 decades too long.
By @netcraft - 6 months
I've searched off and on for a great stylistic representation of the ASCII table, id love a poster to hang on my wall, or possibly even something I could get as a tattoo.
By @niobe - 6 months
I mean, this elegant design is just a necessity of efficient processing and is found in many places, it's throughout digital communications protocols for example. Look at IP addressing. In that sense all early computing is elegant since it had to be on the limited resources of the times.
By @jiveturkey - 6 months