The Elegance of the ASCII Table
The article explores the elegance and historical importance of the ASCII table in computing. It discusses design choices, historical context, compatibility with Unicode, practical applications, and enduring relevance in technology.
The article discusses the elegance and historical significance of the ASCII table, a fundamental encoding system used in computing. It highlights the beauty and logic behind the design choices made in ASCII, such as the arrangement of control codes, the positioning of characters like space and numbers, and the pattern followed by uppercase and lowercase letters. The article delves into the historical context of ASCII's development, including its origins in the 1960s and its compatibility with modern Unicode standards. It also touches on the practical implications of ASCII, such as facilitating sorting and providing a basis for understanding binary representations. Overall, the article emphasizes the intricate design and functionality of ASCII, shedding light on its enduring relevance in contemporary technology.
Related
What actual purpose do accent characters in ISO-8859-1 and Windows 1252 serve?
Accent characters in ISO-8859-1 and Windows 1252 ensure compatibility with older 7-bit character sets, adding national characters and composed characters. Their inclusion predates modern standards, impacting historical encoding variants. Application software determines their usage.
Apple II graphics: More than you wanted to know
The article explores Apple II graphics, emphasizing its historical importance and technical features like pixel-addressable graphics and sixteen colors. It contrasts with competitors and delves into synchronization challenges and hardware details.
Beyond monospace: the search for the perfect coding font
Designing coding fonts involves more than monospacing. Key considerations include hyphens resembling minus signs, aligning symbols, distinguishing zero from O, and ensuring clarity for developers and type designers. Testing with proofing strings is recommended.
Weekend projects: getting silly with C
The C programming language's simplicity and expressiveness, despite quirks, influence other languages. Unconventional code structures showcase creativity and flexibility, promoting unique coding practices. Subscription for related content is encouraged.
The absolute minimum you must know about Unicode and encodings
Joel Spolsky discusses the significance of Unicode for developers, debunking misconceptions and tracing its evolution from ASCII to accommodating diverse characters globally, urging developers to understand character encoding fundamentals.
- Several users discuss the practical applications and tools for viewing the ASCII table, such as using the `man ascii` command on Linux.
- There is a debate on the elegance and limitations of ASCII compared to other encoding systems like EBCDIC and Unicode.
- Some comments highlight the historical context and design choices behind ASCII, including its relationship with typewriters and control characters.
- Users share personal anecdotes and resources related to ASCII, including links to historical documents and websites.
- There are mentions of the visual representation and educational aspects of the ASCII table, with suggestions for better illustrating its structure.
man ascii
It's been useful to me more than once a year, mostly for shell escape codes and for weird character ranges in regex and C. It can be a bit confusing, but the gist is that each line shows two characters. I would prefer a view where you see the same char with Shift and/or Ctrl applied, but you can only ask so much.
You mean 0D and 0A, or 13 and 10; that mix of bases really stood out to me in an otherwise good article. I'm one of numerous others who have memorised most of the base ASCII table, and quite a few of the symbols as well as extended ASCII (CP437), mainly because it comes in handy for reading programs without needing a disassembler. Those who do a lot of web development may find the sequence 3A 2F 2F familiar too, as well as 3Ds and 3Fs.
I can see the rationale for <=> being in that order, but [\] and {|} are less obvious, as well as why their position is 1 column to the left of <=>.
https://web.archive.org/web/20150801005415/http://bobbemer.c...
He was considered the "father of ASCII". He wrote very well and gives clear explanations for the motivations behind the design of ASCII.
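#include <stdio.h>

/* OR-ing with 0x20 sets bit 5, so 'A'..'Z' fold onto 'a'..'z' and one case label matches both. */
static void handle(char my_char) {
    switch (my_char | 0x20) {
    case 'a': puts("a or A"); break;
    case 'b': puts("b or B"); break;
    }
}

int main(void) { handle('A'); handle('b'); return 0; }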
1. normalization
2. backwards running text (hey, why not add spiral running text?)
3. fonts
4. invisible characters
5. multiple code points with the same glyph
6. glyphs defined by multiple code points (gee, I thought Unicode was supposed to get us away from that mess of code pages!)
7. made up languages (Elvish? Come on!)
8. you vote for my made-up emoticon, and I'll vote for yours!
According to Wikipedia¹, American typewriters were pretty consistent with keyboard layout until the IBM Selectric electric typewriter. Apparently "small" characters (like apostrophe, double-quote, underscore, and hyphen) should be typed with less pressure to avoid damaging the platen, and IBM decided the Selectric could be simpler if those symbols were grouped on dedicated keys instead of sharing keys with "high pressure" symbols, so they shuffled the symbols around a bit, resulting in a layout that would look very familiar to a modern PC user.
Because IBM electric typewriters were so widely used (at least in English speaking countries), any computer company that wanted to sell to businesses wanted a Selectric-style layout, including the IBM PC.
Meanwhile, in other countries where typewriters in general weren't so popular or useful, the earliest computers had ASCII-style punctuation layout for simplicity, and later computers didn't have any pressing need to change, so they stuck with it. Japanese keyboards, for example, are still ASCII-style to this day.
¹: https://en.wikipedia.org/wiki/IBM_Selectric#Keyboard_layout
> The first printing character is space; it’s an invisible character, but it’s still one that has meaning to humans, so it’s not a control character (this sounds obvious today, but it was actually the source of some semantic argument when the ASCII standard was first being discussed).
Hmm... Interesting that space is considered a printing character while horizontal tab and newline are control characters. They're all invisible and move the cursor, but I guess it makes sense. Space is unique in moving the cursor by exactly one character width, so it behaves like an invisible character. Newline can imply either movement straight down, or down and to the left, depending on configuration or platform (e.g. DOS vs UNIX line endings). Horizontal tab can also move you a configurable amount rightwards, and perhaps it was thought of a bit differently, given there's also a vertical tab, though I have no idea how that was used. Maybe it's the newline equivalent for tables, e.g. "id\tcolor\v1\tred\v2\tblue\v" or something like that.
Interesting also that BS is a control char while DEL is a printing(?) char. I guess that's because BS implies just movement leftwards over the text, while DEL is all ones like running a black sharpie through text. Guess that's what makes it printing. Wonder if there were DEL keys on typewriters that just stamped a black square, and on keypunchers that just punched 7 holes, so people would press "backspace" to go back then "delete" to overwrite.
I've used ASCII a lot, but even after so many years, I'm getting moments where it's like "oh this piece isn't just here, it needs to be here for a deep reason". It's like a jigsaw puzzle.
Instead, we crudely use commas and tabs as delimiters rather than something like RS (#30).
man ascii
is never far from my fingers. Combined with od -c and od -x, it gets the job done. I don't think as fluently in octal as I used to; hex has become ubiquitous. The charts that simply show you the assignments in hex and octal obscure the elegance of the design.
The order used by ASCII is sometimes called "ASCIIbetical", which I think is wonderful.
It also makes clear why ESC can be entered as `^[` or ENTER (technically CR) as `^M` on some terminals (still works in my xterm), because the effect of the control key is to unset bits 6 and 7 in the original set-up.
Of course you can color in the fields too, if you want.
https://en.wikipedia.org/wiki/EBCDIC
On the 4th floor of my building, the computer systems lab has a glass front with what looks like a punch card etched in frosted glass, but if you look closer, it was made by sticking stickers on the glass.
I made a "punchcard decoder" on a 4x6 card to help people decode the message on the wall
https://mastodon.social/@UP8/112836035703067309
The EBCDIC code was designed to be compatible with this encoding, which has all sorts of weird features, for instance the "/" right between "R" and "Z"; letters don't form a consecutive block, so testing whether a char is a letter is more complex than in ASCII.
I am thinking of redoing that card to put the alphabet in order. A column in a punched card has between 0 and 3 punches. 0 is a space; 1 is a letter or a symbol in the first column; if one of the rows at the top is punched, you combine that with the number of the other punched row on the left 3x9 grid. If three holes are punched, one of them is an 8 (unless you've got one of the extended charsets) and you have one of the symbols in the right 3x6. Note the ¬ and ¢, which are not in ASCII but are in Latin-1.
Typewriter: !@#$%^&*()
Apple: !"#$%&'()
Digits: 1234567890
https://en.wikipedia.org/wiki/Bit-paired_keyboard
>A bit-paired keyboard is a keyboard where the layout of shifted keys corresponds to columns in the ASCII (1963) table, archetypally the Teletype Model 33 (1963) keyboard. This was later contrasted with a typewriter-paired keyboard, where the layout of shifted keys corresponds to electric typewriter layouts, notably the IBM Selectric (1961). The difference is most visible in the digits row (top row): compared with mechanical typewriters, bit-paired keyboards remove the _ character from 6 and shift the remaining &'() from 7890 to 6789, while typewriter-paired keyboards replace 3 characters: ⇧ Shift+2 from " to @, ⇧ Shift+6 from _ to ^, and ⇧ Shift+8 from ' to *. An important subtlety is that ASCII was based on mechanical typewriters, but electric typewriters became popular during the same period that ASCII was adopted, and made their own changes to layout.[1] Thus differences between bit-paired and (electric) typewriter-paired keyboards are due to the differences of both of these from earlier mechanical typewriters.
>[...] Bit-paired keyboard layouts survive today only in the standard Japanese keyboard layout, which has all shifted values of digits in the bit-paired layout.
>[...] For this reason, among others (such as ease of collation), the ASCII standard strove to organize the code points so that shifting could be implemented by simply toggling a bit. This is most conspicuous in uppercase and lowercase characters: uppercase characters are in columns 4 (100) and 5 (101), while the corresponding lowercase characters are in columns 6 (110) and 7 (111), requiring only toggling the 6th bit (2nd high bit) to switch case; as there are only 26 letters, the remaining 6 points in each column were occupied by symbols or, in one case, a control character (DEL, in 127).
>[...] In the US, bit-paired keyboards continued to be used into the 1970s, including on electronic keyboards like the HP 2640 terminal (1975) and the first model Apple II computer (1977).
Such waste and lack of extensibility kill any claim to elegance that a few shifted binary numbers might have; that's the wrong end to focus your optimization efforts on.
bob bemer more or less invented ascii. he was also an ibm guy before mackenzie's crowd pushed him out of ibm for promoting it. he wrote a much better book about the history of ascii which is also freely available online, really more a pamphlet than a book, called "a story of ascii": https://archive.org/details/ascii-bemer/page/n1/mode/2up
tom jennings, who invented fido, also wrote a history of ascii, called 'an annotated history of some character codes or ascii: american standard code for information infiltration'; it's no longer online at his own site, but for the time being the archive has preserved it: https://web.archive.org/web/20100414012008/http://wps.com/pr...
jennings's history is animated by a palpable rage at mackenzie's self-serving account of the history of ascii, partly because bemer hadn't really told his own story publicly. so jennings goes so far as to write punchcard codes (and mackenzie) out of ascii's history entirely, deriving it purely from teletypewriter codes—from which it does undeniably draw many features, but after all, bemer was a punchcard guy, and ascii's many excellent virtues for collation show it
as dwheeler points out, the accomplished informatics archivist eric fischer has also written an excellent history of the evolution of ascii. though, unlike bemer, fischer wasn't actually at the standardization meetings that created ascii, he is more careful and digs deeper than either bemer or jennings, so it might be better to read him first: https://archive.org/details/enf-ascii/
it would be a mistake to credit ascii entirely to bemer; aside from the relatively minor changes in 01967 (including making lowercase official), the draft was extensively revised by the standards committees in the years leading up to 01963, including dramatic improvements in the control-character set
for the historical relationship between ascii character codes and keyboard layouts, see https://en.wikipedia.org/wiki/Bit-paired_keyboard
CR and LF aren't dedicated line terminators; they have precise cursor-movement meanings rather than being a logical line ender.
There was a proposal in the 80s to reassign the (otherwise useless) VT (vertical tab) character for that purpose. Unfortunately, it was unfruitful.
https://ia601808.us.archive.org/2/items/mackenzie-coded-char...
Favorite fact is that 127 is the DEL because for hole punching it removes all the info. I love those little nuggets of history