https://github.com/jez/bin/blob/master/ascii-4col.txt
It's neat because it's the only command I have that uses `tail` for the shebang line.
Imagine a Unicode like this:
8:8:16
- 8 bits of flags.
- 8-bit script family code: 0 for BMP.
- 16-bit plane for every script code and flag combination.
The flags could do useful things like indicate character display width, case, and other attributes (specific to a script code).
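To make the hypothetical 8:8:16 layout concrete, here is a minimal Python sketch that packs and unpacks the three fields of a 32-bit code. The bit order (flags on top, then script family, then the 16-bit offset) and the helper names `pack`/`unpack` are assumptions for illustration; this is not any real encoding.

```python
# Hypothetical 8:8:16 layout (not a real encoding):
# bits 31-24 = flags, bits 23-16 = script family code, bits 15-0 = offset in the plane.

def pack(flags: int, script: int, offset: int) -> int:
    """Pack the three fields into one 32-bit code."""
    if not (0 <= flags < 1 << 8 and 0 <= script < 1 << 8 and 0 <= offset < 1 << 16):
        raise ValueError("field out of range")
    return (flags << 24) | (script << 16) | offset


def unpack(code: int) -> tuple[int, int, int]:
    """Split a 32-bit code back into (flags, script, offset)."""
    return (code >> 24) & 0xFF, (code >> 16) & 0xFF, code & 0xFFFF


# Script family 0 is the BMP, so with no flags set the code equals the BMP value.
code = pack(0, 0, 0x00E9)  # U+00E9 LATIN SMALL LETTER E WITH ACUTE
print(hex(code), unpack(code))
```

Putting the flags in the top byte means a flag-free BMP character keeps its familiar value, which seems to be the point of "0 for BMP".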
Unicode peaked too early and applied an economy of encoding that rings false now, in an age when consumer devices have double-digit gigabytes of memory and multiple terabytes of storage, and high-definition video is streamed over the internet.
https://blog.glyphdrawing.club/the-origins-of-del-0x7f-and-i...
It really helps understand the logic of ASCII.
Four Column ASCII (2017) - https://news.ycombinator.com/item?id=21073463 - Sept 2019 (40 comments)
Four Column ASCII - https://news.ycombinator.com/item?id=13539552 - Feb 2017 (68 comments)
Also explains why there is no difference between Ctrl-x and Ctrl-Shift-x.
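A quick Python check of that, assuming the simplified model where Shift clears bit 0x20 on a lowercase letter and Ctrl keeps only the low five bits (real keyboard drivers and terminal emulators vary; `shift` and `ctrl` are illustrative helpers):

```python
def shift(key: str) -> str:
    """Simplified Shift for a lowercase ASCII letter: clear bit 0x20."""
    return chr(ord(key) & ~0x20)


def ctrl(key: str) -> int:
    """Simplified Ctrl: keep only the low five bits of the key's code."""
    return ord(key) & 0x1F


# 'x' is 0x78 and 'X' is 0x58; they differ only in bit 0x20, which Ctrl discards anyway.
print(hex(ctrl("x")), hex(ctrl(shift("x"))))
```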
Though the 01 column is a bit unsatisfying because it doesn’t seem to have any connection to its siblings.
https://dl.acm.org/doi/epdf/10.1145/365628.365652
It also defined a 6-bit ASCII subset.
I once got drunk with my elderly Unix supernerd friend, and he was talking about TTYs and how his passwords contained embedded ^S and ^Q characters. He traced the login process and learned they were just stalling the TTY, not actually being used to construct the hash. No one else at the bar got the drift. He patched his system to use 'raw' instead of 'cooked' mode for login passwords. He also used backspaces (^? and ^H) as part of his passwords. He was a real security tiger. I miss him.
The idea that SOH/1 is "Ctrl-A" or ESC/27 is "Ctrl-[" is not part of ASCII; that idea comes from the way terminals provided access to the control characters, via a Ctrl key that just masked out a few bits.
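Caret notation (as printed by `cat -v` and many terminals) runs that mapping in reverse. A minimal Python sketch, assuming only the 32 C0 codes plus DEL need handling: flipping bit 0x40 turns a control code into the printable character shown after the caret. The `caret` helper is illustrative.

```python
def caret(code: int) -> str:
    """Render a control code (0x00-0x1F or DEL 0x7F) in caret notation."""
    if not (0 <= code < 0x20 or code == 0x7F):
        raise ValueError("not an ASCII control code")
    # Flipping bit 0x40 maps 0x00-0x1F onto '@'..'_' and 0x7F onto '?'.
    return "^" + chr(code ^ 0x40)


print(caret(0x01), caret(0x1B), caret(0x7F))  # SOH, ESC, DEL
```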
It makes sense, but it didn’t really hit me until recently. Now, I’m wondering what other hidden cleverness is there that used to be common knowledge, but is now lost in the abstractions.
for x in range(0x0,0x20): print(chr(x),end=" ")
(I assume everybody knows that on mechanical typewriters and teletypes the "shift" key physically shifted the caret position upwards, so that a different glyph would be printed when hit by a typebar.)
I believe the layout of the shifted symbols on the numeric row were based on an early IBM Selectric typewriter for the US market. Then IBM went and changed it, and the latter is the origin of the ANSI keyboard layout we have now.
ESC [ { 11011
FS \ | 11100
GS ] } 11101
Also curious why the keys open and close braces, but ... the single and double curly quotes don't open and close, but are stacked. Seems nuts every time I type Option-{ and Option-Shift-{ …
- https://en.wikipedia.org/wiki/ASCII#History
- https://en.wikipedia.org/wiki/Hexadecimal#Cultural_history
(I'm almost reluctant to spoil the fun for the kids these days, but https://en.wikipedia.org/wiki/%C2%A3sd )
EDIT: it would need to predate the 6-bit teletype codes that preceded ASCII.
ASCII did us all the favor of hitting a good stopping point and leaving the “infinity” solution to the future.
The even smaller 5-bit Baudot code already had special characters to shift between two sets and to discard the previous character. The Murray code, used for typewriter-based devices, introduced CR and LF, so those characters had been in frequent use for far more than a few years.
I don't fault the creators of ASCII - those control characters were probably needed at the time. The fault is ours for not moving on from the legacy technology. I think some non-ASCII/Unicode encodings did reuse the control character bytes. Why didn't Unicode do that? I assume they were trying to be compatible with some existing encodings, but couldn't they have chosen the encodings that made use of the control character code points?
If Unicode were to change it now (probably not happening, but imagine ...), what would they do with those 32 code points? We couldn't move other common characters over to them - those already have well-known, heavily used code points in Unicode, and also iirc Unicode promises backward compatibility with prior versions.
There still are scripts and glyphs not in Unicode, but those are mostly quite rare and effectively would continue to waste the space. Is there some set of characters that would be used and be a good fit? Duplicate the most commonly used codepoints above 8 bits, as a form of compression? Duplicate combining characters? Have a contest? Make it a private area - I imagine we could do that anyway, because I doubt most systems interpret those bytes now.
Also, how much old data, which legitimately uses the ASCII control characters, would become unreadable?
Look at the Teletype ASR-33, introduced in 1963.
for x in range(0x0, 0x20): print(f'({x}|{chr(x)})', end=' ')
(0|) (1|) (2|) (3|) (4|) (5|) (6|) (7|) (8) (9| ) (10|
) (11|
) (12|
) (14|) (15|) (16|) (17|) (18|) (19|) (20|) (21|) (22|) (23|) (24|) (25|) (26|␦) (27|8|) (29|) (30|) (31|)
Note on your Mac that the Option-{ and Option-}, with and without Shift, produce quotes which are all distinct from the characters produced by your '/" key! They are Unicode characters not in ASCII.
In the ASCII standard (1977 version here: https://nvlpubs.nist.gov/nistpubs/Legacy/FIPS/fipspub1-2-197...) the example table shows a glyph for the double quote which is vertical: it is neither an opening nor closing quote.
The apostrophe is shown as a closing quote, by slanting to the right; approximately a mirror image of the backtick. So it looks as though those two are intended to form an opening and closing pair. Except, in many terminal fonts, the apostrophe is just a vertical tick, like half of a double quote.
The ' being vertical helps programming-language '...' literals not look weird.
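To see which of these quote characters are actually in ASCII, here is a small Python sketch using the standard `unicodedata` module to print each code point and its Unicode name. Which keys produce the curly ones depends on your keyboard layout, so that part isn't modeled.

```python
import unicodedata

# ASCII's straight quotes and backtick, followed by the typographic (curly) quotes.
quotes = ["'", '"', "`", "\u2018", "\u2019", "\u201c", "\u201d"]

for ch in quotes:
    in_ascii = ord(ch) < 0x80
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch):<30}  in ASCII: {in_ascii}")
```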
There's also these:
| ASCII | US keyboard |
|------------|-------------|
| 041/0x21 ! | 1 ! |
| 042/0x22 " | 2 @ |
| 043/0x23 # | 3 # |
| 044/0x24 $ | 4 $ |
| 045/0x25 % | 5 % |
| | 6 ^ |
| 046/0x26 & | 7 & |
https://www.farah.cl/Keyboardery/A-Visual-Comparison-of-Diff...
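The ASCII column in that table follows a simple rule: each symbol sits exactly 0x10 below the digit it shares a key with, which is the idea behind so-called bit-paired keyboards. A rough Python sketch comparing that rule with the shifted digit row of a US layout (the `us_layout` mapping is typed in by hand and is only an illustration):

```python
# Bit-paired rule: the shifted symbol is the digit's code with bit 0x10 flipped.
bit_paired = {d: chr(ord(d) ^ 0x10) for d in "123456789"}

# Shifted digit row of a US layout, for comparison.
us_layout = dict(zip("123456789", "!@#$%^&*("))

for d in "123456789":
    same = "same" if bit_paired[d] == us_layout[d] else "differs"
    print(d, repr(bit_paired[d]), repr(us_layout[d]), same)

matching = [d for d in bit_paired if bit_paired[d] == us_layout[d]]
print("keys that agree:", "".join(matching))
```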
I found this gem on Hacker News the other day. User soneil posted a link to a four-column version of the ASCII table that blew my mind. I just wanted to repost it here so it is easier to discover.
Here's an excerpt from the comment:
I always thought it was a shame the ascii table is rarely shown in columns (or rows) of 32, as it makes a lot of this quite obvious. eg, http://pastebin.com/cdaga5i1 It becomes immediately obvious why, eg, ^[ becomes escape. Or that the alphabet is just 40h + the ordinal position of the letter (or 60h for lower-case). Or that we shift between upper & lower-case with a single bit.
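Both of those rules are easy to check in Python. A minimal sketch (the helper names are just for illustration): a letter's code is 0x40 plus its 1-based position for upper case, or 0x60 plus its position for lower case, and flipping bit 0x20 toggles case.

```python
def letter_at(position: int, lower: bool = False) -> str:
    """Letter at 1-based alphabet position: 0x40 + n for upper case, 0x60 + n for lower."""
    if not 1 <= position <= 26:
        raise ValueError("position must be 1..26")
    return chr((0x60 if lower else 0x40) + position)


def swap_case(ch: str) -> str:
    """Toggle the case of an ASCII letter by flipping bit 0x20."""
    if not (ch.isascii() and ch.isalpha()):
        raise ValueError("expected an ASCII letter")
    return chr(ord(ch) ^ 0x20)


print(letter_at(1), letter_at(26, lower=True), swap_case("q"), swap_case("Q"))
```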
You know in ASCII there are 32 characters at the beginning of the table that don't represent a written symbol. Backspace, newline, escape - that sort of thing. These are called control characters.
In the terminal you can type these control characters by holding the CTRL (control characters, get it?) key in combination with another key. For example, as many experienced vim users know, pressing CTRL+[ in the terminal (which is ^[ in caret notation) is the same as pressing the ESC key. But why is the escape key triggered by the [ character? Why not another character? This is the insight soneil shares with us.
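Remember that ASCII is a 7-bit encoding. Let's say the following: the top 2 bits of a code select one of four groups of 32 characters, and the low 5 bits select a character within that group.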
In the linked table, which I reproduce below, the four groups are represented by the columns and the rows represent the values.
| 00 | 01 | 10 | 11 | Low 5 bits |
|---|---|---|---|---|
| NUL | Spc | @ | ` | 00000 |
| SOH | ! | A | a | 00001 |
| STX | " | B | b | 00010 |
| ETX | # | C | c | 00011 |
| EOT | $ | D | d | 00100 |
| ENQ | % | E | e | 00101 |
| ACK | & | F | f | 00110 |
| BEL | ' | G | g | 00111 |
| BS | ( | H | h | 01000 |
| TAB | ) | I | i | 01001 |
| LF | * | J | j | 01010 |
| VT | + | K | k | 01011 |
| FF | , | L | l | 01100 |
| CR | - | M | m | 01101 |
| SO | . | N | n | 01110 |
| SI | / | O | o | 01111 |
| DLE | 0 | P | p | 10000 |
| DC1 | 1 | Q | q | 10001 |
| DC2 | 2 | R | r | 10010 |
| DC3 | 3 | S | s | 10011 |
| DC4 | 4 | T | t | 10100 |
| NAK | 5 | U | u | 10101 |
| SYN | 6 | V | v | 10110 |
| ETB | 7 | W | w | 10111 |
| CAN | 8 | X | x | 11000 |
| EM | 9 | Y | y | 11001 |
| SUB | : | Z | z | 11010 |
| ESC | ; | [ | { | 11011 |
| FS | < | \ | \| | 11100 |
| GS | = | ] | } | 11101 |
| RS | > | ^ | ~ | 11110 |
| US | ? | _ | DEL | 11111 |
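The table can be regenerated from the group/row split itself. A minimal Python sketch, assuming the same labels as the table (including `TAB`, which the standard abbreviates as HT, and `Spc` for the space); `label` and `four_column_rows` are illustrative names:

```python
# Names for the 32 control codes, in the same order as the table's first column.
CONTROL_NAMES = (
    "NUL SOH STX ETX EOT ENQ ACK BEL BS TAB LF VT FF CR SO SI "
    "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US"
).split()


def label(code: int) -> str:
    """Printable label for a 7-bit code."""
    if code < 0x20:
        return CONTROL_NAMES[code]
    if code == 0x20:
        return "Spc"
    if code == 0x7F:
        return "DEL"
    return chr(code)


def four_column_rows():
    """Yield one row per low-5-bit value: the four group labels plus the row bits."""
    for row in range(32):
        # The group (top 2 bits) picks the column; the row (low 5 bits) picks the line.
        cells = [label((group << 5) | row) for group in range(4)]
        yield cells + [format(row, "05b")]


for cells in four_column_rows():
    print(" | ".join(cells))
```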
Now in this table, look for ESC. It's in the first group, fifth from the bottom. It's in the first column, so its group has bits '00', and the row has bits '11011'. Now look on the same line: what else is there? Yep, the '[' character is there, albeit in a different column:
10 11011 means [
00 11011 means ESC
So when you type CTRL+[ for ESC, you're asking for the equivalent of the character 11011 ([) out of the control set. Pressing CTRL simply sets all bits but the last 5 to zero in the character that you typed. You can imagine it as a bitwise AND:
10 11011 ([)
& 00 11111 (CTRL)
= 00 11011 (ESC)
This is why ^J types a newline, ^H types a backspace and ^I types a tab. This is why if you cat -A a Windows text file, it has ^M printed all over (meaning CR, because newlines are CR+LF on Windows).
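The AND model from the post can be checked directly. A small Python sketch; `CTRL_MASK` and `ctrl_code` are illustrative names, and real terminals may treat some Ctrl combinations specially:

```python
CTRL_MASK = 0b0011111  # keep the low 5 bits, clear the 2 group bits


def ctrl_code(key: str) -> int:
    """Code produced by Ctrl+key under the bitwise-AND model."""
    return ord(key) & CTRL_MASK


# ^[ is ESC, ^J is LF, ^H is BS, ^I is TAB, ^M is CR
for key in "[JHIM":
    print(f"^{key} -> {ctrl_code(key):#04x} {chr(ctrl_code(key))!r}")
```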
You probably mean 28-31 (∟↔▲▼, or ␜␝␞␟)
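Those ␜␝␞␟ glyphs come from Unicode's Control Pictures block, which places a symbol for each C0 control code at U+2400 plus the code, with U+2421 for DEL. A minimal Python sketch using the standard `unicodedata` module; `control_picture` is a hypothetical helper:

```python
import unicodedata


def control_picture(code: int) -> str:
    """Map a C0 control code (0x00-0x1F) or DEL to its Control Pictures glyph."""
    if 0 <= code < 0x20:
        return chr(0x2400 + code)  # U+2400..U+241F
    if code == 0x7F:
        return "\u2421"  # SYMBOL FOR DELETE
    raise ValueError("not a control code")


for code in range(28, 32):
    pic = control_picture(code)
    print(code, pic, unicodedata.name(pic))
```

Printing these instead of the raw bytes is also a way to show control characters without the terminal acting on them.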
Unless this is octal notation? But 0o60-0o63 in octal would be the digits 0123.
If you have to prominently represent 10 things in binary, then it's neat to allocate a slot of size 16 and leave the remaining 6 as padding. Which is to say, it's neat to proceed from all zeroes:
x x x x 0 0 0 0
x x x x 0 0 0 1
x x x x 0 0 1 0
....
x x x x 1 1 1 1
It's more of a cause for hexadecimal notation than an effect of it.
I honestly wouldn’t have thought anything of it if I hadn’t seen it written as `b ^ 0x20`.
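For the digits specifically, the 16-slot allocation described a few lines up means '0' through '9' sit at 0x30 through 0x39, so the low nibble is the digit's value and 0x3A through 0x3F are the six leftover slots. A small Python sketch (`digit_value` is an illustrative helper):

```python
def digit_value(ch: str) -> int:
    """Value of an ASCII digit: its low nibble."""
    if len(ch) != 1 or not "0" <= ch <= "9":
        raise ValueError("expected a single ASCII digit")
    return ord(ch) & 0x0F


# '0' is 0b00110000 and '9' is 0b00111001: high nibble 3, low nibble = value.
print([f"{ord(c):08b}" for c in "0123456789"])
print([digit_value(c) for c in "0123456789"])

# The six unused slots of the 16 are filled with punctuation.
print(bytes(range(0x3A, 0x40)).decode("ascii"))
```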
If you want to use the symbols for Mars and Venus, for example, they are not in range(0, 0x20). They are in the Miscellaneous Symbols block.
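A quick way to confirm where those two symbols live, using Python's standard `unicodedata` module (the Miscellaneous Symbols block spans U+2600 to U+26FF):

```python
import unicodedata

symbols = {"Mars": "\u2642", "Venus": "\u2640"}

for planet, ch in symbols.items():
    # Both fall inside Miscellaneous Symbols (U+2600..U+26FF), far above 0x20.
    print(planet, f"U+{ord(ch):04X}", ch, unicodedata.name(ch))
```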