
Commit 2a6de01

Update
1 parent e20a10a commit 2a6de01

File tree

2 files changed

+112
-1
lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,4 @@
1+
dist
12
ubase
23
.DS_Store
34
.idea

README.md

Lines changed: 111 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -30,7 +30,12 @@ You can also write hex/octal/binary/your own format by hand:
3030
$ echo C2A7 | aces -d 0123456789ABCDEF
3131
$ echo .+=. | aces -d ./+= # try this!
3232
```
33+
Convert binary to hex:
34+
```shell
35+
$ echo 01001010 | aces -d 01 | aces 0123456789ABCDEF
36+
```
3337

38+
_Also check out the examples!_
3439
## Installing
3540

3641
### macOS or Linux with linuxbrew
@@ -39,4 +44,109 @@ brew install quackduck/tap/aces
3944
```
4045

4146
### Other platforms
42-
Head over to [releases](https://github.com/quackduck/aces/releases) and download the latest binary!
47+
Head over to [releases](https://github.com/quackduck/aces/releases) and download the latest binary!
48+
49+
## Usage
50+
```yaml
51+
Aces - Encode in any character set
52+
53+
Usage:
54+
aces <charset> - encode data from STDIN into <charset>
55+
aces -d/--decode <charset> - decode data from STDIN from <charset>
56+
aces -h/--help - print this help message
57+
58+
Aces reads from STDIN for your data and outputs the result to STDOUT. The charset length must be
59+
a power of 2. While decoding, bytes not in the charset are ignored. Aces does not add any padding.
60+
```
61+
## Examples
62+
```shell
63+
echo hello world | aces "<>(){}[]" | aces --decode "<>(){}[]" # basic usage
64+
echo matthew stanciu | aces HhAa | say # make funny sounds (macOS)
65+
aces " X" < /bin/echo # see binaries visually
66+
echo 0100100100100001 | aces -d 01 | aces 01234567 # convert bases
67+
echo Calculus | aces 01 # what's stuff in binary?
68+
echo Aces™ | base64 | aces -d \
69+
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ # even decode base64
70+
```
71+
72+
## How does it work?
73+
To answer that, we need to know how encoding works in general. Let's take the example of Base64.
74+
75+
### Base64
76+
```text
77+
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
78+
```
79+
That is the Base64 character set. As you may expect, it's 64 characters long.
80+
81+
Let's say we want to represent these two bytes using only those 64 characters:
82+
```text
83+
00001001 10010010 # 09 92 in hex
84+
```
85+
To do that, Base64 does something very smart: it uses the bits, interpreted as numbers, as indices into the character set.
86+
87+
To explain what that means, let's consider what possible values 6 bits can represent: `000000` (decimal 0) to `111111` (decimal 63).
88+
Since 0 to 63 is the exact range of indices that can be used with the 64-element character set, we'll regroup our 8-bit chunks (bytes) of data into 6-bit chunks (to use as indices):
89+
```text
90+
000010 011001 0010
91+
```
92+
`000010` is 2 in decimal, so by using it as an index of the character set, Base64 adds `C` (index 2) to the result.
93+
94+
`011001` is 16 + 8 + 1 = 25 in decimal, so Base64 appends `Z` (index 25) to the result.
95+
96+
You may have spotted a problem with the next chunk - it's only 4 bits long!
97+
98+
To get around this, Base64 pretends it's a 6-bit chunk and simply appends as many zeros as are needed:
99+
```text
100+
0010 + 00 => 001000
101+
```
102+
`001000` is 8 in decimal, so Base64 appends `I` (index 8) to the result.
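
The encoding steps so far can be sketched in a few lines of Python. This is only an illustration of the walk-through, not how any real Base64 library is written: the function name `encode_base64_unpadded` is made up for this example, and it deliberately leaves out the `=` padding that standard Base64 adds.

```python
# Standard Base64 character set, as shown earlier in this section.
BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_base64_unpadded(data: bytes) -> str:
    """Hypothetical helper: Base64-style encoding without '=' padding."""
    # Write every byte as 8 bits and join them into one long bit string.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Append zeros so the last chunk is also 6 bits long.
    bits += "0" * (-len(bits) % 6)
    # Read each 6-bit chunk as a number and use it as an index into the set.
    return "".join(BASE64[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6))


print(encode_base64_unpadded(bytes([0x09, 0x92])))  # prints CZI
```

Working on a string of `0`s and `1`s is slow, but it mirrors the explanation one step at a time.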
103+
104+
But then, on the decoding side, how do you know where real data ends and where the pretend data starts?
105+
106+
It turns out that we don't need to do anything. On the decoding side, we know that the decoded data _has_ to be a multiple of 8 bits. So, the decoder ignores the bits which make the output _not_ a multiple of 8 bits, which will always be the extra bits we added.
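
A minimal decoding sketch, under the same assumptions as the description above: no padding is required, and any leftover bits that do not fill a whole byte are thrown away. The name `decode_base64_unpadded` is hypothetical, and skipping characters outside the set (such as `=` or a newline) is a choice made for this example.

```python
BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_base64_unpadded(text: str) -> bytes:
    """Hypothetical helper: reverse the index lookup and drop the extra bits."""
    # Turn each character back into its 6-bit index; skip anything not in the set.
    bits = "".join(f"{BASE64.index(ch):06b}" for ch in text if ch in BASE64)
    # Only whole bytes are real data; the remainder is the zeros added by the encoder.
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


print(decode_base64_unpadded("CZI").hex())  # prints 0992
```

Here `CZI` carries 18 bits, so the decoder keeps the first 16 and ignores the last 2.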
107+
108+
Finally, encoding `00001001 10010010` to Base64 should result in `CZI`.
109+
110+
Try this in your terminal with the real Base64!
111+
```shell
112+
echo -n -e \\x09\\x92 | base64 # base64 also appends "=" as padding, so the output length is always a multiple of 4 characters
113+
```
114+
115+
### Aces
116+
117+
Now we generalize this to all character sets.
118+
119+
Generalizing the character set is easy: we just switch out the characters of the array storing the character set.
120+
121+
Changing the length of the character set is slightly harder. For every character set length, we need to figure out how many bits the chunked data should have.
122+
123+
In the Base64 example, the chunk length (let's call it that) was 6. The character set length was 64.
124+
125+
[comment]: <> (Let's do another example: in octal, the character set length is 8 and the chunk length will be 3 &#40;`000` to `111` = 0 to 7&#41;)
126+
127+
[comment]: <> (For a character set length of 4, we'd need a chunk length of 2 &#40;`00` to `11` is 0 to 3&#41;)
128+
129+
[comment]: <> (```text)
130+
131+
[comment]: <> (set len => chunk len)
132+
133+
[comment]: <> ( 4 => 2)
134+
135+
[comment]: <> ( 8 => 3)
136+
137+
[comment]: <> ( 64 => 6)
138+
139+
[comment]: <> (```)
140+
For Base64, `2^(chunk len) = set len`, since `2^6 = 64`. We can show that this holds for every chunk length with two observations:
141+
142+
Every bit can be either 1 or 0, so the total number of possible values of a given number of bits is `2^(number of bits)`. (If you need further proof, observe that every bit we add doubles the number of possibilities, since there's an additional choice: the new bit being 0 or the new bit being 1.)
143+
144+
The number of possible values must equal the length of the character set, since the indices need to cover every character in the set.
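
The counting argument can be checked by brute force. The sketch below simply lists every pattern that a given number of bits can take; `bit_patterns` is a name invented for this illustration.

```python
from itertools import product


def bit_patterns(n: int) -> list:
    """All distinct values n bits can take, as strings such as '010'."""
    return ["".join(p) for p in product("01", repeat=n)]


for n in (2, 3, 6):
    patterns = bit_patterns(n)
    # The count doubles with every added bit, so it is always 2 ** n.
    print(n, "bits =>", len(patterns), "values:", patterns[0], "to", patterns[-1])
```

With 6 bits this gives 64 patterns, from `000000` to `111111`, which is exactly enough to index the Base64 character set.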
145+
146+
So, to find the number of bits each chunk should have, we compute `log2(character set length)`. Then we divide the bytes into chunks of that many bits (which was pretty hard to implement: knowing when to read more bytes, crossing over into the next byte to fetch more bits, etc.), use those chunks as indices into the user-supplied character set, and print the result. Easy! (Nope, this is the work of several showers and a lot of late-night pondering :)
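
To make the byte-crossing part concrete, here is a rough Python sketch of the idea. It is not the Aces source code: the function name `aces_like_encode` and the bit-buffer approach are assumptions made for this example, and it treats the character set as a string of single characters.

```python
def aces_like_encode(data: bytes, charset: str) -> str:
    """Hypothetical sketch: encode data using any power-of-2 length charset."""
    set_length = len(charset)
    # A power of two has exactly one bit set, so n & (n - 1) is 0.
    if set_length < 2 or set_length & (set_length - 1) != 0:
        raise ValueError("charset length must be a power of 2")
    # For a power of two, bit_length() - 1 is exactly log2(set_length).
    chunk_len = set_length.bit_length() - 1

    out = []
    buffer = 0    # bits read from the input but not yet emitted
    buffered = 0  # how many bits are currently waiting in buffer
    for byte in data:
        # Read one more byte into the buffer.
        buffer = (buffer << 8) | byte
        buffered += 8
        # Emit every whole chunk available; a chunk may span two bytes.
        while buffered >= chunk_len:
            buffered -= chunk_len
            out.append(charset[(buffer >> buffered) & (set_length - 1)])
        # Forget the bits that were already emitted.
        buffer &= (1 << buffered) - 1
    if buffered:
        # Pad the last partial chunk with zeros on the right.
        out.append(charset[(buffer << (chunk_len - buffered)) & (set_length - 1)])
    return "".join(out)


base64_set = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
print(aces_like_encode(bytes([0x09, 0x92]), base64_set))  # prints CZI
print(aces_like_encode(b"J", "0123456789ABCDEF"))         # prints 4A
```

The buffer holds leftover bits between bytes, which is one way to handle a chunk that starts in one byte and ends in the next.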
147+
148+
149+
150+
151+
152+
