Huffman

This file is mirrored from the libtw2 documentation and is dual-licensed under MIT or APACHE.

Description

Teeworlds uses a huffman code to compress its network messages. The substitution works on bytes terminated by an EOF symbol (which means that there are 2^8+1 = 257 symbols). The specific substitutions can be seen in the appendix. The bitstream is padded with 0s until the number of bits is a multiple of 8, then the order of the bits in the individual bytes is reversed. This means a bitstream of 0123 4567 89ab cdef ends up as:

7654 3210 fecb da98 ...
|         |
|         second byte
first byte

NOTE: A bug in the reference implementation leads to an extra padding byte to be appended even if the bit stream already has a length that is a multiple of 8.

NOTE: The reference implementation ignores the end of the byte stream during decompression and just assumes it is followed by an endless stream of zeros.

Example

That means, in order to encode the bytes

00 01 00 02 00 80 00

we first have to append an EOF symbol

00 01 00 02 00 80 00 EOF

and then we substitute according to the substitution table in the appendix

1 0001 1 01000 1 00000 1 010100011101100
|  |   |  |    |  |    |  |
00 01  00 02   00 80   00 EOF

group the encoding by bytes

1000 1101 0001 0000 0101 0100 0111 0110 0
|         |         |         |         |
1st byte  2nd byte  3rd byte  4th byte  5th byte

and pad with unset bits to full bytes

1000 1101 0001 0000 0101 0100 0111 0110 0000 0000
                                         ^^^ ^^^^

and reverse the bits in each byte

1011 0001 0000 1000 0010 1010 0110 1110 0000 0000

We arrive at a representation that is two bytes shorter than the original (not only because it was carefully crafted).

To decompress this byte stream, you just reverse the order and stop producing bytes once you reach the EOF symbol.

Appendix

EOF: 010100011101100
00: 1
01: 0001
02: 01000
03: 01101000
04: 011110
05: 0110111
06: 01101100
07: 01110110
08: 00100
09: 0011001
0a: 0101111
0b: 01111111
0c: 0100111
0d: 011100
0e: 00101111
0f: 011101110
10: 0101011
11: 011001010
12: 0011101
13: 010101010
14: 00001111
15: 0111110110
16: 001011000
17: 001111100
18: 0000100
19: 001111001
1a: 010010000
1b: 000010110
1c: 010100000
1d: 0110011
1e: 01010011
1f: 0111010010
20: 001101011
21: 001100010
22: 010111011
23: 0110010111
24: 000011000
25: 010100010
26: 010110111
27: 010011010
28: 0110101
29: 01011100
2a: 010010001
2b: 0111111010
2c: 001011011
2d: 0110110101
2e: 0111010001
2f: 0110100110
30: 0111010111
31: 0111110010
32: 0111110100
33: 0101101011
34: 00111100010
35: 001110001011
36: 0111110101100
37: 010100011100
38: 001010111010
39: 000011100111
3a: 000010100101
3b: 001011100110
3c: 0111110000111
3d: 00101110000
3e: 0110010110100
3f: 010010101000
40: 01010010
41: 001111011
42: 0101101010
43: 0011000111
44: 0101100110
45: 0100110110
46: 0011110000
47: 0000111000
48: 0011111010
49: 00101001
4a: 0101010010
4b: 010101011
4c: 0011000011
4d: 0000101110
4e: 01100100001
4f: 01111110111
50: 01110101000
51: 01111101111
52: 01111100000
53: 0011011
54: 0010100001
55: 001101010
56: 01011101000
57: 01011000000
58: 0000110011
59: 01001010101
5a: 0101000111010
5b: 010110011110
5c: 0101110100111
5d: 001011100011
5e: 0011111011100
5f: 0101100111001
60: 0101100111000
61: 001110001010
62: 0110110100011
63: 0010101110001
64: 001110010001
65: 001110010000
66: 0010111001111
67: 0010111001110
68: 0111010110101
69: 0111010110100
6a: 0101100111011
6b: 0101100111010
6c: 01110101001000
6d: 001011100010
6e: 0111010110111
6f: 0110110100010
70: 0101110100110
71: 000011100110
72: 00001110010
73: 001011100101
74: 000010100100
75: 0110110100101
76: 01111101011010
77: 010100011101101
78: 0111010110110
79: 000010100111
7a: 0010101110000
7b: 0111010110001
7c: 0110110100100
7d: 001010101101
7e: 01110101001001
7f: 0011100011101
80: 00000
81: 0111111001
82: 001101001
83: 01111110110
84: 010110001
85: 01110111100
86: 010010100
87: 01110101011
88: 010110100
89: 01111100010
8a: 010011001
8b: 0010100000
8c: 001011010
8d: 01111110001
8e: 001111111
8f: 0011000110
90: 011001001
91: 01111110000
92: 001101000
93: 0000110010
94: 011000
95: 0101000110
96: 010011000
97: 01100100011
98: 001011001
99: 0100110111
9a: 010101000
9b: 01011101010
9c: 010010111
9d: 01111100011
9e: 001110011
9f: 0011100101
a0: 001111110
a1: 0101100001
a2: 0111110011
a3: 0111011111
a4: 011011011
a5: 001100000
a6: 011010010
a7: 0011000010
a8: 010110110
a9: 0110100111
aa: 010110010
ab: 0010101111
ac: 010010110
ad: 0000101111
ae: 001010110
af: 01111101110
b0: 00001101
b1: 001010100
b2: 000011101
b3: 01110101010
b4: 000010101
b5: 01100100010
b6: 010100001
b7: 01110100111
b8: 001010001
b9: 01110100110
ba: 001110000
bb: 01110111101
bc: 001111010
bd: 0111010000
be: 001011101
bf: 0101010011
c0: 010111010111
c1: 010010101101
c2: 001111000111
c3: 011011010011
c4: 001110010011
c5: 010111010110
c6: 011111000010
c7: 001111000110
c8: 011111010101
c9: 011011010000
ca: 001110010010
cb: 000010100110
cc: 010100011111
cd: 001111101101
ce: 011101010011
cf: 010010101100
d0: 010010101111
d1: 001111101100
d2: 011001011001
d3: 010111010010
d4: 000010100001
d5: 011001011000
d6: 010100011110
d7: 001010101100
d8: 00111000100
d9: 011001000001
da: 0111010110000
db: 001111101111
dc: 0101100000101
dd: 001010101111
de: 001010101110
df: 011001011011
e0: 001110001101
e1: 001010101001
e2: 0111010110011
e3: 001011100100
e4: 0011111011101
e5: 000010100000
e6: 011111010111
e7: 0101100000100
e8: 001010101000
e9: 0111010110010
ea: 001010111011
eb: 001010101011
ec: 0111110000110
ed: 011001000000
ee: 000010100011
ef: 001010101010
f0: 01111101011011
f1: 0011100011100
f2: 010110011111
f3: 010010101110
f4: 010010101001
f5: 000010100010
f6: 001110001100
f7: 0111110101001
f8: 01010001110111
f9: 0111110101000
fa: 0111010100101
fb: 001110001111
fc: 0110010110101
fd: 001010111001
fe: 010110000011
ff: 01001001