Simple Uxn LZ Format
====================

Goals:

* Anyone can implement it
* Small source code size
* Easy to implement from Uxn
* Mildly better than RLE

Non-goals:

* High compression ratio
* High compression speed

Format
------

It's a stream of commands. The first byte encodes the first command. Read the commands from the input until there's no more input.

There are two commands: literal and dictionary.


```
                      Byte 1             Byte 2+n
                 ┌─────────────────┐   ┌─────
Literal          │ 0 x x x x x x x │   │ ....
(Always 1 byte)  └─────────────────┘   └─────
                  Length of literal    Bytes to copy to output
                 (Adjust by adding 1)


                      Byte 1               Byte 2
Dictionary       ┌─────────────────┐  ┌─────────────────┐
(2 bytes version)│ 1 0 x x x x x x │  │ x x x x x x x x │
                 └─────────────────┘  └─────────────────┘
                      Length of           Offset into
                   dictionary match       dictionary
                 (Adjust by adding 4) (Adjust by adding 1)


                      Byte 1            Byte 2              Byte 3
Dictionary       ┌─────────────────┬─────────────────┐ ┌─────────────────┐
(3 bytes version)│ 1 1 x x x x x x │ x x x x x x x x │ │ x x x x x x x x │
                 └─────────────────┴─────────────────┘ └─────────────────┘
                       Length of dictionary match          Offset into
                          (Adjust by adding 4)             dictionary
                                                       (Adjust by adding 1)
```
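
As a rough illustration of the layouts above, here is a minimal Python sketch that parses one command header. The function name `read_command` and its return shape are made up for this example. It also assumes the 14-bit length of the 3-byte version is stored high bits first, which is how the diagram reads left to right.

```python
def read_command(data: bytes, pos: int):
    """Parse the command header that starts at data[pos].

    Returns (kind, length, distance, next_pos). `distance` is the number of
    bytes to go back into the history and is None for literals. For a
    literal, `next_pos` points at the first of the `length` bytes to copy.
    """
    first = data[pos]
    if first & 0x80 == 0:
        # Literal: the low 7 bits hold the length minus 1 (1..128 bytes).
        return "literal", (first & 0x7F) + 1, None, pos + 1
    if first & 0x40 == 0:
        # 2-byte dictionary command: 6-bit length (4..67), 8-bit offset.
        length = (first & 0x3F) + 4
        distance = data[pos + 1] + 1
        return "dictionary", length, distance, pos + 2
    # 3-byte dictionary command: 14-bit length (4..16387), 8-bit offset.
    # Assumption: byte 1 carries the high 6 bits, byte 2 the low 8 bits.
    length = (((first & 0x3F) << 8) | data[pos + 1]) + 4
    distance = data[pos + 2] + 1
    return "dictionary", length, distance, pos + 3
```

For example, the header byte `0x02` is a literal of 3 bytes, and the pair `0x80 0x00` is a dictionary match of 4 bytes starting 1 byte back.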

* The maximum dictionary history size is 256 bytes.
* Dictionary offsets are distances counted backward from the last byte that was output.
	* Example: an offset of 0 means go back by 1 byte into the history.
		* `a b c d e f|g`
	* Example: an offset of 5 means go back by 6 bytes into the history.
		* `a|b c d e f g`
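
The offset rule above can be sketched as a small decoder in Python. This is an illustration under assumptions, not the reference implementation: the name `decompress` is made up here, the 3-byte length is assumed to be stored high bits first, and matches are copied one byte at a time so that a match may overlap the bytes it is producing (as in most LZ decoders).

```python
def decompress(data: bytes) -> bytes:
    """Decode a stream of literal and dictionary commands."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        first = data[pos]
        pos += 1
        if first & 0x80 == 0:
            # Literal: copy the next (first + 1) bytes straight to the output.
            length = first + 1
            out += data[pos:pos + length]
            pos += length
            continue
        if first & 0x40:
            # 3-byte version: 14-bit length, high bits first (assumption).
            length = ((first & 0x3F) << 8) | data[pos]
            pos += 1
        else:
            # 2-byte version: 6-bit length.
            length = first & 0x3F
        length += 4
        # Stored offset 0 means "go back 1 byte", so add 1.
        distance = data[pos] + 1
        pos += 1
        start = len(out) - distance
        if start < 0:
            raise ValueError("offset points before the start of the output")
        # Byte-by-byte copy so the source may overlap the bytes being written.
        for i in range(length):
            out.append(out[start + i])
    return bytes(out)
```

With the history `abcdefg`, a dictionary command with offset 5 and the minimum length of 4 starts at `b` and copies `bcde`:

```python
decompress(bytes([0x06]) + b"abcdefg" + bytes([0x80, 0x05]))  # b"abcdefgbcde"
```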


22:56 < neauoire> how large do I make the dictionary?
22:57 < cancel> yeah. and the dictionary is just the 
                previous 256 bytes of the file. or, if you 
                haven't progressed through 256 bytes yet, 
                whatever you have
22:57 < cancel> so if you're 20 bytes into the file, your 
                dictionary is the 20 bytes you've already 
                processed
22:57 < cancel> if you're on the first byte of the file, 
                your dictionary size is 0
22:57 < cancel> if you're on byte 500, the dictionary size 
                is 256

22:58 < cancel> if your dictionary size is 0, you're 
                definitely not gonna have a match
22:58 < cancel> if you don't have a match, you need to 
                emit the literal command
22:58 < cancel> and then just slap some bytes down into 
                the output
22:58 < cancel> but... how many?
22:59 < neauoire> it's designed to be stream right?
22:59 < neauoire> mhmm maybe not
22:59 < cancel> yeah, but you have to write the size of 
                the literal first
22:59 < cancel> so... how big should the literal be?
22:59 < cancel> well, you don't know yet
23:00 < cancel> so, just write that the literal is 1 byte 
                long, and then put that first byte of the 
                file you were looking at for a match
23:01 < cancel> now, you're looking at the second byte of 
                the file
23:01 < cancel> repeat the process above
23:01 < cancel> your dictionary is now size 1
23:01 < cancel> and it has that first character in it
23:01 < cancel> let's say your file is 'abcdefg'
23:01 < neauoire> yeah
23:01 < cancel> your dictionary is 'a'
23:01 < cancel> and the next character is 'b'
23:01 < cancel> well, there's no match in the dictionary.
23:02 < cancel> so you need to write a literal again...
23:02 < cancel> but the last thing you wrote was already a 
                literal
23:02 < cancel> so just combine it with the previous 
                literal

23:03 < cancel> ok
23:03 < cancel> you can make a 'compressed' file that 
                doesn't actually compress
23:03 < cancel> it can just be all literals
23:03 < neauoire> it'll take me a while to even just 
                  accomplish this bit
23:03 < cancel> it will be bigger than the original input
23:03 < neauoire> ah yes
23:03 < cancel> but it will still be a usable file for the 
                decompressor
23:03 < neauoire> let me try that
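
The all-literal "compressor" described at the end of the chat can be sketched in Python as follows. The name `store_as_literals` is made up for this example. Adjacent literals are combined up to the 128-byte limit that the 7-bit length field allows.

```python
def store_as_literals(data: bytes) -> bytes:
    """Encode data using only literal commands (no dictionary matches)."""
    out = bytearray()
    for start in range(0, len(data), 128):
        chunk = data[start:start + 128]
        # Header byte: top bit 0 marks a literal, low 7 bits hold length - 1.
        out.append(len(chunk) - 1)
        out += chunk
    return bytes(out)
```

The output is one byte larger than the input for every 128 bytes (or part thereof), but a decompressor for this format should still be able to read it.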