neauoire 9101bda743 (lz) *		2023-11-18 12:19:30 -08:00
.clang-format	(lz) example project	2023-11-14 18:54:31 -08:00
README.md	Starting encoding	2023-11-15 20:30:28 -08:00
blue.txt	Housekeeping	2023-11-15 14:35:57 -08:00
build.sh	(lz) Starting uxntal encoder	2023-11-18 11:16:27 -08:00
example.txt	Housekeeping	2023-11-15 14:35:57 -08:00
ulzdec.c	Housekeeping	2023-11-16 20:53:01 -08:00
ulzdec.tal	Cleaned up progress on encoder	2023-11-16 11:02:10 -08:00
ulzenc.c	(lz) *	2023-11-18 12:19:30 -08:00
ulzenc.tal	(lz) *	2023-11-18 12:19:30 -08:00

README.md

Simple Uxn LZ Format

Goals:

Anyone can implement it
Small source code size
Easy to implement from Uxn
Mildly better than RLE

Non-goals:

High compression ratio
High compression speed

Format

It's a stream of commands. The first byte encodes the first command. Read the commands from the input until there's no more input.

There are two commands: literal and dictionary.

                      Byte 1             Byte 2+n
                 ┌─────────────────┐   ┌─────
Literal          │ 0 x x x x x x x │   │ ....
(Always 1 byte)  └─────────────────┘   └─────
                  Length of literal    Bytes to copy to output
                 (Adjust by adding 1)


                      Byte 1               Byte 2
Dictionary       ┌─────────────────┐  ┌─────────────────┐
(2 bytes version)│ 1 0 x x x x x x │  │ x x x x x x x x │
                 └─────────────────┘  └─────────────────┘
                      Length of           Offset into
                   dictionary match       dictionary
                 (Adjust by adding 4) (Adjust by adding 1)


                      Byte 1            Byte 2              Byte 3
Dictionary       ┌─────────────────┬─────────────────┐ ┌─────────────────┐
(3 bytes version)│ 1 1 x x x x x x │ x x x x x x x x │ │ x x x x x x x x │
                 └─────────────────┴─────────────────┘ └─────────────────┘
                       Length of dictionary match          Offset into
                          (Adjust by adding 4)             dictionary
                                                       (Adjust by adding 1)

The maximum dictionary history size is 256 bytes.
Dictionary offsets should be treated as the distance back from the end of the last byte that was output.
- Example: an offset of 0 means go back by 1 byte into the history.
  - a b c d e f|g
- Example: an offset of 5 means go back by 6 bytes into the history.
  - a|b c d e f g
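
The command layout and offset rule above can be sketched as a small decoder. This is only an illustration under stated assumptions, not the project's `ulzdec.c`: the name `ulz_decode_sketch` is hypothetical, the 14-bit length of the 3-byte command is assumed to be stored high bits first, and matches are copied byte by byte so that a match longer than its distance simply repeats recent output.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not taken from ulzdec.c.
   Decodes in_len bytes from in into out (capacity out_cap).
   Returns the number of bytes written, or -1 on malformed input
   or when out is too small. */
long ulz_decode_sketch(const uint8_t *in, size_t in_len,
                       uint8_t *out, size_t out_cap)
{
    size_t ip = 0, op = 0;
    while (ip < in_len) {
        uint8_t c = in[ip++];
        if (c & 0x80) {
            /* Dictionary command: the low 6 bits start the length. */
            size_t len = c & 0x3f;
            if (c & 0x40) {
                /* 3-byte version: one more length byte follows.
                   Assumption: Byte 1 holds the high bits. */
                if (ip >= in_len) return -1;
                len = (len << 8) | in[ip++];
            }
            len += 4; /* adjust by adding 4 */
            if (ip >= in_len) return -1;
            size_t dist = (size_t)in[ip++] + 1; /* adjust by adding 1 */
            if (dist > op || len > out_cap - op) return -1;
            /* Byte-by-byte copy so overlapping matches repeat output. */
            for (size_t i = 0; i < len; i++, op++)
                out[op] = out[op - dist];
        } else {
            /* Literal command: the low 7 bits are the length minus 1. */
            size_t len = (size_t)c + 1;
            if (len > in_len - ip || len > out_cap - op) return -1;
            memcpy(out + op, in + ip, len);
            ip += len;
            op += len;
        }
    }
    return (long)op;
}
```

For instance, the five input bytes `02 61 62 63 80 02` would decode to `abcabca` under these assumptions: a 3-byte literal `abc`, then a 4-byte match starting 3 bytes back.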

22:56 < neauoire> how large do I make the dictionary?
22:57 < cancel> yeah. and the dictionary is just the previous 256 bytes of the file. or, if you haven't progressed through 256 bytes yet, whatever you have
22:57 < cancel> so if you're 20 bytes into the file, your dictionary is the 20 bytes you've already processed
22:57 < cancel> if you're on the first byte of the file, your dictionary size is 0
22:57 < cancel> if you're on byte 500, the dictionary size is 256

22:58 < cancel> if your dictionary size is 0, you're definitely not gonna have a match
22:58 < cancel> if you don't have a match, you need to emit the literal command
22:58 < cancel> and then just slap some bytes down into the output
22:58 < cancel> but... how many?
22:59 < neauoire> it's designed to be stream right?
22:59 < neauoire> mhmm maybe not
22:59 < cancel> yeah, but you have to write the size of the literal first
22:59 < cancel> so... how big should the literal be?
22:59 < cancel> well, you don't know yet
23:00 < cancel> so, just write that the literal is 1 byte long, and then put that first byte of the file you were looking at for a match
23:01 < cancel> now, you're looking at the second byte of the file
23:01 < cancel> repeat the process above
23:01 < cancel> your dictionary is now size 1
23:01 < cancel> and it has that first character in it
23:01 < cancel> let's say your file is 'abcdefg'
23:01 < neauoire> yeah
23:01 < cancel> your dictionary is 'a'
23:01 < cancel> and the next character is 'b'
23:01 < cancel> well, there's no match in the dictionary.
23:02 < cancel> so you need to write a literal again...
23:02 < cancel> but the last thing you wrote was already a literal
23:02 < cancel> so just combine it with the previous literal

23:03 < cancel> ok
23:03 < cancel> you can make a 'compressed' file that doesn't actually compress
23:03 < cancel> it can just be all literals
23:03 < neauoire> it'll take me a while to even just accomplish this bit
23:03 < cancel> it will be bigger than the original input
23:03 < neauoire> ah yes
23:03 < cancel> but it will still be a usable file for the decompressor
23:03 < neauoire> let me try that
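
The all-literals idea from the conversation above might look roughly like the following. It is a minimal sketch, not the project's `ulzenc.c`: the name `ulz_store_literals` is hypothetical, and the input is split into runs of at most 128 bytes because a literal command can describe lengths 1 to 128.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not taken from ulzenc.c.
   Writes in as a sequence of literal commands only, with no
   dictionary matches. Returns the number of bytes written to out,
   or -1 if out_cap is too small. */
long ulz_store_literals(const uint8_t *in, size_t in_len,
                        uint8_t *out, size_t out_cap)
{
    size_t ip = 0, op = 0;
    while (ip < in_len) {
        /* One literal command covers 1..128 bytes (7 bits, plus 1). */
        size_t run = in_len - ip;
        if (run > 128) run = 128;
        if (run + 1 > out_cap - op) return -1;
        out[op++] = (uint8_t)(run - 1); /* top bit 0 marks a literal */
        memcpy(out + op, in + ip, run);
        op += run;
        ip += run;
    }
    return (long)op;
}
```

The output is one byte larger than the input for every run of up to 128 bytes, which matches the remark that such a file is bigger than the original but still usable by the decompressor.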