2023-11-14 21:54:31 -05:00

Simple Uxn LZ Format
====================

Goals:

* Anyone can implement it
* Small source code size
* Easy to implement from Uxn
* Mildly better than RLE

Non-goals:

* High compression ratio
* High compression speed

Format
------

It's a stream of commands. The first byte encodes the first command. Read commands from the input until there's no more input.

There are two commands: literal and dictionary.

```
                   Byte 1               Byte 2+n
                   ┌─────────────────┐  ┌─────
Literal            │ 0 x x x x x x x │  │ ....
(Always 1 byte)    └─────────────────┘  └─────
                   Length of literal    Bytes to copy to output
                   (Adjust by adding 1)

                   Byte 1               Byte 2
Dictionary         ┌─────────────────┐  ┌─────────────────┐
(2 bytes version)  │ 1 0 x x x x x x │  │ x x x x x x x x │
                   └─────────────────┘  └─────────────────┘
                   Length of            Offset into
                   dictionary match     dictionary
                   (Adjust by adding 4) (Adjust by adding 1)

                   Byte 1            Byte 2               Byte 3
Dictionary         ┌─────────────────┬─────────────────┐  ┌─────────────────┐
(3 bytes version)  │ 1 1 x x x x x x │ x x x x x x x x │  │ x x x x x x x x │
                   └─────────────────┴─────────────────┘  └─────────────────┘
                   Length of dictionary match             Offset into
                   (Adjust by adding 4)                   dictionary
                                                          (Adjust by adding 1)
```

* The maximum dictionary history size is 256 bytes.
* Dictionary offsets are measured backward from the end of the last byte that was output.
  * Example: an offset of 0 means go back by 1 byte into the history.
    * `a b c d e f|g`
  * Example: an offset of 5 means go back by 6 bytes into the history.
    * `a|b c d e f g`
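
The command layout above can be sketched as a small decoder. This is only an illustration, not a reference implementation: the function name `decode` is made up here, and two details are assumptions the text does not spell out. The sketch assumes the 14-bit length of the 3 bytes version is stored high bits first (in byte 1), and that dictionary matches are copied one byte at a time, so a match may run into bytes it has just written.

```python
def decode(data: bytes) -> bytes:
    """Decode a stream of literal and dictionary commands (sketch)."""
    out = bytearray()
    i = 0
    while i < len(data):
        cmd = data[i]
        i += 1
        if cmd < 0x80:
            # Literal: the low 7 bits hold (length - 1).
            length = cmd + 1
            out += data[i:i + length]
            i += length
        else:
            if cmd & 0x40:
                # 3 bytes version: 14-bit length, high bits first (assumption).
                length = (((cmd & 0x3F) << 8) | data[i]) + 4
                i += 1
            else:
                # 2 bytes version: 6-bit length.
                length = (cmd & 0x3F) + 4
            offset = data[i] + 1  # offset byte 0 means "go back 1 byte"
            i += 1
            if offset > len(out):
                raise ValueError("offset reaches before the start of the output")
            start = len(out) - offset
            # Copy byte by byte so overlapping matches work (assumption).
            for k in range(length):
                out.append(out[start + k])
    return bytes(out)


# Literal "abc", then a 6-byte match starting 3 bytes back.
print(decode(bytes([0x02]) + b"abc" + bytes([0x82, 0x02])))  # b'abcabcabc'
```

The decoder only ever reads from its own output, so it needs no separate dictionary buffer when the whole output is kept in memory.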

22:56 < neauoire> how large do I make the dictionary?
22:57 < cancel> yeah. and the dictionary is just the previous 256 bytes of the file. or, if you haven't progressed through 256 bytes yet, whatever you have
22:57 < cancel> so if you're 20 bytes into the file, your dictionary is the 20 bytes you've already processed
22:57 < cancel> if you're on the first byte of the file, your dictionary size is 0
22:57 < cancel> if you're on byte 500, the dictionary size is 256
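
The dictionary described here is just a sliding window over the bytes already processed. A minimal sketch, assuming the input is held in memory as `bytes`; the names `MAX_HISTORY` and `dictionary_at` are hypothetical and not part of the format:

```python
MAX_HISTORY = 256  # maximum dictionary history size


def dictionary_at(data: bytes, pos: int) -> bytes:
    """Return the bytes a compressor may match against when it is at `pos`."""
    return data[max(0, pos - MAX_HISTORY):pos]


data = bytes(1000)
print(len(dictionary_at(data, 0)))    # 0
print(len(dictionary_at(data, 20)))   # 20
print(len(dictionary_at(data, 500)))  # 256
```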

22:58 < cancel> if your dictionary size is 0, you're definitely not gonna have a match
22:58 < cancel> if you don't have a match, you need to emit the literal command
22:58 < cancel> and then just slap some bytes down into the output
22:58 < cancel> but... how many?
22:59 < neauoire> it's designed to be stream right?
22:59 < neauoire> mhmm maybe not
22:59 < cancel> yeah, but you have to write the size of the literal first
22:59 < cancel> so... how big should the literal be?
22:59 < cancel> well, you don't know yet
23:00 < cancel> so, just write that the literal is 1 byte long, and then put that first byte of the file you were looking at for a match
23:01 < cancel> now, you're looking at the second byte of the file
23:01 < cancel> repeat the process above
23:01 < cancel> your dictionary is now size 1
23:01 < cancel> and it has that first character in it
23:01 < cancel> let's say your file is 'abcdefg'
23:01 < neauoire> yeah
23:01 < cancel> your dictionary is 'a'
23:01 < cancel> and the next character is 'b'
23:01 < cancel> well, there's no match in the dictionary.
23:02 < cancel> so you need to write a literal again...
23:02 < cancel> but the last thing you wrote was already a literal
23:02 < cancel> so just combine it with the previous literal
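
The process walked through above (look for a match in the dictionary, otherwise write a 1-byte literal and grow it on the next miss) could look roughly like this. It is a sketch under assumptions, not the compressor from the conversation: the names `find_match` and `encode` are invented, the search is a brute-force scan for the longest match, matches are kept from overlapping the current position to stay conservative, and the 3 bytes version stores its length high bits first.

```python
MAX_HISTORY = 256   # dictionary is at most the previous 256 bytes
MIN_MATCH = 4       # dictionary lengths are stored as (length - 4)
MAX_LITERAL = 128   # literal lengths are stored as (length - 1) in 7 bits


def find_match(data: bytes, pos: int):
    """Return (length, offset_byte) of the longest match, or (0, 0)."""
    best_len, best_off = 0, 0
    for start in range(max(0, pos - MAX_HISTORY), pos):
        length = 0
        # Stop at end of input, and never read past `pos` (no overlap).
        while (pos + length < len(data) and length < pos - start
               and data[start + length] == data[pos + length]):
            length += 1
        if length > best_len:
            best_len, best_off = length, pos - start - 1
    return best_len, best_off


def encode(data: bytes) -> bytes:
    out = bytearray()
    lit_start = None  # index in `out` of the literal header still being grown
    pos = 0
    while pos < len(data):
        length, off = find_match(data, pos)
        if length >= MIN_MATCH:
            n = length - MIN_MATCH
            if n < 0x40:
                out.append(0x80 | n)         # 2 bytes version
            else:
                out.append(0xC0 | (n >> 8))  # 3 bytes version, high bits first
                out.append(n & 0xFF)
            out.append(off)
            lit_start = None
            pos += length
        else:
            if lit_start is None or out[lit_start] == MAX_LITERAL - 1:
                lit_start = len(out)
                out.append(0)                # new literal, length 1
            else:
                out[lit_start] += 1          # combine with the previous literal
            out.append(data[pos])
            pos += 1
    return bytes(out)


print(encode(b"abcdabcdabcd").hex(" "))  # 03 61 62 63 64 80 03 80 07
```

Writing the literal header first and bumping it in place is what lets the compressor commit to a literal before it knows how long that literal will end up.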

23:03 < cancel> ok
23:03 < cancel> you can make a 'compressed' file that doesn't actually compress
23:03 < cancel> it can just be all literals
23:03 < neauoire> it'll take me a while to even just accomplish this bit
23:03 < cancel> it will be bigger than the original input
23:03 < neauoire> ah yes
23:03 < cancel> but it will still be a usable file for the decompressor
23:03 < neauoire> let me try that
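
The "all literals" file mentioned above is probably the smallest useful first step. A minimal sketch, where the name `encode_all_literals` is made up for illustration: it cuts the input into chunks of up to 128 bytes and puts a literal command in front of each one.

```python
def encode_all_literals(data: bytes) -> bytes:
    """Wrap the input in literal commands only; no matching is attempted."""
    out = bytearray()
    for i in range(0, len(data), 128):
        chunk = data[i:i + 128]
        out.append(len(chunk) - 1)  # literal header: top bit 0, length - 1
        out += chunk
    return bytes(out)


print(encode_all_literals(b"abcdefg"))  # b'\x06abcdefg'
```

The output is one byte longer than the input for every 128 bytes (or part thereof), so 7 bytes become 8 here, but a decompressor for the format can still read it.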