Simple Uxn LZ Format

Goals:

  • Anyone can implement it
  • Small source code size
  • Easy to implement from Uxn
  • Mildly better than RLE

Non-goals:

  • High compression ratio
  • High compression speed

Format

The compressed data is a stream of commands. The first byte of the input starts the first command. Keep reading commands until there is no more input.

There are two commands: literal and dictionary. The top bits of a command's first byte tell them apart.

                      Byte 1             Byte 2+n
                 ┌─────────────────┐   ┌─────
Literal          │ 0 x x x x x x x │   │ ....
(1-byte header)  └─────────────────┘   └─────
                  Length of literal    Bytes to copy to output
                 (Adjust by adding 1)
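As a rough illustration of the literal command, the sketch below encodes arbitrary data using literal commands only. The name `encode_literals` is hypothetical and not part of uxn-utils, and Python is used purely for illustration. The result is slightly larger than the input, but it should be a valid stream under the layout above.

```python
def encode_literals(data: bytes) -> bytes:
    """Encode data with literal commands only (no compression at all)."""
    out = bytearray()
    # The length field is 7 bits and stores (length - 1), so one literal
    # command carries between 1 and 128 bytes.
    for start in range(0, len(data), 128):
        chunk = data[start:start + 128]
        out.append(len(chunk) - 1)  # top bit 0 marks a literal command
        out += chunk
    return bytes(out)
```

For example, `encode_literals(b"abc")` yields the header byte `0x02` followed by the three bytes `abc`.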


                      Byte 1               Byte 2
Dictionary       ┌─────────────────┐  ┌─────────────────┐
(2-byte version) │ 1 0 x x x x x x │  │ x x x x x x x x │
                 └─────────────────┘  └─────────────────┘
                      Length of           Offset into
                   dictionary match       dictionary
                 (Adjust by adding 4) (Adjust by adding 1)


                      Byte 1            Byte 2              Byte 3
Dictionary       ┌─────────────────┬─────────────────┐ ┌─────────────────┐
(3-byte version) │ 1 1 x x x x x x │ x x x x x x x x │ │ x x x x x x x x │
                 └─────────────────┴─────────────────┘ └─────────────────┘
                       Length of dictionary match          Offset into
                          (Adjust by adding 4)             dictionary
                                                       (Adjust by adding 1)
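The three layouts above can be told apart from the top two bits of the first byte. The following is a minimal sketch of that dispatch, not the project's own code: `parse_command` is a hypothetical helper, and it assumes that in the 3-byte version the six bits in byte 1 are the high bits of the length (big-endian), as the diagram suggests.

```python
def parse_command(buf: bytes, pos: int = 0):
    """Return (kind, length, distance, header_size) for the command at buf[pos].

    distance is the stored offset plus 1, or None for a literal.
    """
    first = buf[pos]
    if first & 0x80 == 0:
        # 0xxxxxxx: literal, 7-bit length, adjusted by adding 1
        return ("literal", (first & 0x7F) + 1, None, 1)
    if first & 0x40 == 0:
        # 10xxxxxx: dictionary, 6-bit length, then a 1-byte offset
        return ("dictionary", (first & 0x3F) + 4, buf[pos + 1] + 1, 2)
    # 11xxxxxx: dictionary, 14-bit length (assumed big-endian), then offset
    length = (((first & 0x3F) << 8) | buf[pos + 1]) + 4
    return ("dictionary", length, buf[pos + 2] + 1, 3)
```

Under these assumptions a literal covers 1 to 128 bytes, a 2-byte dictionary command covers matches of 4 to 67 bytes, and a 3-byte dictionary command covers matches of 4 to 16387 bytes.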
  • The maximum dictionary history size is 256 bytes, because the offset is a single byte.
  • Dictionary offsets are measured backwards from the end of the last byte that was output.
    • Example: an offset of 0 means go back 1 byte into the history (the bar marks where copying starts).
      • a b c d e f|g
    • Example: an offset of 5 means go back 6 bytes into the history.
      • a|b c d e f g
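Putting the command layouts and the offset rule together, a decoder could look roughly like the sketch below. It is an illustration under stated assumptions rather than the reference implementation: the name `decompress` is hypothetical, the 3-byte length is assumed to be big-endian, and matches are copied one byte at a time so that a match longer than its distance repeats the bytes it has just written.

```python
def decompress(data: bytes) -> bytes:
    """Decode a stream of literal and dictionary commands."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        first = data[pos]
        pos += 1
        if first & 0x80 == 0:
            # Literal: copy the next (7-bit value + 1) bytes to the output.
            length = (first & 0x7F) + 1
            out += data[pos:pos + length]
            pos += length
            continue
        # Dictionary: 6-bit length, extended to 14 bits in the 3-byte version.
        length = first & 0x3F
        if first & 0x40:
            length = (length << 8) | data[pos]  # assumed big-endian
            pos += 1
        length += 4
        distance = data[pos] + 1  # stored offset 0 means go back 1 byte
        pos += 1
        if distance > len(out):
            raise ValueError("dictionary offset reaches before start of output")
        # Byte-by-byte copy, so the source may overlap the bytes being written.
        for _ in range(length):
            out.append(out[-distance])
    return bytes(out)
```

With the history `a b c d e f g` from the examples above, a 2-byte dictionary command with length field 0 and offset 5 copies four bytes starting at `b`, so `decompress(b"\x06abcdefg\x80\x05")` gives `abcdefgbcde`.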