Updated file format specification. It changes the suffix
of the new format to .xz and removes the recently added
LZMA filter.
Lasse Collin 2008-09-27 19:11:02 +03:00
parent 1dcecfb09b
commit c6ca26eef7
1 changed files with 32 additions and 93 deletions

@ -1,6 +1,6 @@
The .lzma File Format The .xz File Format
--------------------- -------------------
0. Preface 0. Preface
0.1. Copyright Notices 0.1. Copyright Notices
@ -8,7 +8,7 @@ The .lzma File Format
1. Conventions 1. Conventions
1.1. Byte and Its Representation 1.1. Byte and Its Representation
1.2. Multibyte Integers 1.2. Multibyte Integers
2. Overall Structure of .lzma File 2. Overall Structure of .xz File
2.1. Stream 2.1. Stream
2.1.1. Stream Header 2.1.1. Stream Header
2.1.1.1. Header Magic Bytes 2.1.1.1. Header Magic Bytes
@ -43,11 +43,10 @@ The .lzma File Format
5.1. Alignment 5.1. Alignment
5.2. Security 5.2. Security
5.3. Filters 5.3. Filters
5.3.1. LZMA 5.3.1. LZMA2
5.3.2. LZMA2 5.3.2. Branch/Call/Jump Filters for Executables
5.3.3. Branch/Call/Jump Filters for Executables 5.3.3. Delta
5.3.4. Delta 5.3.3.1. Format of the Encoded Output
5.3.4.1. Format of the Encoded Output
5.4. Custom Filter IDs 5.4. Custom Filter IDs
5.4.1. Reserved Custom Filter ID Ranges 5.4.1. Reserved Custom Filter ID Ranges
6. Cyclic Redundancy Checks 6. Cyclic Redundancy Checks
@ -56,10 +55,10 @@ The .lzma File Format
0. Preface 0. Preface
This document describes the .lzma file format (filename suffix This document describes the .xz file format (filename suffix
`.lzma', MIME type `application/x-lzma'). It is intended that `.xz', MIME type `application/x-xz'). It is intended that this
this format replace the format used by the LZMA_Alone tool this format replace the old .lzma format used by LZMA SDK and
included in LZMA SDK up to and including version 4.57. LZMA Utils.
IMPORTANT: The version described in this document is a IMPORTANT: The version described in this document is a
draft, NOT a final, official version. Changes draft, NOT a final, official version. Changes
@ -86,7 +85,7 @@ The .lzma File Format
0.2. Changes 0.2. Changes
Last modified: 2008-09-07 10:20+0300 Last modified: 2008-09-24 21:05+0300
(A changelog will be kept once the first official version (A changelog will be kept once the first official version
is made.) is made.)
@ -205,7 +204,7 @@ The .lzma File Format
} }
2. Overall Structure of .lzma File 2. Overall Structure of .xz File
+========+================+========+================+ +========+================+========+================+
| Stream | Stream Padding | Stream | Stream Padding | ... | Stream | Stream Padding | Stream | Stream Padding | ...
@ -243,9 +242,9 @@ The .lzma File Format
The same limit applies to the total amount of uncompressed The same limit applies to the total amount of uncompressed
data stored in a Stream. data stored in a Stream.
If an implementation supports handling .lzma files with If an implementation supports handling .xz files with multiple
multiple concatenated Streams, it may apply the above limits concatenated Streams, it may apply the above limits to the file
to the file as a whole instead of limiting per Stream basis. as a whole instead of limiting per Stream basis.
2.1.1. Stream Header 2.1.1. Stream Header
@ -262,15 +261,15 @@ The .lzma File Format
Using a C array and ASCII: Using a C array and ASCII:
const uint8_t HEADER_MAGIC[6] const uint8_t HEADER_MAGIC[6]
= { 0xFF, 'L', 'Z', 'M', 'A', 0x00 }; = { 0xFD, '7', 'z', 'X', 'Z', 0x00 };
In plain hexadecimal: In plain hexadecimal:
FF 4C 5A 4D 41 00 FD 37 7A 58 5A 00
Notes: Notes:
- The first byte (0xFF) was chosen so that the files cannot - The first byte (0xFD) was chosen so that the files cannot
be erroneously detected as being in LZMA_Alone format, in be erroneously detected as being in .lzma format, in which
which the first byte is in the range [0x00, 0xE0]. the first byte is in the range [0x00, 0xE0].
- The sixth byte (0x00) was chosen to prevent applications - The sixth byte (0x00) was chosen to prevent applications
from misdetecting the file as a text file. from misdetecting the file as a text file.
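As a rough illustration of the notes above, the following sketch checks a buffer against the new Header Magic Bytes and, separately, tests whether the first byte falls in the range that the old .lzma format uses. The names XZ_HEADER_MAGIC, is_xz_magic, and may_be_old_lzma are hypothetical and are not part of the specification; only the byte values and the [0x00, 0xE0] range come from the text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Header Magic Bytes of the .xz format, as listed on the new side of the diff. */
static const uint8_t XZ_HEADER_MAGIC[6]
        = { 0xFD, '7', 'z', 'X', 'Z', 0x00 };

/* Hypothetical helper: returns true if buf begins with the .xz magic bytes. */
static bool
is_xz_magic(const uint8_t *buf, size_t size)
{
    return size >= sizeof(XZ_HEADER_MAGIC)
            && memcmp(buf, XZ_HEADER_MAGIC, sizeof(XZ_HEADER_MAGIC)) == 0;
}

/*
 * Hypothetical helper: returns true if the first byte is in the range
 * [0x00, 0xE0] that the old .lzma format uses. This is only a necessary
 * condition for an old .lzma file, not a full detection routine.
 */
static bool
may_be_old_lzma(const uint8_t *buf, size_t size)
{
    return size >= 1 && buf[0] <= 0xE0;
}
```

Because 0xFD lies outside [0x00, 0xE0], a buffer that passes is_xz_magic() can never pass may_be_old_lzma(), which is the property the first note describes.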
@ -704,15 +703,15 @@ The .lzma File Format
PowerPC executable files in the archive stream start at PowerPC executable files in the archive stream start at
offsets that are multiples of four bytes. offsets that are multiples of four bytes.
Some filters, for example LZMA, can be configured to take Some filters, for example LZMA2, can be configured to take
advantage of specified alignment of input data. Note that advantage of specified alignment of input data. Note that
taking advantage of aligned input can be benefical also when taking advantage of aligned input can be benefical also when
a filter is not the first filter in the chain. For example, a filter is not the first filter in the chain. For example,
if you compress PowerPC executables, you may want to use the if you compress PowerPC executables, you may want to use the
PowerPC filter and chain that with the LZMA filter. Because not PowerPC filter and chain that with the LZMA2 filter. Because
only the input but also the output alignment of the PowerPC not only the input but also the output alignment of the PowerPC
filter is four bytes, it is now benefical to set LZMA settings filter is four bytes, it is now benefical to set LZMA2 settings
so that the LZMA encoder can take advantage of its so that the LZMA2 encoder can take advantage of its
four-byte-aligned input data. four-byte-aligned input data.
The output of the last filter in the chain is stored to the The output of the last filter in the chain is stored to the
@ -770,78 +769,18 @@ The .lzma File Format
5.3. Filters 5.3. Filters
5.3.1. LZMA 5.3.1. LZMA2
LZMA (Lempel-Ziv-Markov chain-Algorithm) is a general-purporse LZMA (Lempel-Ziv-Markov chain-Algorithm) is a general-purporse
compression algorithm with high compression ratio and fast compression algorithm with high compression ratio and fast
decompression. LZMA is based on LZ77 and range coding decompression. LZMA is based on LZ77 and range coding
algorithms. algorithms.
Filter ID: 0x20
Size of Filter Properties: 5 bytes
Changes size of data: Yes
Allow as a non-last filter: No
Allow as the last filter: Yes
Preferred alignment:
Input data: Adjustable to 1/2/4/8/16 byte(s)
Output data: 1 byte
At the time of writing, there is no other documentation about
how LZMA works than the source code in LZMA SDK. Once such
documentation gets written, it will probably be published as
a separate document, because including the documentation here
would lengthen this document considerably.
The format of the Filter Properties field is as follows:
+-----------------+----+----+----+----+
| LZMA Properties | Dictionary Size |
+-----------------+----+----+----+----+
The LZMA Properties field contains three properties. An
abbreviation is given in parentheses, followed by the value
range of the property. The field consists of
1) the number of literal context bits (lc, [0, 4]);
2) the number of literal position bits (lp, [0, 4]); and
3) the number of position bits (pb, [0, 4]).
In addition to above ranges, the sum of lc and lp must not
exceed four. Note that this limit didn't exist in the old
LZMA_Alone format, which allowed lc to be in the range [0, 8].
The properties are encoded using the following formula:
LZMA Properties = (pb * 5 + lp) * 9 + lc
The following C code illustrates a straightforward way to
decode the properties:
uint8_t lc, lp, pb;
uint8_t prop = get_lzma_properties();
if (prop > (4 * 5 + 4) * 9 + 8)
return LZMA_PROPERTIES_ERROR;
pb = prop / (9 * 5);
prop -= pb * 9 * 5;
lp = prop / 9;
lc = prop - lp * 9;
if (lc + lp > 4)
return LZMA_PROPERTIES_ERROR;
Dictionary Size is encoded as unsigned 32-bit little endian
integer.
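For reference, the encoding direction of the formula in the removed LZMA section could be sketched as follows. The function name encode_lzma_properties and its -1 error convention are assumptions made for this illustration; the ranges and the formula are the ones stated in the removed text.

```c
#include <stdint.h>

/*
 * Hypothetical helper: packs lc, lp, and pb into a single properties
 * byte using (pb * 5 + lp) * 9 + lc. Returns -1 if a value is outside
 * the ranges given in the removed section or if lc + lp exceeds four.
 */
static int
encode_lzma_properties(unsigned lc, unsigned lp, unsigned pb)
{
    if (lc > 4 || lp > 4 || pb > 4 || lc + lp > 4)
        return -1;

    return (int)((pb * 5 + lp) * 9 + lc);
}
```

For example, lc = 3, lp = 0, pb = 2 gives (2 * 5 + 0) * 9 + 3 = 93 (0x5D).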
5.3.2. LZMA2
LZMA2 is an extensions on top of the original LZMA. LZMA2 uses LZMA2 is an extensions on top of the original LZMA. LZMA2 uses
LZMA internally, but adds support for flushing the encoder, LZMA internally, but adds support for flushing the encoder,
uncompressed chunks, eases stateful decoder implementations, uncompressed chunks, eases stateful decoder implementations,
and improves support for multithreading. For most uses, it is and improves support for multithreading. Thus, the plain LZMA
recommended to use LZMA2 instead of LZMA. will not be supported in this file format.
Filter ID: 0x21 Filter ID: 0x21
Size of Filter Properties: 1 byte Size of Filter Properties: 1 byte
@ -896,7 +835,7 @@ The .lzma File Format
} }
5.3.3. Branch/Call/Jump Filters for Executables 5.3.2. Branch/Call/Jump Filters for Executables
These filters convert relative branch, call, and jump These filters convert relative branch, call, and jump
instructions to their absolute counterparts in executable instructions to their absolute counterparts in executable
@ -936,7 +875,7 @@ The .lzma File Format
the Subblock filter. the Subblock filter.
5.3.4. Delta 5.3.3. Delta
The Delta filter may increase compression ratio when the value The Delta filter may increase compression ratio when the value
of the next byte correlates with the value of an earlier byte of the next byte correlates with the value of an earlier byte
@ -957,7 +896,7 @@ The .lzma File Format
distance of 1 byte and 0xFF distance of 256 bytes. distance of 1 byte and 0xFF distance of 256 bytes.
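A minimal sketch of this mapping, assuming the values between the two endpoints shown here are interpolated linearly (distance = byte value + 1). The function name delta_distance_from_props is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: converts the one-byte Delta property into a
 * distance in bytes, so that 0x00 maps to 1 and 0xFF maps to 256.
 */
static unsigned
delta_distance_from_props(uint8_t props)
{
    return (unsigned)props + 1;
}
```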
5.3.4.1. Format of the Encoded Output 5.3.3.1. Format of the Encoded Output
The code below illustrates both encoding and decoding with The code below illustrates both encoding and decoding with
the Delta filter. the Delta filter.