XZ Utils FAQ
============
Q: What are LZMA, LZMA Utils, lzma, .lzma, liblzma, LZMA SDK, LZMA_Alone,
7-Zip and p7zip?

A: LZMA stands for Lempel-Ziv-Markov chain-Algorithm. LZMA is the name
of the compression algorithm designed by Igor Pavlov. He is the author
of 7-Zip, which is a great LGPL'd compression tool for Microsoft
Windows operating systems. In addition to 7-Zip itself, LZMA SDK is
also available on the 7-Zip website. LZMA SDK contains LZMA
implementations in C++, Java and C#. The C++ version is the original
implementation, which is also used in 7-Zip itself.

Excluding the unrar plugin, 7-Zip is free software (free as in
freedom). Thanks to this, it was possible to port it to POSIX
platforms. The port was done and is maintained by myspace (TODO:
myspace's real name?). p7zip is a port of 7-Zip's command line version;
p7zip doesn't include 7-Zip's GUI.

In the POSIX world, users are used to the gzip and bzip2 command line
tools. Developers know the APIs of zlib and libbzip2. LZMA Utils try to
ease adoption of LZMA on free operating systems by providing a
compression library and a set of command line tools. The library is
called liblzma. It provides a zlib-like API, making it easy to adopt
LZMA compression in existing applications. The main command line tool
is known as lzma, whose command line syntax is very similar to that of
gzip and bzip2.

The original command line tool from LZMA SDK (lzma.exe) was found in
a directory called LZMA_Alone in the LZMA SDK. It used a simple header
format in .lzma files. This format was also used by LZMA Utils up to
and including 4.32.x. In LZMA Utils documentation, LZMA_Alone refers
to both the file format and the command line tool from LZMA SDK.

Because of various limitations of the LZMA_Alone file format, a new
file format was developed. Extending an existing format such as .gz
used by gzip was considered, but these formats were found to be too
limited. The filename suffix for the new format is `.lzma'. The
same suffix is also used for files in the LZMA_Alone format. To make
the transition to the new format as smooth as possible, LZMA Utils
support both the new and the old format transparently.

7-Zip and LZMA SDK: <http://7-zip.org/>
p7zip: <http://p7zip.sourceforge.net/>
LZMA Utils: <http://tukaani.org/lzma/>
Q: What LZMA implementations are available?

A: LZMA SDK contains implementations in C++, Java and C#. The C++ version
is the original implementation, which is part of 7-Zip. LZMA SDK
also contains a small LZMA decoder in C.

A port of LZMA SDK to Pascal was made by Alan Birtles
<http://www.birtles.org.uk/programming/>. It should work with
multiple Pascal programming language implementations.

LZMA Utils includes liblzma, which is directly based on LZMA SDK.
liblzma is written in C (C99, not C89). In contrast to the C++ callback
API used by LZMA SDK, liblzma uses a zlib-like stateful C API. I do not
want to comment on whether both/former/latter/neither API(s) are good or
bad. The only reason to implement a zlib-like API was that many
developers are already familiar with zlib, and very many applications
already use zlib. Having a similar API makes it easier to include LZMA
support in existing applications.
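To illustrate why a zlib-like stateful API eases adoption, here is a
minimal sketch using Python's standard zlib and lzma modules. This is an
assumption made for illustration: it shows the Python bindings, not the
liblzma C API itself, and the helper name stream_compress is made up.
One driver function can serve both libraries because the calling
pattern (create state, feed input, flush) is the same.

```python
import lzma
import zlib


def stream_compress(compressor, chunks):
    """Feed chunks to any object that exposes compress() and flush()."""
    out = [compressor.compress(chunk) for chunk in chunks]
    out.append(compressor.flush())
    return b"".join(out)


chunks = [b"hello ", b"stateful ", b"world\n"] * 100

# The same driver works unchanged for both libraries.
deflated = stream_compress(zlib.compressobj(), chunks)
lzma_data = stream_compress(lzma.LZMACompressor(), chunks)
```

An application that already streams data through a zlib compressor
object would, under these assumptions, need to change little more than
the constructor call.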
See also <http://en.wikipedia.org/wiki/LZMA#External_links>.
Q: Which file formats are supported by LZMA Utils?

A: Even though the raw LZMA stream is always the same, it can be wrapped
in different container formats. The preferred format is the new .lzma
format. It has magic bytes (the first six bytes: 0xFF 'L' 'Z' 'M'
'A' 0x00). The format supports chaining up to seven filters, and
splitting data into multiple blocks for easier multi-threading and rough
random-access reading. The file integrity is verified using CRC32,
CRC64, or SHA256, and by verifying the uncompressed size of the file.
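The effect of an integrity check can be sketched with Python's standard
lzma module. Note the assumption: that module writes the later .xz
container, not the .lzma format described here, so this is only an
analogy. It offers the same three check types (CHECK_CRC32,
CHECK_CRC64, CHECK_SHA256). The helper is_intact is a made-up name.

```python
import lzma

payload = b"integrity checks catch silent corruption\n" * 200

# CHECK_CRC64 is one of the check types offered by Python's lzma module;
# CHECK_CRC32 and CHECK_SHA256 are selected the same way.
packed = lzma.compress(payload, format=lzma.FORMAT_XZ, check=lzma.CHECK_CRC64)


def is_intact(blob: bytes) -> bool:
    """Return True if blob decompresses and passes its integrity check."""
    try:
        lzma.decompress(blob)
    except lzma.LZMAError:
        return False
    return True


# Simulate damage: invert one byte in the middle of the compressed data.
damaged = bytearray(packed)
damaged[len(damaged) // 2] ^= 0xFF
```

Calling is_intact(packed) should succeed, while the damaged copy is
expected to be rejected either by the decoder or by the check.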
LZMA SDK includes a tool called LZMA_Alone. It uses a
primitive header which includes only the mandatory stream information
required by the LZMA decoder. This format can be both read and
written by liblzma and the command line tool (use --format=alone to
create such files).
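As a rough illustration of how small the LZMA_Alone header is, the
following sketch writes such a file with Python's standard lzma module
(an assumption: FORMAT_ALONE is that module's name for this legacy
format) and unpacks the 13-byte header: one properties byte, a 32-bit
little-endian dictionary size, and a 64-bit little-endian uncompressed
size.

```python
import lzma
import struct

data = b"LZMA_Alone keeps the header minimal\n" * 50

# FORMAT_ALONE selects the legacy LZMA_Alone container.
alone = lzma.compress(data, format=lzma.FORMAT_ALONE)

# "<BIQ" = little-endian, no padding: 1 + 4 + 8 = 13 header bytes.
props, dict_size, uncompressed_size = struct.unpack("<BIQ", alone[:13])

# Everything after the header is the LZMA stream itself.
restored = lzma.decompress(alone, format=lzma.FORMAT_ALONE)
```

An uncompressed size field with all bits set conventionally means that
the size was not stored and the stream carries an end marker instead.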
.7z is the native archive format used by 7-Zip. This format is not
supported by liblzma, and probably will never be supported. You
should use e.g. p7zip to extract .7z files.

It is possible to implement custom file formats by using raw filter
mode in liblzma. In this mode the application needs to store the filter
properties and provide them to liblzma before starting to uncompress
the data.
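The raw filter mode described above can be sketched with Python's
standard lzma module, which exposes it as FORMAT_RAW. Everything about
the container below is hypothetical: the two-byte length prefix, the
JSON-encoded filter list, and the names pack and unpack are made up to
show that the application, not the compressed stream, has to remember
the filter properties.

```python
import json
import lzma

# Hypothetical application-defined filter chain.
FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 6}]


def pack(data: bytes) -> bytes:
    """Prefix raw LZMA2 data with a tiny made-up header holding the filters."""
    header = json.dumps(FILTERS).encode("ascii")
    body = lzma.compress(data, format=lzma.FORMAT_RAW, filters=FILTERS)
    return len(header).to_bytes(2, "big") + header + body


def unpack(blob: bytes) -> bytes:
    """Read the stored filter chain back and hand it to the raw decoder."""
    header_len = int.from_bytes(blob[:2], "big")
    filters = json.loads(blob[2:2 + header_len])
    return lzma.decompress(blob[2 + header_len:], format=lzma.FORMAT_RAW,
                           filters=filters)
```

Without the stored filter list, the raw data alone would not tell the
decoder how to interpret it.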
Q: How can I identify files containing LZMA compressed data?

A: The preferred filename suffix for .lzma files is `.lzma'. `.tar.lzma'
may be abbreviated to `.tlz'. The same suffixes are used for files in
LZMA_Alone format. In practice this should be no problem since tools
included in LZMA Utils support both formats transparently.

Checking the magic bytes is an easy way to detect files in the new .lzma
format (the first six bytes: 0xFF 'L' 'Z' 'M' 'A' 0x00). The "file"
command version FIXME contains magic strings for this format.
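A minimal sketch of the magic byte check described above. The function
and constant names are made up for this example; only the six byte
values come from the text.

```python
# First six bytes of the new .lzma format: 0xFF 'L' 'Z' 'M' 'A' 0x00.
NEW_LZMA_MAGIC = b"\xffLZMA\x00"


def starts_with_new_lzma_magic(head: bytes) -> bool:
    """Check whether a byte string begins with the new .lzma magic bytes."""
    return head[:len(NEW_LZMA_MAGIC)] == NEW_LZMA_MAGIC


def file_has_new_lzma_magic(path: str) -> bool:
    """Read only the first six bytes of a file and compare them."""
    with open(path, "rb") as f:
        return starts_with_new_lzma_magic(f.read(len(NEW_LZMA_MAGIC)))
```

A match only says that the file claims to be in the new format; it does
not prove that the rest of the file is intact.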
The old LZMA_Alone format has no magic bytes. Its header cannot contain
arbitrary bytes, thus it is possible to make a guess. Unfortunately the
guessing is usually too hard to be reliable, so don't try it unless you
are desperate.
Q: Does the lzma command line tool support sparse files?

A: Sparse files can (of course) be compressed like normal files, but
uncompression will not restore sparseness of the file. Use an archiver
tool to take care of sparseness before compressing the data with lzma.

The reason for this is that archiver tools handle files, while
compression tools handle streams or buffers. Being a sparse file is
a property of the file on the disk, not a property of the stream or
buffer.
Q: Can I recover parts of a broken LZMA file (e.g. corrupted CD-R)?

A: With LZMA_Alone and single-block .lzma files, you can uncompress the
file until you hit the first broken byte. The data after the broken
position is lost. LZMA relies on the uncompression history, and if
bytes are missing in the middle of the file, it is impossible to
reliably continue after the broken section.
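Decoding up to the damaged position can be sketched with Python's
standard lzma module. As a simplifying assumption, the damage is
simulated by cutting the file in half rather than by corrupting bytes
in place. The streaming decoder hands back whatever it could decode
from the input it was given.

```python
import lzma

original = b"".join(b"record %d\n" % i for i in range(20000))
packed = lzma.compress(original, format=lzma.FORMAT_ALONE)

# Simulate a damaged medium by keeping only the first half of the file.
truncated = packed[: len(packed) // 2]

# LZMADecompressor returns the data decoded so far instead of insisting
# on a complete stream.
decoder = lzma.LZMADecompressor(format=lzma.FORMAT_ALONE)
recovered = decoder.decompress(truncated)
```

Here recovered is a prefix of original. With real corruption, data
decoded close to the broken position should be treated with suspicion.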
With multi-block .lzma files it may be possible to locate the next
block in the file and continue decoding there. A limited recovery
tool for this kind of situation is planned.
Q: Is LZMA patented?

A: No, the authors are not aware of any patents that could affect LZMA.
However, due to the nature of software patents, the authors cannot
guarantee that LZMA isn't affected by any third party patent.
Q: Where can I find documentation about how LZMA works as an algorithm?

A: Read the source code, Luke. There is no documentation about LZMA
internals. It is possible that Igor Pavlov is the only person on
Earth who completely knows and understands the algorithm.

You could begin by downloading LZMA SDK, and start reading from
the LZMA decoder to get some idea about the bitstream format.
Before you begin, you should know the basics of LZ77 and
range coding algorithms. LZMA is based on LZ77, but LZMA is
*a lot* more complex. Range coding is used to compress the
final bitstream, like Huffman coding is used in Deflate.
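For readers new to LZ77, here is a toy sketch of the basic idea:
repeated data is replaced with (offset, length) references to earlier
output. This is not LZMA and not code from LZMA SDK. The token layout,
the function names and the parameters are made up for this example, and
the entropy coding stage is left out entirely.

```python
def lz77_encode(data: bytes, window: int = 4096,
                min_match: int = 3, max_match: int = 255) -> list:
    """Turn data into literal bytes (int) and (offset, length) tuples."""
    tokens = []
    i = 0
    while i < len(data):
        best_len = best_off = 0
        # Brute-force search of the window for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            # A match may run past position i (overlapping copy).
            while (length < max_match and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:
            tokens.append((best_off, best_len))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens


def lz77_decode(tokens: list) -> bytes:
    """Rebuild the data; each match is copied from the decoded history."""
    out = bytearray()
    for token in tokens:
        if isinstance(token, tuple):
            offset, length = token
            for _ in range(length):
                out.append(out[-offset])
        else:
            out.append(token)
    return bytes(out)
```

For example, lz77_encode(b"abcabcabcabc") yields three literals
followed by one match, [97, 98, 99, (3, 9)]. The decoder needs the
earlier output to resolve that match, which is the kind of history
dependence found in LZ77-based formats.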
Q: What are filters?

A: In the context of .lzma files, a filter means an implementation of a
compression algorithm. The primary filter is LZMA, which is why
the names of the tools contain the letters LZMA.

liblzma and the new .lzma format also support filters other than LZMA.
There are different types of filters, which are suitable for different
types of data. Thus, to select the optimal filter and settings, the
type of the input data being compressed needs to be known.

Some filters are most useful when combined with another filter like
LZMA. These filters increase redundancy in the data, without changing
the size of the data, by taking advantage of properties specific to
the data being compressed.
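To make "increasing redundancy without changing the size" concrete,
here is a sketch of a byte-wise delta transform. The text above does
not name a specific filter, so the choice of a delta transform is an
assumption made for illustration, and the code is not taken from
liblzma.

```python
import lzma


def delta_encode(data: bytes, distance: int = 1) -> bytes:
    """Replace each byte with its difference from the byte `distance` back."""
    out = bytearray(len(data))
    for i, byte in enumerate(data):
        previous = data[i - distance] if i >= distance else 0
        out[i] = (byte - previous) & 0xFF
    return bytes(out)


def delta_decode(data: bytes, distance: int = 1) -> bytes:
    """Undo delta_encode by accumulating the differences."""
    out = bytearray(len(data))
    for i, diff in enumerate(data):
        previous = out[i - distance] if i >= distance else 0
        out[i] = (diff + previous) & 0xFF
    return bytes(out)


# A steadily rising ramp: every byte differs from its neighbour by 3.
ramp = bytes((i * 3) % 256 for i in range(1000))
filtered = delta_encode(ramp)

# Same length, but far more repetitive, so it compresses better.
size_plain = len(lzma.compress(ramp))
size_filtered = len(lzma.compress(filtered))
```

The filtered data has exactly as many bytes as the input, yet after the
first byte it consists of a single repeated value, which the following
compression stage can exploit.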
So far, all the filters are always reversible. That is, no matter what
data you pass to a filter encoder, it can always be defiltered back to
the original form. Because of this, it is safe to compress for example
a software package that contains other file types than executables
using a filter specific to the architecture of the package being
compressed.
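The reversibility claim can be tried out with Python's standard lzma
module, which exposes an x86 filter as FILTER_X86. Assumptions: the
module writes the later .xz container rather than the .lzma format
described here, and the names X86_CHAIN and roundtrip are made up. The
input is plain text with a few 0xE8 bytes mixed in (the x86 CALL
opcode) so that the filter has something to act on.

```python
import lzma

# x86 filter followed by LZMA2, both constants from Python's lzma module.
X86_CHAIN = [
    {"id": lzma.FILTER_X86},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]


def roundtrip(data: bytes) -> bytes:
    """Compress with the x86 filter chain, then decompress again."""
    packed = lzma.compress(data, format=lzma.FORMAT_XZ, filters=X86_CHAIN)
    return lzma.decompress(packed)


# Not machine code at all, but it contains bytes the filter looks for.
text = b"Plain text, not x86 machine code. \xe8\x00\x01\x02\x03\n" * 100
```

roundtrip(text) returns the input unchanged. Applying an
architecture-specific filter to non-executable data may not help
compression, but it does not damage the data.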
The old LZMA_Alone format supports only the LZMA filter.
Q: I cannot find BCJ and BCJ2 filters. Don't they exist in liblzma?

A: The BCJ filter is called "x86" in liblzma. BCJ2 is not included,
because it requires using more than one encoded output stream.
Q: Can I use LZMA in proprietary, non-free applications?
A: Yes. See the file COPYING for details.
Q: I would like to help. What can I do?

A: See the TODO file. Please contact Lasse Collin before starting to do
anything, because it is possible that someone else is already working
on the same thing.
Q: How can I contact the authors?

A: Lasse Collin is the maintainer of LZMA Utils. You can contact him
via IRC (Larhzu on #tukaani at Freenode or IRCnet) or by email at
<lasse.collin@tukaani.org>.

Igor Pavlov is the father of LZMA. He is the author of 7-Zip
and LZMA SDK. <http://7-zip.org/>

NOTE: Please don't bother Igor Pavlov with questions specific
to LZMA Utils.