xz-analysis-mirror/doc/faq.txt


XZ Utils FAQ
============

Q:  What do the letters XZ mean?

A:  Nothing. They are just two letters, which come from the file format
    suffix .xz. The .xz suffix was selected, because it seemed to be
    pretty much unused. It has no deeper meaning.


Q:  What are LZMA and LZMA2?

A:  LZMA stands for Lempel-Ziv-Markov chain-Algorithm. It is the name
    of the compression algorithm designed by Igor Pavlov for 7-Zip.
    LZMA is based on LZ77 and range encoding.

    LZMA2 is an updated version of the original LZMA to fix a couple of
    practical issues. In context of XZ Utils, LZMA is called LZMA1 to
    emphasize that LZMA is not the same thing as LZMA2. LZMA2 is the
    primary compression algorithm in the .xz file format.


Q:  There are many LZMA related projects. How does XZ Utils relate to them?

A:  7-Zip and LZMA SDK are the original projects. LZMA SDK is roughly
    a subset of the 7-Zip source tree.

    p7zip is 7-Zip's command line tools ported to POSIX-like systems.

    LZMA Utils provide a gzip-like lzma tool for POSIX-like systems.
    LZMA Utils are based on LZMA SDK. XZ Utils are the successor to
    LZMA Utils.

    There are several other projects using LZMA. Most are more or less
    based on LZMA SDK. See <http://7-zip.org/links.html>.


Q:  Why is liblzma named liblzma if its primary file format is .xz?
    Shouldn't it be e.g. libxz?

A:  When the designing of the .xz format began, the idea was to replace
    the .lzma format and use the same .lzma suffix. It would have been
    quite OK to reuse the suffix when there were very few .lzma files
    around. However, the old .lzma format become popular before the
    new format was finished. The new format was renamed to .xz but the
    name of liblzma wasn't changed.


Q:  Do XZ Utils support the .7z format?

A:  No. Use 7-Zip (Windows) or p7zip (POSIX-like systems) to handle .7z
    files.


Q:  I have many .tar.7z files. Can I convert them to .tar.xz without
    spending hours recompressing the data?

A:  In the "extra" directory, there is a script named 7z2lzma.bash which
    is able to convert some .7z files to the .lzma format (not .xz). It
    needs the 7za (or 7z) command from p7zip. The script may silently
    produce corrupt output if certain assumptions are not met, so
    decompress the resulting .lzma file and compare it against the
    original before deleting the original file!


Q:  I have many .lzma files. Can I quickly convert them to the .xz format?

A:  For now, no. Since XZ Utils supports the .lzma format, it's usually
    not too bad to keep the old files in the old format. If you want to
    do the conversion anyway, you need to decompress the .lzma files and
    then recompress to the .xz format.

    Technically, there is a way to make the conversion relatively fast
    (roughly twice the time that normal decompression takes). Writing
    such a tool would take quite a bit time though, and would probably
    be useful to only a few people. If you really want such a conversion
    tool, contact Lasse Collin and offer some money.


Q:  I have installed xz, but my tar doesn't recognize .tar.xz files.
    How can I extract .tar.xz files?

A:  xz -dc foo.tar.xz | tar xf -


Q:  Can I recover parts of a broken .xz file (e.g. corrupted CD-R)?

A:  It may be possible if the file consists of multiple blocks, which
    typically is not the case if the file was created in single-threaded
    mode. There is no recovery program yet.


Q:  Is (some part of) XZ Utils patented?

A:  Lasse Collin is not aware of any patents that could affect XZ Utils.
    However, due to nature of software patents, it's not possible to
    guarantee that XZ Utils isn't affected by any third party patent(s).


Q:  Where can I find documentation about the file format and algorithms?

A:  The .xz format is documented in xz-file-format.txt. It is a container
    format only, and doesn't include descriptions of any non-trivial
    filters.

    Documenting LZMA and LZMA2 is planned, but for now, there is no other
    documentation that the source code. Before you begin, you should know
    the basics of LZ77 and range coding algorithms. LZMA is based on LZ77,
    but LZMA is a lot more complex. Range coding is used to compress
    the final bitstream like Huffman coding is used in Deflate.


Q:  I cannot find BCJ and BCJ2 filters. Don't they exist in liblzma?

A:  BCJ filter is called "x86" in liblzma. BCJ2 is not included,
    because it requires using more than one encoded output stream.
    A streamable version of BCJ2-style filtering is planned.


Q:  I need to use a script that runs "xz -9". On a system with 256 MiB
    of RAM, xz says that it cannot allocate memory. Can I make the
    script work without modifying it?

A:  Set a default memory usage limit for compression. You can do it e.g.
    in a shell initialization script such as ~/.bashrc or /etc/profile:

        XZ_DEFAULTS=--memlimit-compress=150MiB
        export XZ_DEFAULTS

    xz will then scale the compression settings down so that the given
    memory usage limit is not reached. This way xz shouldn't run out
    of memory.

    Check also that memory-related resource limits are high enough.
    On most systems, "ulimit -a" will show the current resource limits.


Q:  How do I create files that can be decompressed with XZ Embedded?

A:  See the documentation in XZ Embedded. In short, something like
    this is a good start:

        xz --check=crc32 --lzma2=preset=6e,dict=64KiB

    Or if a BCJ filter is needed too, e.g. if compressing
    a kernel image for PowerPC:

        xz --check=crc32 --powerpc --lzma2=preset=6e,dict=64KiB

    Adjust dictionary size to get a good compromise between
    compression ratio and decompressor memory usage. Note that
    in single-call decompression mode of XZ Embedded, a big
    dictionary doesn't increase memory usage.


Q:  Will xz support threaded compression?

A:  It is planned and has been taken into account when designing
    the .xz file format. Eventually there will probably be three types
    of threading, each method having its own advantages and disadvantages.

    The simplest method is splitting the uncompressed data into blocks
    and compressing them in parallel independent from each other.
    Since the blocks are compressed independently, they can also be
    decompressed independently. Together with the index feature in .xz,
    this allows using threads to create .xz files for random-access
    reading. This also makes threaded decompression possible, although
    it is not clear if threaded decompression will ever be implemented.

    The independent blocks method has a couple of disadvantages too. It
    will compress worse than a single-block method. Often the difference
    is not too big (maybe 1-2 %) but sometimes it can be too big. Also,
    the memory usage of the compressor increases linearly when adding
    threads.

    Match finder parallelization is another threading method. It has
    been in 7-Zip for ages. It doesn't affect compression ratio or
    memory usage significantly. Among the three threading methods, only
    this is useful when compressing small files (files that are not
    significantly bigger than the dictionary). Unfortunately this method
    scales only to about two CPU cores.

    The third method is pigz-style threading (I use that name, because
    pigz <http://www.zlib.net/pigz/> uses that method). It doesn't
    affect compression ratio significantly and scales to many cores.
    The memory usage scales linearly when threads are added. It isn't
    significant with pigz, because Deflate uses only 32 KiB dictionary,
    but with LZMA2 the memory usage will increase dramatically just like
    with the independent blocks method. There is also a constant
    computational overhead, which may make pigz-method a bit dull on
    dual-core compared to the parallel match finder method, but with more
    cores the overhead is not a big deal anymore.

    Combining the threading methods will be possible and also useful.
    E.g. combining match finder parallelization with pigz-style threading
    can cut the memory usage by 50 %.

    It is possible that the single-threaded method will be modified to
    create files indentical to the pigz-style method. We'll see once
    pigz-style threading has been implemented in liblzma.


Q:  How do I build a program that needs liblzmadec (lzmadec.h)?

A:  liblzmadec is part of LZMA Utils. XZ Utils has liblzma, but no
    liblzmadec. The code using liblzmadec should be ported to use
    liblzma instead. If you cannot or don't want to do that, download
    LZMA Utils from <http://tukaani.org/lzma/>.


Q:  The default build of liblzma is too big. How can I make it smaller?

A:  Give --enable-small to the configure script. Use also appropriate
    --enable or --disable options to include only those filter encoders
    and decoders and integrity checks that you actually need. Use
    CFLAGS=-Os (with GCC) or equivalent to tell your compiler to optimize
    for size. See INSTALL for information about configure options.

    If the result is still too big, take a look at XZ Embedded. It is
    a separate project, which provides a limited but significantly
    smaller XZ decoder implementation than XZ Utils. You can find it
    at <http://tukaani.org/xz/embedded.html>.
Imported to git. 2007-12-08 17:42:33 -05:00
Put the interesting parts of XZ Utils into the public domain. Some minor documentation cleanups were made at the same time. 2009-04-13 04:27:40 -04:00			`XZ Utils FAQ`
			`============`
Imported to git. 2007-12-08 17:42:33 -05:00
Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`Q: What do the letters XZ mean?`
Imported to git. 2007-12-08 17:42:33 -05:00
Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`A: Nothing. They are just two letters, which come from the file format`
			`suffix .xz. The .xz suffix was selected, because it seemed to be`
Update the FAQ. 2010-10-02 05:07:33 -04:00			`pretty much unused. It has no deeper meaning.`
Imported to git. 2007-12-08 17:42:33 -05:00

Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`Q: What are LZMA and LZMA2?`
Imported to git. 2007-12-08 17:42:33 -05:00
Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`A: LZMA stands for Lempel-Ziv-Markov chain-Algorithm. It is the name`
			`of the compression algorithm designed by Igor Pavlov for 7-Zip.`
			`LZMA is based on LZ77 and range encoding.`
Imported to git. 2007-12-08 17:42:33 -05:00
Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`LZMA2 is an updated version of the original LZMA to fix a couple of`
			`practical issues. In context of XZ Utils, LZMA is called LZMA1 to`
			`emphasize that LZMA is not the same thing as LZMA2. LZMA2 is the`
			`primary compression algorithm in the .xz file format.`
Imported to git. 2007-12-08 17:42:33 -05:00

Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`Q: There are many LZMA related projects. How does XZ Utils relate to them?`
Imported to git. 2007-12-08 17:42:33 -05:00
Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`A: 7-Zip and LZMA SDK are the original projects. LZMA SDK is roughly`
			`a subset of the 7-Zip source tree.`
Imported to git. 2007-12-08 17:42:33 -05:00
Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`p7zip is 7-Zip's command line tools ported to POSIX-like systems.`
Imported to git. 2007-12-08 17:42:33 -05:00
Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`LZMA Utils provide a gzip-like lzma tool for POSIX-like systems.`
			`LZMA Utils are based on LZMA SDK. XZ Utils are the successor to`
			`LZMA Utils.`
Imported to git. 2007-12-08 17:42:33 -05:00
Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`There are several other projects using LZMA. Most are more or less`
Update the FAQ. 2010-10-02 05:07:33 -04:00			`based on LZMA SDK. See <http://7-zip.org/links.html>.`


			`Q: Why is liblzma named liblzma if its primary file format is .xz?`
			`Shouldn't it be e.g. libxz?`

			`A: When the designing of the .xz format began, the idea was to replace`
			`the .lzma format and use the same .lzma suffix. It would have been`
			`quite OK to reuse the suffix when there were very few .lzma files`
			`around. However, the old .lzma format become popular before the`
			`new format was finished. The new format was renamed to .xz but the`
			`name of liblzma wasn't changed.`
Imported to git. 2007-12-08 17:42:33 -05:00

Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`Q: Do XZ Utils support the .7z format?`
Imported to git. 2007-12-08 17:42:33 -05:00
Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`A: No. Use 7-Zip (Windows) or p7zip (POSIX-like systems) to handle .7z`
			`files.`
Imported to git. 2007-12-08 17:42:33 -05:00

Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`Q: I have many .tar.7z files. Can I convert them to .tar.xz without`
			`spending hours recompressing the data?`
Imported to git. 2007-12-08 17:42:33 -05:00
Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`A: In the "extra" directory, there is a script named 7z2lzma.bash which`
			`is able to convert some .7z files to the .lzma format (not .xz). It`
			`needs the 7za (or 7z) command from p7zip. The script may silently`
			`produce corrupt output if certain assumptions are not met, so`
			`decompress the resulting .lzma file and compare it against the`
			`original before deleting the original file!`
Imported to git. 2007-12-08 17:42:33 -05:00

Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`Q: I have many .lzma files. Can I quickly convert them to the .xz format?`
Imported to git. 2007-12-08 17:42:33 -05:00
Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`A: For now, no. Since XZ Utils supports the .lzma format, it's usually`
			`not too bad to keep the old files in the old format. If you want to`
			`do the conversion anyway, you need to decompress the .lzma files and`
			`then recompress to the .xz format.`
Imported to git. 2007-12-08 17:42:33 -05:00
Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`Technically, there is a way to make the conversion relatively fast`
			`(roughly twice the time that normal decompression takes). Writing`
			`such a tool would take quite a bit time though, and would probably`
			`be useful to only a few people. If you really want such a conversion`
			`tool, contact Lasse Collin and offer some money.`
Imported to git. 2007-12-08 17:42:33 -05:00

Add a simple tip to faq.txt about tar and xz. Thanks to Gilles Espinasse. 2010-03-31 09:47:25 -04:00			`Q: I have installed xz, but my tar doesn't recognize .tar.xz files.`
			`How can I extract .tar.xz files?`

			`A: xz -dc foo.tar.xz \| tar xf -`


Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`Q: Can I recover parts of a broken .xz file (e.g. corrupted CD-R)?`
Imported to git. 2007-12-08 17:42:33 -05:00
Fix a typo in FAQ. Thanks to Jim Meyering. (From now on, I try to always remember to put the relevant thanks to commit messages.) 2009-08-27 03:21:18 -04:00			`A: It may be possible if the file consists of multiple blocks, which`
Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`typically is not the case if the file was created in single-threaded`
			`mode. There is no recovery program yet.`
Imported to git. 2007-12-08 17:42:33 -05:00

Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`Q: Is (some part of) XZ Utils patented?`
Imported to git. 2007-12-08 17:42:33 -05:00
Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`A: Lasse Collin is not aware of any patents that could affect XZ Utils.`
			`However, due to nature of software patents, it's not possible to`
			`guarantee that XZ Utils isn't affected by any third party patent(s).`
Imported to git. 2007-12-08 17:42:33 -05:00

Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`Q: Where can I find documentation about the file format and algorithms?`
Imported to git. 2007-12-08 17:42:33 -05:00
Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`A: The .xz format is documented in xz-file-format.txt. It is a container`
			`format only, and doesn't include descriptions of any non-trivial`
			`filters.`
Imported to git. 2007-12-08 17:42:33 -05:00
Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`Documenting LZMA and LZMA2 is planned, but for now, there is no other`
			`documentation that the source code. Before you begin, you should know`
			`the basics of LZ77 and range coding algorithms. LZMA is based on LZ77,`
Update the FAQ. 2010-10-02 05:07:33 -04:00			`but LZMA is a lot more complex. Range coding is used to compress`
Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`the final bitstream like Huffman coding is used in Deflate.`
Imported to git. 2007-12-08 17:42:33 -05:00

			`Q: I cannot find BCJ and BCJ2 filters. Don't they exist in liblzma?`

			`A: BCJ filter is called "x86" in liblzma. BCJ2 is not included,`
			`because it requires using more than one encoded output stream.`
Update the FAQ. 2010-10-02 05:07:33 -04:00			`A streamable version of BCJ2-style filtering is planned.`


			`Q: I need to use a script that runs "xz -9". On a system with 256 MiB`
			`of RAM, xz says that it cannot allocate memory. Can I make the`
			`script work without modifying it?`

			`A: Set a default memory usage limit for compression. You can do it e.g.`
			`in a shell initialization script such as ~/.bashrc or /etc/profile:`

			`XZ_DEFAULTS=--memlimit-compress=150MiB`
			`export XZ_DEFAULTS`

			`xz will then scale the compression settings down so that the given`
			`memory usage limit is not reached. This way xz shouldn't run out`
			`of memory.`

			`Check also that memory-related resource limits are high enough.`
			`On most systems, "ulimit -a" will show the current resource limits.`


			`Q: How do I create files that can be decompressed with XZ Embedded?`

			`A: See the documentation in XZ Embedded. In short, something like`
			`this is a good start:`

			`xz --check=crc32 --lzma2=preset=6e,dict=64KiB`

			`Or if a BCJ filter is needed too, e.g. if compressing`
			`a kernel image for PowerPC:`

			`xz --check=crc32 --powerpc --lzma2=preset=6e,dict=64KiB`

			`Adjust dictionary size to get a good compromise between`
			`compression ratio and decompressor memory usage. Note that`
			`in single-call decompression mode of XZ Embedded, a big`
			`dictionary doesn't increase memory usage.`


			`Q: Will xz support threaded compression?`

			`A: It is planned and has been taken into account when designing`
			`the .xz file format. Eventually there will probably be three types`
			`of threading, each method having its own advantages and disadvantages.`

			`The simplest method is splitting the uncompressed data into blocks`
			`and compressing them in parallel independent from each other.`
			`Since the blocks are compressed independently, they can also be`
			`decompressed independently. Together with the index feature in .xz,`
			`this allows using threads to create .xz files for random-access`
			`reading. This also makes threaded decompression possible, although`
			`it is not clear if threaded decompression will ever be implemented.`

			`The independent blocks method has a couple of disadvantages too. It`
			`will compress worse than a single-block method. Often the difference`
			`is not too big (maybe 1-2 %) but sometimes it can be too big. Also,`
			`the memory usage of the compressor increases linearly when adding`
			`threads.`

			`Match finder parallelization is another threading method. It has`
			`been in 7-Zip for ages. It doesn't affect compression ratio or`
			`memory usage significantly. Among the three threading methods, only`
			`this is useful when compressing small files (files that are not`
			`significantly bigger than the dictionary). Unfortunately this method`
			`scales only to about two CPU cores.`

			`The third method is pigz-style threading (I use that name, because`
			`pigz <http://www.zlib.net/pigz/> uses that method). It doesn't`
			`affect compression ratio significantly and scales to many cores.`
			`The memory usage scales linearly when threads are added. It isn't`
			`significant with pigz, because Deflate uses only 32 KiB dictionary,`
			`but with LZMA2 the memory usage will increase dramatically just like`
			`with the independent blocks method. There is also a constant`
			`computational overhead, which may make pigz-method a bit dull on`
			`dual-core compared to the parallel match finder method, but with more`
			`cores the overhead is not a big deal anymore.`

			`Combining the threading methods will be possible and also useful.`
			`E.g. combining match finder parallelization with pigz-style threading`
			`can cut the memory usage by 50 %.`

			`It is possible that the single-threaded method will be modified to`
			`create files indentical to the pigz-style method. We'll see once`
			`pigz-style threading has been implemented in liblzma.`
Imported to git. 2007-12-08 17:42:33 -05:00

Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`Q: How do I build a program that needs liblzmadec (lzmadec.h)?`
Imported to git. 2007-12-08 17:42:33 -05:00
Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`A: liblzmadec is part of LZMA Utils. XZ Utils has liblzma, but no`
			`liblzmadec. The code using liblzmadec should be ported to use`
			`liblzma instead. If you cannot or don't want to do that, download`
			`LZMA Utils from <http://tukaani.org/lzma/>.`
Imported to git. 2007-12-08 17:42:33 -05:00

Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`Q: The default build of liblzma is too big. How can I make it smaller?`
Imported to git. 2007-12-08 17:42:33 -05:00
Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`A: Give --enable-small to the configure script. Use also appropriate`
			`--enable or --disable options to include only those filter encoders`
			`and decoders and integrity checks that you actually need. Use`
			`CFLAGS=-Os (with GCC) or equivalent to tell your compiler to optimize`
			`for size. See INSTALL for information about configure options.`
Imported to git. 2007-12-08 17:42:33 -05:00
Updated faq.txt. Some questions worth answering were removed, because I currently don't have good up to date answers to them. 2009-08-17 17:26:48 -04:00			`If the result is still too big, take a look at XZ Embedded. It is`
Collection of language fixes to comments and docs. Thanks to Jonathan Nieder. 2010-02-12 06:16:15 -05:00			`a separate project, which provides a limited but significantly`
Update the FAQ. 2010-10-02 05:07:33 -04:00			`smaller XZ decoder implementation than XZ Utils. You can find it`
			`at <http://tukaani.org/xz/embedded.html>.`
Imported to git. 2007-12-08 17:42:33 -05:00