153 lines
6.0 KiB
Plaintext
153 lines
6.0 KiB
Plaintext
|
|
.lzma Test Files
|
|
----------------
|
|
|
|
0. Introduction
|
|
|
|
This directory contains bunch of files to test handling of .lzma files
|
|
in .lzma decoder implementations. Many of the files have been created
|
|
by hand with a hex editor, thus there is no better "source code" than
|
|
the files themselves. All the test files (*.lzma) and this README have
|
|
been put into the public domain.
|
|
|
|
|
|
1. File Types
|
|
|
|
Good files (good-*.lzma) must decode successfully without requiring
|
|
a lot of CPU time or RAM. If the decoder supports only Single-Block
|
|
Streams, then good-multi-*.lzma won't decode, of course.
|
|
|
|
Bad files (bad-*.lzma) must cause the decoder to give an error. Like
|
|
with the good files, these files must not require a lot of CPU time
|
|
or RAM before they get detected to be broken.
|
|
|
|
Malicious files (malicious-*.lzma) are good in terms of the file format
|
|
specification, but try to trigger excessive CPU, RAM or disk usage in
|
|
the decoder. To prevent malicious files from putting the decoder in
|
|
inifinite loop (*), eating all available RAM or disk space, decoders
|
|
should have internal limitters that catch these situations.
|
|
|
|
(*) Strictly speaking not infinite, but if decoding of a small file
|
|
would take a few weeks or even years, it's an infinite loop in
|
|
practice.
|
|
|
|
|
|
2. Descriptions of Individual Files
|
|
|
|
2.1. Good Files
|
|
|
|
good-single-none.lzma uses implicit Copy filter with known Uncompressed
|
|
Size.
|
|
|
|
good-single-none-pad.lzma is good-single-none.lzma with Footer Padding.
|
|
|
|
good-cat-single-none-pad.lzma is two good-single-none-pad.lzma files
|
|
concatenated as is. Fully decoding this file requires that the decoder
|
|
supports decoding concatenated files.
|
|
|
|
good-single-subblock_implicit.lzma uses implicit Subblock filter.
|
|
|
|
good-single-lzma.lzma is LZMA compressed file with EOPM.
|
|
|
|
good-single-subblock-lzma.lzma has basic combination of Subblock and
|
|
LZMA filters.
|
|
|
|
good-single-none-empty_1.lzma is an empty file with implicit Copy
|
|
filter and no integrity Check.
|
|
|
|
good-single-none-empty_2.lzma is an empty file with implicit Copy
|
|
filter and CRC32 as Check.
|
|
|
|
good-single-none-empty_3.lzma is an empty file with implicit Copy
|
|
filter, known Compressed Size, and no integrity Check.
|
|
|
|
good-single-lzma-empty.lzma is an empty file with LZMA filter and no
|
|
integrity Check.
|
|
|
|
good-single-subblock_rle.lzma takes advantage of Subblock filter's
|
|
run-length encoding.
|
|
|
|
good-single-delta-lzma.tiff.lzma is an image file that compresses
|
|
better with Delta+LZMA than with plain LZMA.
|
|
|
|
good-single-lzma-flush_1.lzma has a flush marker in the middle of
|
|
the file, and no EOPM.
|
|
|
|
good-single-lzma-flush_2.lzma has a flush marker in the middle of
|
|
the file and just before EOPM.
|
|
|
|
|
|
2.2. Bad Files
|
|
|
|
bad-single-none-truncated.lzma is good-single-none.lzma without the
|
|
last byte of the file.
|
|
|
|
bad-cat-single-none-pad_garbage_1.lzma is good-cat-single-none-pad.lzma
|
|
with 0xFE appended to the end of the file. 0xFE doesn't begin .lzma
|
|
or LZMA_Alone format file.
|
|
|
|
bad-cat-single-none-pad_garbage_2.lzma is good-cat-single-none-pad.lzma
|
|
with 0xFF appended to the end of the file. 0xFF begins .lzma format
|
|
file, thus the decoder has to detect that the file is incomplete.
|
|
|
|
bad-cat-single-none-pad_garbage_3.lzma is good-cat-single-none-pad.lzma
|
|
with 0x5D appended to the end of the file. 0x5D is the most common
|
|
first byte of LZMA_Alone format file.
|
|
|
|
bad-single-none-empty.lzma is like good-single-none-empty_3.lzma but
|
|
with non-zero value in the Compressed Size field.
|
|
|
|
bad-single-data_after_eopm_1.lzma has LZMA+Subblock, where the Subblock
|
|
filter gives one byte of data to LZMA after LZMA has detected EOPM.
|
|
|
|
bad-single-data_after_eopm_2.lzma is like
|
|
bad-single-data_after_eopm_1.lzma but Subblock gives 256 MiB of data
|
|
to LZMA after LZMA has detected EOPM.
|
|
|
|
bad-single-subblock_subblock.lzma has Subblock+Subblock, where the
|
|
Subblock decoder is given End of Input in the middle of a Subblock.
|
|
|
|
bad-single-subblock-padding_loop.lzma contains huge amount of
|
|
consecutive Padding bytes, which isn't allowed by the Subblock filter
|
|
format. If it were allowed, this file would hang the decoder for very
|
|
long time (weeks to years).
|
|
|
|
bad-single-subblock1023-slow.lzma is similar to
|
|
malicious-single-subblock31-slow.lzma except that this uses 1023 bytes
|
|
of Padding in every place instead of 31 bytes. The Subblock filter
|
|
format specification allows only 31-byte Padings, thus this file must
|
|
get detected as bad without producing any output. Allowing larger
|
|
Padding than 31 bytes was considered (so this test file was created),
|
|
but it seemed to be a bad idea since it would increase worst-case CPU
|
|
usage.
|
|
|
|
bad-single-lzma-flush_beginning.lzma has flush marker in the beginning
|
|
of the LZMA data.
|
|
|
|
bad-single-lzma-flush_twice.lzma has two flush markers with no data
|
|
between them.
|
|
|
|
|
|
2.3. Malicious Files
|
|
|
|
malicious-single-subblock31-slow.lzma requires quite a bit of CPU time
|
|
per decoded byte. It contains LZMA compressed Subblock filter data that
|
|
has as much Padding as the specification allows. LZMA is also used as
|
|
a Subfilter, to further slowdown the decoder. Every Subfilter instance
|
|
produces only one byte of output. If you can create a file that wastes
|
|
notably more CPU cycles than this file, please contact Lasse Collin.
|
|
|
|
malicious-single-subblock-256MiB.lzma is a tiny file that produces
|
|
256 MiB of output. It uses Subblock filter's run-length encoding
|
|
to achieve this.
|
|
|
|
malicious-single-subblock-64PiB.lzma is a tiny file that produces
|
|
64 PiB of output (if you have patience to wait). This is done by
|
|
chaining two Subblock filters and using their run-length encoders.
|
|
|
|
malicious-multi-metadata-64PiB.lzma is like
|
|
malicious-single-subblock-64PiB.lzma but the huge amount of output
|
|
is in a Metadata Block. Trying to decode this file may take years
|
|
unless the decoder catches that the Metadata has unreasonable size.
|
|
|