This does the previous commit with CMake.
AC_EGREP_CPP uses AC_REQUIRE so the outermost if-commands must
be changed to AS_IF to ensure that things wont break some day.
See 5a5bd7f871.
It doesn't support the __symver__ attribute or __asm__(".symver ...").
The generic symbol versioning can still be used since it only needs
linker support.
NVHPC compiler has several issues that make it impossible to
build liblzma:
- the compiler cannot handle unions that contain pointers that
are not the first members;
- the compiler cannot handle the assembler code in range_decoder.h
(LZMA_RANGE_DECODER_CONFIG has to be set to zero);
- the compiler fails to produce valid code for delta_decode if the
vectorization is enabled, which results in failed tests.
This introduces NVHPC-specific workarounds that address the issues.
There are cases when the users want to decide themselves whether
they want to have the generic (even on GNU/Linux) or the linux
(even if we do not recommend that) symbol versioning variant.
The former might be needed to circumvent compiler issues (i.e.
the compiler does not support all features that are required
for the linux versioning), the latter might help in overriding
the assumptions made in the configure script.
The original files were generated with random local to my machine.
To better reproduce these files in the future, a constant seed was used
to recreate these files.
With GCC and a certain combination of flags, Valgrind will falsely
trigger an invalid write. This appears to be due to the omission of
instructions to properly save, set up, and restore the frame pointer.
The IFUNC resolver is a leaf function since it only calls a function
that is inlined. So sometimes GCC omits the frame pointer instructions
in the resolver unless this optimization is explictly disabled.
This fixes https://bugzilla.redhat.com/show_bug.cgi?id=2267598.
Using __attribute__((__no_profile_instrument_function__)) on the ifunc
resolver works around a bug in GCC -fprofile-generate:
it adds profiling code even to ifunc resolvers which can make
the ifunc resolver crash at program startup. This attribute
was not introduced until GCC 7 and Clang 13, so ifunc won't
be used with prior versions of these compilers.
This bug was brought to our attention by:
https://bugs.gentoo.org/925415
And was reported to upstream GCC by:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11411
Now that multi threaded encoding is the default, users do not need to
see a warning message everytime the number of threads is reduced. On
some machines, this could happen very often. It is not unreasonable for
users to need to set double verbose mode to see this kind of
information.
To see these warning messages -vv or --verbose --verbose must be passed
to set xz into the highest possible verbosity mode.
These warnings had caused automated testing frameworks to fail when they
expected no output to stderr.
Thanks to Sebastian Andrzej Siewior for reporting this and for the
initial version of the patch.
The previous Linux Landlock feature test assumed that having the
linux/landlock.h header file was enough. The new feature tests also
requires that prctl() and the required Landlock system calls are
supported.
Like 5.5.0alpha, 5.7.0alpha won't be released, it's just to mark that
the branch is not stable.
Once again there is no API/ABI stability for new features in devel
versions. The major soname won't be bumped even if API/ABI of new
features breaks between devel releases.
The French and Brazilian Portuguese man page translations have not been
updated since the switch from public domain to 0BSD. The old GPLv2
strings have now been removed from these files.
Old versions of Clang reported the unsupported function attribute and
__crc32d() function as warnings instead of errors, so the feature test
passed when it shouldn't have, causing a compile error at build time.
-Werror was added to this feature test to fix this. The change is not
needed for CMake because check_c_source_compiles() also performs
linking and the error is caught then.
Thanks to Sebastian Andrzej Siewior for reporting this.
If xz is given a directory, it should look like this:
$ xz /usr/bin
xz: /usr/bin: Is a directory, skipping
The Landlock rules didn't allow opening directories for reading:
$ xz /usr/bin
xz: /usr/bin: Permission denied
The simplest fix was to allow opening directories for reading.
While it's a bit silly to allow it solely for the error message,
it shouldn't make the sandbox significantly weaker.
The single-file use case (like when called from GNU tar) is
still as strict as possible: all Landlock restrictions are
enabled before (de)compression starts.
Support for the old MinGW was dropped. Only MinGW-w64 with GCC
is supported now.
The script now supports also cross-compilation from GNU/Linux
(tests are not run). MSYS2 and also the old MSYS 1.0.11 work
for building on Windows. The i686 and x86_64 toolchains must
be in PATH to build both 32-bit and 64-bit versions.
Parallel builds are done if "nproc" from GNU coreutils is available.
MinGW-w64 runtime copyright information file was renamed from
COPYING-Windows.txt to COPYING.MinGW-w64-runtime.txt which
is the filename used by MinGW-w64 itself. Its existence
is now mandatory, it's checked at the beginning of the script.
The file TODO is no longer copied to the package.
The original code was good enough for supporting GNU/Linux
and a few others but it wasn't very portable.
CMake doesn't support Solaris Studio's -xldscope=hidden.
If it ever does, things should still work with this commit
as Solaris Studio supports not only its own __global but also
the GNU C __attribute__((visibility("default"))). Support for the
attribute was added in 2007 to Sun Studio 12 compiler version 5.9.
-O3 doesn't seem useful for speed but it makes the code bigger.
CMake makes is difficult for users to simply override the
optimization level: CFLAGS / CMAKE_C_FLAGS aren't helpful because
they go before CMAKE_C_FLAGS_RELEASE. Of course, users can override
CMAKE_C_FLAGS_RELEASE directly but then they have to remember to
add also -DNDEBUG to disable assertions.
This commit changes -O3 to -O2 in CMAKE_C_FLAGS_RELEASE if and only if
CMAKE_C_FLAGS_RELEASE cache variable doesn't already exist. So if
a custom value is passed on the command line (or reconfiguring an
already-configured build), the cache variable won't be modified.
In contrast to Automake, skipping of this test when decoders
are disabled is handled at CMake side instead of test_scripts.sh
because CMake-build doesn't create config.h.
Compared to the Autotools-based build, this has simpler handling
for the shell (@POSIX_SHELL@) and extra PATH entry for the scripts
(configure has --enable-path-for-scripts=PREFIX). The simpler
metho should be enough for non-ancient systems and Solaris.
It helps that cmake_install.cmake doesn't parallelize installation
so symlinks can be created so that the target is always known to
exist (a requirement on Windows in some cases).
This bumps the minimum CMake version from 3.13 to 3.14 to use
file(CREATE_LINK ...). It could be made to work on 3.13 by
calling "cmake -E create_symlink" but it's uglier code and
slower in "make install". 3.14 should be a reasonable version
to require nowadays, especially since the Autotools build
is still the primary build system for most OSes.
If gettext tools are available, the .po files listed in po/LINGUAS
are converted using msgfmt. This allows building with translations
directly from xz.git without Autotools.
If gettext tools aren't available, the Autotools-created .gmo files
in the "po" directory will be used. This allows CMake-based build
to use translations from Autotools-generated tarball.
If translation support is found (Intl_FOUND) but both the
gettext tools and the pre-generated .gmo files are missing,
then "make" will fail.
This makes these sandboxing methods stricter when no files are
created or deleted. That is, it's a middle ground between the
initial sandbox and the strictest single-file-to-stdout sandbox:
this allows opening files for reading but output has to go to stdout.
Linux 6.7 added support for ABI version 4 which restricts
TCP connections which xz won't need and thus those can be
forbidden now. Since the ABI version is handled at runtime,
supporting version 4 won't cause any compatibility issues.
Note that new enough kernel headers are required to get
version 4 support enabled at build time.
Landlock is now always used just like pledge(2) is: first in more
permissive mode and later (under certain common conditions) in
a strict mode that doesn't allow opening more files.
I put pledge(2) first in sandbox.c because it's the simplest API
to use and still somewhat fine-grained for basic applications.
So it's the simplest thing to understand for anyone reading sandbox.c.
Also explicitly initialize progress_automatic to make it clear
that it can be read before message_init() sets it. Static variable
was initialized to false by default already so this is only for
clarity.
GCC docs promise that it works and a few other compilers do
too. Clang/LLVM is documented source code only but unsurprisingly
it behaves the same as others on x86-64 at least. But the
certainly-portable way is good enough here so use that.
The x32 port has a x86-64 ABI in term of all registers but uses only
32bit pointer like x86-32. The assembly optimisation fails to compile on
x32. Given the state of x32 I suggest to exclude it from the
optimisation rather than trying to fix it.
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
It's used only for basic bittrees and fixed-size reverse bittree
because those showed a clear benefit on x86-64 with GCC and Clang.
The other methods were more mixed and thus are commented out but
they should be tested on other archs.
Now extra buffer space is reserved so that repeating bytes for
any single match will never need to copy from two places (both
the beginning and the end of the buffer). This simplifies
dict_repeat() and helps a little with speed.
This seems to reduce .lzma decompression time about 2 %, so
with .xz and CRC it could be slightly less. The small things
add up still.
It's not completely obvious if this is better in the decoder.
It should be good if compiler can avoid creating a branch
(like using CMOV on x86).
This also makes lzma_encoder.c use the new macros.
The new decoder resumes the first decoder loop in the Resumable mode.
Then, the code executes in Non-resumable mode until it detects that it
cannot guarantee to have enough input/output to decode another symbol.
The Resumable mode is how the decoder has always worked. Before decoding
every input bit, it checks if there is enough space and will save its
location to be resumed later. When the decoder has more input/output,
it jumps back to the correct sequence in the Resumable mode code.
When the input/output buffers are large, the Resumable mode is much
slower than the Non-resumable because it has more branches and is harder
for the compiler to optimize since it is in a large switch block.
Early benchmarking shows significant time improvement (8-10% on gcc and
clang x86) by using the Non-resumable code as much as possible.
The new "safe" range decoder mode is the same as old range decoder, but
now the default behavior of the range decoder will not check if there is
enough input or output to complete the operation. When the buffers are
close to fully consumed, the "safe" operations must be used instead. This
will improve speed because it will reduce the number of branches needed
for most of the range decoder operations.
The footer template from Doxygen has the closing </body> </html>
as Doxygen doesn't add them otherwise.
target="_blank" was omitted as it's not useful here but
it can be slightly annoying as one cannot just go back
in the browser history.
Since the footer links to the license file in the same
directory and not to CC website, the rel attributes
can be omitted.
lzmainfo has had translation support since 2009 at least but
it was never added to po/POTFILES.in so the messages weren't
translated. It's a very rarely needed tool so it's not too bad.
This also adds src/xz/mytime.c to po/POTFILES.in although there
are no translatable strings. It's simpler this way so that it
won't be forgotten if strings were ever added to that file.
The main reason is a kind of silly one:
xz-man.pot contains strings from all man pages in XZ Utils.
The man pages of xzdiff, xzgrep, and xzmore were under GPLv2
and the rest under 0BSD. Thus xz-man.pot contained strings
under two licences. po4a creates the translated man pages
from the combined 0BSD+GPLv2 xz-man.pot.
I haven't liked this mixing in xz-man.pot but the
Translation Project requires that all man pages must be
in the same .pot file. So a separate xz-man-gpl.pot
wasn't an option.
Since these man pages are short, rewriting them was quick enough.
Now xz-man.pot is entirely under 0BSD and marking the per-file
licenses is simpler.
As a bonus, some wording hopefully is now slightly better
although it's perhaps a matter of taste.
NOTE: In xzgrep.1, the EXIT STATUS section was written by me
in the commit d796b6d7fd so that's
why that section could be taken as is from the old xzgrep.1.
Perhaps the generated files aren't even copyrightable but
using the same license for them as for the rest of the liblzma
keeps things more consistent for tools that look for license info.
The initial commit 5d018dc035
in 2007 had a comment in sha256.c that the code is based on
Crypto++ Library 5.5.1. In 2009 the Authors list in sha256.c
and the AUTHORS file was updated with information that the
code had come from Crypto++ but via 7-Zip. I know I had viewed
7-Zip's SHA-256 code but back then the C code has been identical
enough with Crypto++, so I don't why I thought the author info
would need that extra step via 7-Zip for this single file.
Another error is that I had mixed sha.* and shacal2.* files
when checking for author info in Crypto++. The shacal2.* files
aren't related to liblzma's sha256.c and thus Kevin Springle's
code in Crypto++ isn't either.
If liblzma is configured with --disable-clmul-crc
CFLAGS="-msse4.1 -mpclmul", then it will fail to compile because the
generic version must be used but the CRC tables were not included.
The code was using HAVE_FUNC_ATTRIBUTE_IFUNC instead of CRC_USE_IFUNC.
With ARM64, ifunc is incompatible because it requires non-inline
function calls for runtime detection.
Even though the proper name for the architecture is aarch64, this
project uses ARM64 throughout. So the rename is for consistency.
Additionally, crc32_arm64.h was slightly refactored for the following
changes:
* Added MSVC, FreeBSD, and macOS support in
is_arch_extension_supported().
* crc32_arch_optimized() now checks the size when aligning the
buffer.
* crc32_arch_optimized() loop conditions were slightly modified to
avoid both decrementing the size and incrementing the buffer
pointer.
* Use the intrinsic wrappers defined in <arm_acle.h> because GCC and
Clang name them differently.
* Minor spacing and comment changes.
The CRC_GENERIC is now split into CRC32_GENERIC and CRC64_GENERIC, since
the ARM64 optimizations will be different between CRC32 and CRC64.
For the same reason, CRC_ARCH_OPTIMIZED is split into
CRC32_ARCH_OPTIMIZED and CRC64_ARCH_OPTIMIZED.
ifunc will only be used with x86-64 CLMUL because the runtime detection
methods needed with ARM64 are not compatible with ifunc.
This adds --enable-arm64-crc32/--disable-arm64-crc32 (enabled by
default) for using the ARM64 CRC32 instruction. This can be disabled if
one knows the binary will never need to run on an ARM64 machine
with this instruction extension.
The CRC32 instructions in ARM64 can calculate the CRC32 result
for 8 bytes in a single operation, making the use of ARM64
instructions much faster compared to the general CRC32 algorithm.
Optimized CRC32 will be enabled if ARM64 has CRC extension
running on Linux.
Signed-off-by: Chenxi Mao <chenxi.mao2013@gmail.com>
The PROJECT_LOGO field is now used to include the XZ logo. The footer
of each page now lists the copyright information instead of the default
footer. The license is also copied to statisfy the copyright and so the
link in the documentation can be local.
This hopefully does more good than bad:
+ It's faster by default.
+ Only the threaded compressor creates files that
can be decompressed in threaded mode.
- Compression ratio is worse, usually not too much though.
When it matters, -T1 must be used.
- Memory usage increases.
- Scripts that assume single-threaded mode but don't use -T1 will
possibly use too much resources, for example, if they run
multiple xz processes in parallel to compress multiple files.
- Output from single-threaded and multi-threaded compressors
differ but such changes could happen for other reasons too
(they just haven't happened since 5.0.0).
Not all RISC-V processors support fast unaligned access so
it's better to read only one byte in the main loop. This can
be faster even on x86-64 when compared to reading 32 bits at
a time as half the time the address is only 16-bit aligned.
The downside is larger code size on archs that do support
fast unaligned access.
Version 5.6.0 will be shown, even though upcoming alphas and betas
will be able to support this filter. 5.6.0 looks nicer in the output and
people shouldn't be encouraged to use an unstable version in production
in any way.
These test files achieve 100% code coverage in
src/liblzma/simple/riscv.c. They contain all of the instructions that
should be filtered and a few cases that should not.
The new Filter ID is 0x0B.
Thanks to Chien Wong <m@xv97.com> for the initial version of the Filter,
the xz CLI updates, and the Autotools build system modifications.
Thanks to Igor Pavlov for his many contributions to the design of
the filter.
Now crc_simd_body() in crc_x86_clmul.h is only called once
in a translation unit, we no longer need to be so cautious
about ensuring the always-inline behavior.
CRC_CLMUL was split to CRC_ARCH_OPTIMIZED and CRC_X86_CLMUL.
CRC_ARCH_OPTIMIZED is defined when an arch-optimized version is used.
Currently the x86 CLMUL implementations are the only arch-optimized
versions, and these also use the CRC_x86_CLMUL macro to tell when
crc_x86_clmul.h needs to be included.
is_clmul_supported() was renamed to is_arch_extension_supported().
crc32_clmul() and crc64_clmul() were renamed to
crc32_arch_optimized() and crc64_arch_optimized().
This way the names make sense with arch-specific non-CLMUL
implementations as well.
A CLMUL-only build will have the crcxx_clmul() inlined into
lzma_crcxx(). Previously a jump to the extern lzma_crcxx_clmul()
was needed. Notes about shared liblzma on ELF platforms:
- On platforms that support ifunc and -fvisibility=hidden, this
was silly because CLMUL-only build would have that single extra
jump instruction of extra overhead.
- On platforms that support neither -fvisibility=hidden nor linker
version script (liblzma*.map), jumping to lzma_crcxx_clmul()
would go via PLT so a few more instructions of overhead (still
not a big issue but silly nevertheless).
There was a downside with static liblzma too: if an application only
needs lzma_crc64(), static linking would make the linker include the
CLMUL code for both CRC32 and CRC64 from crc_x86_clmul.o even though
the CRC32 code wouldn't be needed, thus increasing code size of the
executable (assuming that -ffunction-sections isn't used).
Also, now compilers are likely to inline crc_simd_body()
even if they don't support the always_inline attribute
(or MSVC's __forceinline). Quite possibly all compilers
that build the code do support such an attribute. But now
it likely isn't a problem even if the attribute wasn't supported.
Now all x86-specific stuff is in crc_x86_clmul.h. If other archs
The other archs can then have their own headers with their own
is_clmul_supported() and crcxx_clmul().
Another bonus is that the build system doesn't need to care if
crc_clmul.c is needed.
is_clmul_supported() stays as inline function as it's not needed
when doing a CLMUL-only build (avoids a warning about unused function).
It requires fast unaligned access to 64-bit integers
and a fast instruction to count leading zeros in
a 64-bit integer (__builtin_ctzll()). This perhaps
should be enabled on some other archs too.
Thanks to Chenxi Mao for the original patch:
https://github.com/tukaani-project/xz/pull/75 (the first commit)
According to the numbers there, this may improve encoding
speed by about 3-5 %.
This enables the 8-byte method on MSVC ARM64 too which
should work but wasn't tested.
The sandbox is now enabled for xzdec as well, so it no longer belongs
in just the xz section. xz and xzdec are always built, except for older
MSVC versions, so there isn't a need to conditionally show the sandbox
configuration. CMake will do a little unecessary work on older MSVC
versions that can't build xz or xzdec, but this is a very small
downside.
A very strict sandbox is used when the last file is decompressed. The
likely most common use case of xzdec is to decompress a single file.
The Pledge sandbox is applied to the entire process with slightly more
relaxed promises, until the last file is processed.
Thanks to Christian Weisgerber for the initial patch adding Pledge
sandboxing.
This fixes the recent change to lzma_lz_encoder that used memzero
instead of the NULL constant. On some compilers the NULL constant
(always 0) may not equal the NULL pointer (this only needs to guarentee
to not point to valid memory address).
Later code compares the pointers to the NULL pointer so we must
initialize them with the NULL pointer instead of 0 to guarentee
code correctness.
The first member of lzma_lz_encoder doesn't necessarily need to be set
to NULL since it will always be set before anything tries to use it.
However the function pointer members must be set to NULL since other
functions rely on this NULL value to determine if this behavior is
supported or not.
This fixes a somewhat serious bug, where the options_update() and
set_out_limit() function pointers are not set to NULL. This seems to
have been forgotten since these function pointers were added many years
after the original two (code() and end()).
The problem is that by not setting this to NULL we are relying on the
memory allocation to zero things out if lzma_filters_update() is called
on a LZMA1 encoder. The function pointer for set_out_limit() is less
serious because there is not an API function that could call this in an
incorrect way. set_out_limit() is only called by the MicroLZMA encoder,
which must use LZMA1 where set_out_limit() is always set. Its currently
not possible to call set_out_limit() on an LZMA2 encoder at this time.
So calling lzma_filters_update() on an LZMA1 encoder had undefined
behavior since its possible that memory could be manipulated so the
options_update member pointed to a different instruction sequence.
This is unlikely to be a bug in an existing application since it relies
on calling lzma_filters_update() on an LZMA1 encoder in the first place.
For instance, it does not affect xz because lzma_filters_update() can
only be used when encoding to the .xz format.
This is fixed by using memzero() to set all members of lzma_lz_encoder
to NULL after it is allocated. This ensures this mistake will not occur
here in the future if any additional function pointers are added.
lzma_raw_encoder() and lzma_raw_encoder_init() used "options" as the
parameter name instead of "filters" (used by the declaration). "filters"
is more clear since the parameter represents the list of filters passed
to the raw encoder, each of which contains filter options.
lzma_encoder_init() did not check for NULL options, but
lzma2_encoder_init() did. This is more of a code style improvement than
anything else to help make lzma_encoder_init() and lzma2_encoder_init()
more similar.
Since GCC version 10, GCC no longer complains about simple implicit
integer conversions with Arithmetic operators.
For instance:
uint8_t a = 5;
uint32_t b = a + 5;
Give a warning on GCC 9 and earlier but this:
uint8_t a = 5;
uint32_t b = (a + 5) * 2;
Gives a warning with GCC 10+.
Most of these fixes are small typos and tweaks. A few were caused by bad
advice from me. Here is the summary of what is changed:
- Author line edits
- Small comment changes/additions
- Using the return value in the error messages in the fuzz targets'
coder initialization code
- Removed fuzz_encode_stream.options. This set a max length, which may
prevent some worthwhile code paths from being properly exercised.
- Removed the max_len option from fuzz_decode_stream.options for the
same reason as fuzz_encode_stream. The alone decoder fuzz target still
has this restriction.
- Altered the dictionary contents for fuzz_lzma.dict. Instead of keeping
the properties static and varying the dictionary size, the properties
are varied and the dictionary size is kept small. The dictionary size
doesn't have much impact on the code paths but the properties do.
Closes: https://github.com/tukaani-project/xz/pull/73
This fuzz target handles .xz stream encoding. The first byte of input
is used to dynamically set the preset level in order to increase the
fuzz coverage of complex critical code paths.
This fuzz target that handles LZMA alone decoding. A new fuzz
dictionary .dict was also created with common LZMA header values to
help speed up the discovery of valid headers.
All .c files can be built as separate fuzz targets. This simplifies
the Makefile by allowing us to use wildcards instead of having a
Makefile target for each fuzz target.
Some compilers support __attribute__((__ifunc__())) even though the
dynamic linker does not. The compiler is able to create the binary
but it will fail on startup. So it is not enough to just test if
the attribute is supported.
The default value for enable_ifunc is now auto, which will attempt
to compile a program using __attribute__((__ifunc__())). There are
additional checks in this program if glibc is being used or if it
is running on FreeBSD.
Setting --enable-ifunc will skip this test and always enable
__attribute__((__ifunc__())), even if is not supported.
The new is_tty() will report if a file descriptor is a terminal or not.
On POSIX systems, it is a wrapper around isatty(). However, the native
Windows implementation of isatty() will return true for all character
devices, not just terminals. So is_tty() has a special case for Windows
so it can use alternative Windows API functions to determine if a file
descriptor is a terminal.
This fixes a bug with MSVC and MinGW-w64 builds that refused to read from
or write to non-terminal character devices because xz thought it was a
terminal. For instance:
xz foo -c > /dev/null
would fail because /dev/null was assumed to be a terminal.
This tests some complicated interactions with the --suffix= option.
The suffix option must be used with --format=raw, but can optionally
be used to override the default .xz suffix.
This test also verifies some recent bugs have been correctly solved
and to hopefully avoid further regressions in the future.
The following command caused a segmentation fault:
xz -Fraw --lzma1 --files=foo
when foo was a valid file. The usage of --files or --files0 was not
being checked when compressing or decompressing in raw mode without a
suffix. The suffix checking code was meant to validate that all files
to be processed are "-" (if not writing to standard out), meaning the
data is only coming from standard in. In this case, there were no file
names to check since --files and --files0 store their file name in a
different place.
Later code assumed the suffix was set and caused a segmentation fault.
Now, the above command results in an error.
The previous version set opt_stdout, but this caused an issue with
copying an input file to standard out when decompressing an unknown file
type. The following needs to result in an error:
echo foo | xz -df
since -c, --stdout is not used. This fixes the previous error by not
setting opt_stdout.
This fixes a bug introduced in cc5aa9ab13
when the suffix check was initially moved. This caused a situation that
previously worked:
echo foo | xz -Fraw --lzma1 | wc -c
to fail because the old code knew that this would write to standard out
so a suffix was not needed.
If the -c, --stdout argument is not used, then we can still detect when
the data will be written to standard out if all of the provided
filenames are "-" (denoting standard in) or if no filenames are
provided.
The macro lzma_attr_visibility_hidden has to be defined to make
fastpos.h usable. The visibility attribute is irrelevant to
fastpos_tablegen.c so simply #define the macro to an empty value.
fastpos_tablegen.c is never built by the included build systems
and so the problem wasn't noticed earlier. It's just a standalone
program for generating fastpos_table.c.
Fixes: https://github.com/tukaani-project/xz/pull/69
Thanks to GitHub user Jamaika1.
In ELF shared libs:
-fvisibility=hidden affects definitions of symbols but not
declarations.[*] This doesn't affect direct calls to functions
inside liblzma as a linker can replace a call to lzma_foo@plt
with a call directly to lzma_foo when -fvisibility=hidden is used.
[*] It has to be like this because otherwise every installed
header file would need to explictly set the symbol visibility
to default.
When accessing extern variables that aren't defined in the
same translation unit, compiler assumes that the variable has
the default visibility and thus indirection is needed. Unlike
function calls, linker cannot optimize this.
Using __attribute__((__visibility__("hidden"))) with the extern
variable declarations tells the compiler that indirection isn't
needed because the definition is in the same shared library.
About 15+ years ago, someone told me that it would be good if
the CRC tables would be defined in the same translation unit
as the C code of the CRC functions. While I understood that it
could help a tiny amount, I didn't want to change the code because
a separate translation unit for the CRC tables was needed for the
x86 assembly code anyway. But when visibility attributes are
supported, simply marking the extern declaration with the
hidden attribute will get identical result. When there are only
a few affected variables, this is trivial to do. I wish I had
understood this back then already.
MinGW (formely a MinGW.org Project, later the MinGW.OSDN Project
at <https://osdn.net/projects/mingw/>) has GCC 9.2.0 as the
most recent GCC package (released 2021-02-02). The project might
still be alive but majority of people have switched to MinGW-w64.
Thus it seems clearer to refer to MinGW-w64 in our API headers too.
Building with MinGW is likely to still work but I haven't tested it
in the recent years.
A CMake option LARGE_FILE_SUPPORT is created if and only if
-D_FILE_OFFSET_BITS=64 affects sizeof(off_t).
This is needed on many 32-bit platforms and even with 64-bit builds
with MinGW-w64 to get support for files larger than 2 GiB.
Autotools based build uses -pthread and thus adds it to Libs.private
in liblzma.pc. CMake doesn't use -pthread at all if pthread functions
are available in libc so Libs.private doesn't get -pthread either.
It properly adds -DLZMA_API_STATIC when compiling code that
will be linked against static liblzma. Having it there on
systems other than Windows does no harm.
See: https://www.msys2.org/docs/pkgconfig/
Now configure will fail if -fsanitize= is found in CFLAGS
and sanitizer-incompatible ifunc or Landlock sandboxing
would be used. These are incompatible with one or more sanitizers.
It's simpler to reject all -fsanitize= uses instead of trying to
pass those that might not cause problems.
CMake-based build was updated similarly. It lets the configuration
finish (SEND_ERROR instead of FATAL_ERROR) so that both error
messages can be seen at once.
The sandboxing on Linux now supports Landlock, which restricts all
supported filesystem actions after xz opens the files it needs. The
sandbox is only enabled when one file is input and we are writing to
standard out. With fsanitize=address,undefined, the instrumentation
needs to read additional files after the sandbox is in place. This
forces all xz based test to fail, so the sandbox must instead be
disabled.
Using set(ENABLE_THREADS "posix") is confusing because it sets
a new normal variable and leaves the cache entry with the same
name unchanged. The intent wasn't to change the cache entry so
this switches to a different variable name.
This way typos are caught quickly and compounding error messages
are avoided (a single typo could cause more than one error).
This keeps using SEND_ERROR when the system is lacking a feature
(like threading library or sandboxing method). This way the whole
configuration log will be generated in case someone wishes to
report a problem upstream.
This removes support for FreeBSD 10.0 and 10.1 which used
<sys/capability.h> instead of <sys/capsicum.h>. Support for
FreeBSD 10.1 ended on 2016-12-31. So now FreeBSD >= 10.2 is
required to enable Capsicum support.
This also removes support for Capsicum on Linux (libcaprights)
which seems to have been unmaintained since 2017 and Linux 4.11:
https://github.com/google/capsicum-linux
See the new comment in the code.
This also makes the check for clock_gettime() run with MinGW-w64
with which we don't want to use clock_gettime(). The previous
commit already took care of this situation.
This commit alone doesn't change anything in the real-world:
- configure.ac currently checks for clock_gettime() only
when using pthreads.
- CMakeLists.txt doesn't check for clock_gettime() on Windows.
So clock_gettime() wasn't used with MinGW-w64 before either.
clock_gettime() provides monotonic time and it's better than
gettimeofday() in this sense. But clock_gettime() is defined
in winpthreads, and liblzma or xz needs nothing else from
winpthreads. By avoiding clock_gettime(), we avoid the dependency on
libwinpthread-1.dll or the need to link against the static version.
As a bonus, GetTickCount64() and MinGW-w64's gettimeofday() can be
faster than clock_gettime(CLOCK_MONOTONIC, &tv). The resolution
is more than good enough for the progress indicator in xz.
This partially reverts creating crc_clmul.c
(8c0f9376f5) where is_clmul_supported()
was moved, extern'ed, and renamed to lzma_is_clmul_supported(). This
caused a problem when the function call to lzma_is_clmul_supported()
results in a call through the PLT. ifunc resolvers run very early in
the dynamic loading sequence, so the PLT may not be setup properly at
this point. Whether the PLT is used or not for
lzma_is_clmul_supported() depened upon the compiler-toolchain used and
flags.
In liblzma compiled with GCC, for instance, GCC will go through the PLT
for function calls internal to liblzma if the version scripts and
symbol visibility hiding are not used. If lazy-binding is disabled,
then it would have made any program linked with liblzma fail during
dynamic loading in the ifunc resolver.
Currently crc32 is always enabled, so COND_CHECK_CRC32 must always be
set. Because of this, it makes the recent change to conditionally
compile check/crc_clmul.c appear wrong since that file has CLMUL
implementations for both CRC32 and CRC64.
The option is enabled by default, but will only be visible to a user
listing cache variables or using a CMake GUI application if the
immintrin.h header file is found.
This mirrors our Autotools build --disable-clmul-crc functionality.
After forcing crc_simd_body() to always be inlined it caused
-fsanitize=address to fail for lzma_crc32_clmul() and
lzma_crc64_clmul(). The __no_sanitize_address__ attribute was added
to lzma_crc32_clmul() and lzma_crc64_clmul(), but not removed from
crc_simd_body(). ASAN and inline functions behavior has changed over
the years for GCC specifically, so while strictly required we will
keep __attribute__((__no_sanitize_address__)) on crc_simd_body() in
case this becomes a requirement in the future.
Older GCC versions refuse to inline a function with ASAN if the
caller and callee do not agree on sanitization flags
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89124#c3). If the
function was forced to be inlined, it will not compile if the callee
function has __no_sanitize_address__ but the caller doesn't.
PowerPC64LE wasn't tested but it seems like a safe change.
POWER8 supports unaligned access in little endian mode. Testing
on godbolt.org shows that GCC uses unaligned access by default.
The RISC-V macro __riscv_misaligned_fast is very new and not
in any stable compiler release yet.
Documentation in INSTALL was updated to match.
Documentation about an autodetection bug when using ARM64 GCC
with -mstrict-align was added to INSTALL.
CMake files weren't updated yet.
In XZ Utils context this doesn't matter much because
unaligned reads and writes aren't used in hot code
when TUKLIB_FAST_UNALIGNED_ACCESS isn't #defined.
After testing a 32-bit Release build on MSVC, only lzma_crc64_clmul()
has the bug. crc_simd_body() and lzma_crc32_clmul() do not need the
optimizations disabled.
Forcing this to be inline has a significant speed improvement at the
cost of a few repeated instructions. The compilers tested on did not
inline this function since it is large and is used twice in the same
translation unit.
This macro must be used instead of the inline keyword. On MSVC, it is
a replacement for __forceinline which is an MSVC specific keyword that
should not be used with inline (it will issue a warning if it is).
It does not use a build system check to determine if
__attribute__((__always_inline__)) since all compilers that can use
CLMUL extensions (except the special case for MSVC) should support this
attribute. If this assumption is incorrect then it will result in a bug
report instead of silently producing slow code.
A detailed description of the three dispatch methods was added. Also,
duplicated comments now only appear in crc32_fast.c or were removed from
both crc32_fast.c and crc64_fast.c if they appeared in crc_clmul.c.
Both crc32_clmul() and crc64_clmul() are now exported from
crc32_clmul.c as lzma_crc32_clmul() and lzma_crc64_clmul(). This
ensures that is_clmul_supported() (now lzma_is_clmul_supported()) is
not duplicated between crc32_fast.c and crc64_fast.c.
Also, it encapsulates the complexity of the CLMUL implementations into a
single file and reduces the complexity of crc32_fast.c and crc64_fast.c.
Before, CLMUL code was present in crc32_fast.c, crc64_fast.c, and
crc_common.h.
During the conversion, various cleanups were applied to code (thanks to
Lasse Collin) including:
- Require using semicolons with MASK_/L/H/LH macros.
- Variable typing and const handling improvements.
- Improvements to comments.
- Fixes to the pragmas used.
- Removed unneeded variables.
- Whitespace improvements.
- Fixed CRC_USE_GENERIC_FOR_SMALL_INPUTS handling.
- Silenced warnings and removed the need for some #pragmas
Referencing actions by commit SHA in GitHub workflows guarantees you are using an immutable version. Actions referenced by tags and branches are more vulnerable to attacks, such as the tag being moved to a malicious commit or a malicious commit being pushed to the branch.
It's important to make sure the SHA's are from the original repositories and not forks.
For reference:
https://github.com/actions/checkout/releases/tag/v4.1.08ade135a41https://github.com/actions/upload-artifact/releases/tag/v3.1.3a8a3f3ad30
Signed-off-by: Gabriela Gutierrez <gabigutierrez@google.com>
CMake doesn't set WIN32 on CYGWIN but the workaround is
probably needed on Cygwin too. Same for MSYS and MSYS2.
The workaround must not be used with Clang that is acting in
MSVC mode. This fixes it by checking for the known environments
that need the workaround instead of using "NOT MSVC".
Thanks to Martin Storsjö.
0570308ddd (commitcomment-129098431)
lld 17.0.1 searches for libraries to link first in the toolchain
directories before the local directory when building. The is a problem
for us because liblzma.a is installed in MSYS2 CLANG64 by default and
xz.exe will thus use the installed library instead of the one being
built.
This causes tests to fail when they are expecting features to be
disabled. More importantly, it will compile xz.exe with an incorrect
liblzma and could cause unexpected behavior by being unable to update
liblzma code in static builds. The CLANG64 environment can be tested
again once this is fixed.
Link to bug: https://github.com/llvm/llvm-project/issues/67779.
The Ninja Generator for CMake cannot have a custom target and its
BYPRODUCTS have the same name. This has prevented Ninja builds on
Unix-like systems since the xz symlinks were introduced in
80a1a8bb83.
llvm-windres 17.0.0 has more accurate emulation of GNU windres, so
the hack for GNU windres must now be used with llvm-windres too.
LLVM 16.0.6 has the old behavior and there likely won't be more
16.x releases. So we can simply check for >= 17.0.0.
See also:
2bcc0fdc58
The C standards don't allow an empty translation unit which can be
avoided by declaring something, without exporting any symbols.
When I committed f644473a21 I had
a feeling that some specific toolchain somewhere didn't like
empty object files (assembler or maybe "ar" complained) but
I cannot find anything to confirm this now. Quite likely I
remembered nonsense. I leave this here as a note to my future self. :-)
When the generic fast crc64 method is used, then we omit
lzma_crc64_table[][]. Similar to
d9166b52cf, we can avoid compiler warnings
with -Wempty-translation-unit (Clang) or -pedantic (GCC) by creating a
never used typedef instead of an extra symbol.
Now if user-supplied CFLAGS contains -Wall -Wextra -Wpedantic
the two checks that need -Werror will still work.
At CMake side there is add_compile_options(-Wall -Wextra)
but it didn't affect the -Werror tests. So with both Autotools
and CMake only user-supplied CFLAGS could make the checks fail
when they shouldn't.
This is not a full fix as things like -Wunused-macros in
user-supplied CFLAGS will still cause problems with both
GCC and Clang.
There were two uses of AC_COMPILE_IFELSE that didn't use
AC_LANG_SOURCE and Autoconf warned about these. The omission
had been intentional but it turned out that this didn't do
what I thought it would.
Autoconf 2.71 manual gives an impression that AC_LANG_SOURCE
inserts all #defines that have been made with AC_DEFINE so
far (confdefs.h). The idea was that omitting AC_LANG_SOURCE
would mean that only the exact code included in the
AC_COMPILE_IFELSE call would be compiled.
With C programs this is not true: the #defines get added without
AC_LANG_SOURCE too. There seems to be no neat way to avoid this.
Thus, with the C language at least, adding AC_LANG_SOURCE makes
no other difference than silencing a warning from Autoconf. The
generated "configure" remains identical. (Docs of AC_LANG_CONFTEST
say that the #defines have been inserted since Autoconf 2.63b and
that AC_COMPILE_IFELSE uses AC_LANG_CONFTEST. So the behavior is
documented if one also reads the docs of macros that one isn't
calling directly.)
Any extra code, including #defines, can cause problems for
these two tests because these tests must use -Werror.
CC=clang CFLAGS=-Weverything is the most extreme example.
It enables -Wreserved-macro-identifier which warns about
#define __EXTENSIONS__ 1 because it begins with two underscores.
It's possible to write a test file that passes -Weverything but
it becomes impossible when Autoconf inserts confdefs.h.
So this commit adds AC_LANG_SOURCE to silence Autoconf warnings.
A different solution is needed for -Werror tests.
We can still avoid modifying the contents of this file during
configuration to simplify the build systems. Gnulib added replacements
for inclusions guards for Cygwin. Cygwin should not need getopt_long
replacement so this feature can be omitted.
<unistd.h> is conditionally included to avoid MSVC since it is not
available.
The definition for _GL_ARG_NONNULL was also copied into this file from
Gnulib since this stage is usually done during gnulib-tool.
The code maintains the prior modifications of conditionally including
config.h and disabling NLS support.
_GL_UNUSED is repalced with the simple cast to void trick. _GL_UNUSED
is only used for these two parameters so its simpler than having to
define it.
This was modified slightly from Gnulib. In Gnulib, it expects the
@HAVE_SYS_CDEFS_H@ to be replaced. Instead, we can set HAVE_SYS_CDEFS_H
on systems that have it and avoid copying another file into the build
directory. Since we are not using gnulib-tool, copying extra files
requires extra build system updates (and special handling with CMake) so
we should avoid when possible.
The getopt related files have changed from Gnulib by splitting up
getopt.in.h into more modular header files. We could have kept
everything in just getopt.in.h, but this will help us continue to update
in the future.
Before this commit, the following writes "foo" to the
console and deletes the input file:
echo foo | xz > con_xz
xz --suffix=_xz --decompress con_xz
It cannot happen without --suffix because names like con.xz
are also special and so attempting to decompress con.xz
(or compress con to con.xz) will already fail when opening
the input file.
Similar thing is possible when compressing. The following
writes to "nul" and the input file "n" is deleted.
echo foo | xz > n
xz --suffix=ul n
Now xz checks if the destination is a special file before
continuing. DOS/DJGPP version had a check for this but
Windows (and OS/2) didn't.
xzdec might build with VS2013 but it hasn't been tested.
It was never supported before and VS2013 is old anyway
so for simplicity only liblzma is supported with VS2013.
Building the command line tools xz and xzdec with the combination
of CMake + Visual Studio 2015/2017/2019/2022 works now.
VS2013 update 2 should still be able to build liblzma.
VS2013 cannot build the xz command line tool because xz
needs snprintf() that roughly conforms to C99.
VS2013 is old and no extra code will be added to support it.
Thanks to Kelvin Lee and Jia Tan for testing.
There are several new policies. CMP0149 may affect the Windows SDK
version that CMake will choose by default. The new behavior is more
predictable, always choosing the latest SDK version by default.
The other new policies shouldn't affect this package.
The CMake-based build doesn't use config.h.
Up-to-date getopt_long in Gnulib is LGPLv2 so at some
point it could be included in XZ Utils too but for now
this commit is enough to make CMake-based build possible.
For compatibility with C23's [[noreturn]], tuklib_attr_noreturn
must be at the beginning of declaration (before "extern" or
"static", and even before any GNU C's __attribute__).
This commit also moves all other function attributes to
the beginning of function declarations. "extern" is kept
at the beginning of a line so the attributes are listed on
separate lines before "extern" or "static".
xrealloc() is obviously incorrect, modern GCC docs even
mention realloc() as an example where this attribute
cannot be used.
liblzma's lzma_alloc() and lzma_alloc_zero() would be
correct uses most of the time but custom allocators
may use a memory pool or otherwise hold the pointer
so aliasing issues could happen in theory.
The xstrdup() case likely was correct but I removed it anyway.
Now there are no __malloc__ attributes left in the code.
The allocations aren't in hot paths so this should make
no practical difference.
This makes no difference for GCC or Clang as they support
GNU C's __attribute__((__noreturn__)) but this helps with MSVC:
- VS 2019 version 16.7 and later support _Noreturn if the
options /std:c11 or /std:c17 are used. This gets handled
with the check for __STDC_VERSION__ >= 201112.
- When MSVC isn't in C11/C17 mode, __declspec(noreturn) is used.
C23 will deprecate _Noreturn (and <stdnoreturn.h>)
for [[noreturn]]. This commit anticipates that but
the final __STDC_VERSION__ value isn't known yet.
If CMake was configured more than once, HAVE_CLOCK_GETTIME and
HAVE_CLOCK_MONOTONIC would not be set as compile definitions. The check
for librt being needed to provide HAVE_CLOCK_GETTIME was also
simplified.
Now the two variations of the format strings are created with
a macro, and the whole detection code can be easily disabled
on platforms where thousand separator formatting is known to
not work (MSVC has no support, and on DJGPP 2.05 it can have
problems in some cases).
The argument to vli_ceil4() should always guarantee the return value
is also a valid lzma_vli. Thus the highest three valid lzma_vli values
are invalid arguments. All uses of the function ensure this so the
assert is updated to match this.
This was not a security bug since there was no path to overflow
UINT64_MAX in lzma_index_append() or when it calls index_file_size().
The bug was discovered by a failing assert() in vli_ceil4() when called
from index_file_size() when unpadded_sum (the sum of the compressed size
of current Stream and the unpadded_size parameter) exceeds LZMA_VLI_MAX.
Previously, the unpadded_size parameter was checked to be not greater
than UNPADDED_SIZE_MAX, but no check was done once compressed_base was
added.
This could not have caused an integer overflow in index_file_size() when
called by lzma_index_append(). The calculation for file_size breaks down
into the sum of:
- Compressed base from all previous Streams
- 2 * LZMA_STREAM_HEADER_SIZE (size of the current Streams header and
footer)
- stream_padding (can be set by lzma_index_stream_padding())
- Compressed base from the current Stream
- Unpadded size (parameter to lzma_index_append())
The sum of everything except for Unpadded size must be less than
LZMA_VLI_MAX. This is guarenteed by overflow checks in the functions
that can set these values including lzma_index_stream_padding(),
lzma_index_append(), and lzma_index_cat(). The maximum value for
Unpadded size is enforced by lzma_index_append() to be less than or
equal UNPADDED_SIZE_MAX. Thus, the sum cannot exceed UINT64_MAX since
LZMA_VLI_MAX is half of UINT64_MAX.
Thanks to Joona Kannisto for reporting this.
When the compiler supports __attribute__((__constructor__))
mythread_once() is never used, even with --enable-small. A configuration
with win95 threads and --enable-small will compile and be thread safe so
it can be allowed.
This isn't a very common configuration since MSVC does not support
__attribute__((__constructor__)), but MINGW32 and CLANG32 environments
for MSYS2 can use win95 threads and have
__attribute__((__constructor__)) support.
The "once_" variable was accidentally referred to as just "once". This
prevented building with Vista threads when
HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR was not defined.
The .codespellrc allows setting default options to avoid false positive
matches, set additional dictionaries, etc. For now, codespell can be
used locally before committing doc and comment changes.
It should help prevent silly errors and fix up commits in the future.
groff defaults to SGR escapes. Using -P-c passes -c to grotty
which restores the old behavior. Perhaps there is a better way to
get pure plain text output but this works for now.
signal.h in WASI SDK doesn't currently provide sigprocmask()
or sigset_t. liblzma doesn't need them so this change makes
liblzma and xzdec build against WASI SDK. xz doesn't build yet
and the tests don't either as tuktest needs setjmp() which
isn't (yet?) implemented in WASI SDK.
Closes: https://github.com/tukaani-project/xz/pull/57
See also: https://github.com/tukaani-project/xz/pull/56
(The original commit was edited a little by Lasse Collin.)
The CMake build will try to create broken symlinks on Unix and Unix-like
platforms. Cygwin and MSYS2 are Unix-like, but may not be able to create
broken symlinks. The value of the CYGWIN or MSYS environment variables
determine if broken symlinks are valid.
The default for MSYS2 does not allow for broken symlinks, so the CMake
build has been broken for MSYS2 since commit
80a1a8bb83.
All of the MSYS2 environments need make, and it does not come with the
toolchain package. The toolchain package will install the needed
compiler toolchains since without this package CMake cannot properly
generate the Makefiles.
The default for many of the MSYS2 environments is for CMake to create
Ninja build files. This would complicate the build script since we would
need a different command to run the tests. Its simpler to always use
Unix Makefiles so that "make test" is always a usable target for
testing.
To workaround Automake lacking Windows resource compiler support, an
empty source file is compiled to overwrite the resource files for static
library builds. Translation units without an external declaration are
not allowed by the C standard and result in a warning when used with
-Wempty-translation-unit (Clang) or -pedantic (GCC).
Only a subset of the tests run by the Linux and MacOS Autotools builds
are run. The most interesting tests are the ones that disable threads,
encoders, and decoders.
The Windows runner will only be run manually since these tests will
likely take much longer than the Linux and MacOS runners. This runner
should be used before merging any large features and before releases.
Currently the clang64 environment fails to due to a warning and
-Werror is enabled for the CI tests. This is still an early version
since the CMake build can be done for MSVC and optionally each of the
MSYS2 environments. GitHub does not allow manually running the CI tests
unless the workflow is checked on the default branch so checking in a
minimum version is a good idea.
Thanks to Arthur S for the original proposing the original patch.
Closes: https://github.com/tukaani-project/xz/pull/34
Clang 16.0.0 and earlier have a bug that the ifunc resolver function
triggers the -Wunused-function warning. The resolver function is static
and only "used" by the __attribute__((__ifunc()__)).
At this time, the bug is still unresolved, but has been reported:
https://github.com/llvm/llvm-project/issues/63957
This is not a problem in GCC.
This further improves the documentation from commit
f36ca7982f. The previous wording of
"supported options" was slightly misleading since the options that are
printed are the ones that are relevant for encoding/decoding. It is not
about which options can or must be specified.
The new Tests section describes basic information about the tests, how
to run them, and important details when cross compiling. We have had a
few questions about how to compile the tests without running them, so
hopefully this information will help others with the same question in the
future.
Fixes: https://github.com/tukaani-project/xz/issues/54
The Memory limit information section described three output
columns when it actually has six. This was reworded to
"multiple" to make it more future proof.
* Moved max_block_list_size from a global to local variable.
* Reworded error message in validate_block_list_filter().
* Removed helper function filter_chain_error().
* Changed 1 << X to 1U << X in many places
The order is now consistent with the order the command line arguments
are documented earlier in the man page. The new order is:
1. --list
2. --info-memory
3. --version
Instead of the previous order:
1. --version
2. --info-memory
3. --list
The --filters-help can be used to help create filter chains with the
--filters and --filtersX options. The message in --long-help is too
short to fully explain the syntax to construct complex filter chains.
In --robot mode, xz will only print the output from liblzma function
lzma_str_list_filters.
The --block-list option description needed updating since the new
--filtersX option changes how it can be used. The new entry for
--filters1=FILTERS ... --filter9=FILTERS was created right after
the --filters option.
If a filter chain is set but not used in --block-list, it introduced
unexpected behavior such as requiring an unneeded amount of memory to
compress, reducing the number of threads in multi-threaded encoding, and
printing an incorrect amount of memory needed to decompress.
This also renames filters_init_mask => filters_used_mask. A filter is
assumed to be used if it is specified in --filtersX until
coder_set_compression_settings() determines which filters are referenced
in --block-list.
When opt_block_size is not used, the Block size for mt encoder is
derived from the minimum of the largest Block specified by
--block-list and the recommended Block size on all filter chains
calculated by lzma_mt_block_size(). This avoids using unnecessary
memory and ensures that all Blocks are large enough for the most memory
needy filter chain.
Previously, only the default filter chain could have its memory usage
adjusted. The filter chains specified with --filtersX were not checked
for memory usage. Now, all used filter chains will be adjusted if
necessary.
The block splitting logic and split_block() function are not needed if
encoders are disabled. This will help slightly reduce the binary size
when built without encoders and allow split_block() to use functions
that require encoders being enabled.
This will only free filter chains created with --filters1-9 since the
default filter chain may be set from a static function variable. The
complexity to free the default filter chain is not worth the burden on
code maintenance.
The new command line options are meant to be combined with --block-list.
They work as an optional extension to --block-list to specify a custom
filter chain for each block listed. The new options allow the creation
of up to 9 reusable filter chains. For instance:
xz --block-list=1:10MiB,3:5MiB,,2:5MiB,1:0 --filters1=delta--lzma2 \
--filters2=x86--lzma2 --filters3=arm64--lzma2
Will create the following blocks:
1. A block of size 10 MiB with filter chain delta, lzma2.
2. A block of size 5 MiB with filter chain arm64, lzma2.
3. A block of size 5 MiB with filter chain arm64, lzma2.
4. A block of size 5 MiB with filter chain x86, lzma2.
5. A block containing the rest of the file contents with filter chain
delta, lzma2.
This is a little cleaner than the previous implementation of
forget_filter_chain(). It is also more consistent since
lzma_str_to_filters() will always terminate the filter chain so there
is no need to terminate it later in coder_set_compression_settings().
The --filters option uses the new lzma_str_to_filters() function
to convert a string into a full filter chain. Using this option
will reset all previous filters set by --preset, --[filter], or
--filters.
Fixed a bug where test_compress_* would all fail if arm64 or armthumb
filters were enabled for compression but arm was disabled. Since the
grep tests only checked for "define HAVE_ENCODER_ARM", this would match
on HAVE_ENCODER_ARM64 or HAVE_ENCODER_ARMTHUMB.
Now the config.h feature test requires " 1" at the end to prevent the
prefix problem. have_feature() was also updated for this even though
there were known current bugs affecting it. This is just in case future
features have a similar prefix problem.
Commit 78704f36e7 added an empty
initializer {} to prevent a warning. The empty initializer is a GNU
extension and results in a build failure on MSVC. The -wpedantic flag
warns about empty initializers.
Several tests were missing calls to lzma_index_end() to clean up the
lzma_index structs. The memory leaks were discovered by using
-fsanitize=address with GCC.
test_block_header was not properly freeing the filter options between
calls to lzma_block_header_decode(). The memory leaks were discovered by
using -fsanitize=address with GCC.
This change only impacts the compiler warning since it was impossible
for the wait_abs struct in stream_encode_mt() to be used before it was
initialized since mythread_condtime_set() will always be called before
mythread_cond_timedwait().
Since the mythread.h code is different between the POSIX and
Windows versions, this warning was only present on Windows builds.
Thanks to Arthur S for reporting the warning and providing an initial
patch.
In lzma_memcmplen(), the <intrin.h> header file is only included if
_MSC_VER and _M_X64 are both defined but _BitScanForward64() was
previously used if _M_X64 was defined. GCC for MSYS2 defines _M_X64 but
not _MSC_VER so _BitScanForward64() was used without including
<intrin.h>.
Now, lzma_memcmplen() will use __builtin_ctzll() for MSYS2 GCC builds as
expected.
ci_build.sh was updated to accept disabling of __attribute__ ifunc
and CLMUL. This will allow -fsanitize=address to pass because ifunc
is incompatible with -fsanitize=address. The CLMUL implementation has
optimizations that potentially read past the buffer and mask out the
unwanted bytes.
This test will only run on Autotools Linux.
The ifunc method avoids indirection via the function pointer
crc64_func. This works on GNU/Linux and probably on FreeBSD too.
The previous __attribute((__constructor__)) method is kept for
compatibility with ELF platforms which do support ifunc.
The ifunc method has some limitations, for example, building
liblzma with -fsanitize=address will result in segfaults.
The configure option --disable-ifunc must be used for such builds.
Thanks to Hans Jansen for the original patch.
Closes: https://github.com/tukaani-project/xz/pull/53
CMake build system will now verify if __attribute__((__ifunc__())) can be
used in the build system. If so, HAVE_FUNC_ATTRIBUTE_IFUNC will be
defined to 1.
Boost iostream uses `find_package` in quiet mode and then again uses
`find_package` with required. This second call triggers a
`add_library cannot create imported target "ZLIB::ZLIB" because another
target with the same name already exists.`
This can simply be fixed by skipping the alias part on secondary
`find_package` runs.
Reword "options required" to "supported options". The previous may have
suggested that the options listed were all required anytime a filter is
used for encoding or decoding. The reword makes this more clear that
adjusting the options is optional.
The lzma_mt_block_size() was previously just an internal function for
the multithreaded .xz encoder. It is used to provide a recommended Block
size for a given filter chain.
This function is helpful to determine the maximum Block size for the
multithreaded .xz encoder when one wants to change the filters between
blocks. Then, this determined Block size can be provided to
lzma_stream_encoder_mt() in the lzma_mt options parameter when
intializing the coder. This requires one to know all the filter chains
they are using before starting to encode (or at least the filter chain
that will need the largest Block size), but that isn't a bad limitation.
Legacy Windows did not need to #include <intrin.h> to use the MSVC
intrinsics. Newer versions likely just issue a warning, but the MSVC
documentation says to include the header file for the intrinsics we use.
GCC and Clang can "pretend" to be MSVC on Windows, so extra checks are
needed in tuklib_integer.h to only include <intrin.h> when it will is
actually needed.
Clang has support for __builtin_clz(), but previously Clang would
fallback to either the MSVC intrinsic or the regular C code. This was
discovered due to a bug where a new version of Clang required the
<intrin.h> header file in order to use the MSVC intrinsics.
Thanks to Anton Kochkov for notifying us about the bug.
If the cache file is not removed, CMake will not reset configurations
back to their default values. In order to make the tests independent, it
is simplest to purge the cache. Unfortunatly, this will slow down the
tests a little and repeat some checks.
The thread method is now configurable for the CMake build. It matches
the Autotools build by allowing ON (pick the best threading method),
OFF (no threading), posix, win95, and vista. If both Windows and
posix threading are both available, then ON will choose Windows
threading. Windows threading will also not use:
target_link_libraries(liblzma Threads::Threads)
since on systems like MinGW-w64 it would link the posix threads
without purpose.
Now, CMake will run similar feature disable tests that the Autotools
version did before. In order to do this without repeating lines in
ci.yml, it now makes sense to use the GitHub Workflow matrix to create
a loop.
This script is only meant to be run as part of the CI build/test process
on machines that are known to have bash (Ubuntu and MacOS). If this
assumption changes in the future, then the bash specific commands will
need to be replaced with a more portable option. For now, it is
convenient to use bash commands.
This allows users to change the features they build either in
CMakeCache.txt or by using a CMake GUI. The sources built for
liblzma are affected by this too, so only the necessary files
will be compiled.
This makes no functional difference in the generated configure
(at least with the Autotools versions I have installed) but this
change might prevent future bugs like the one that was just
fixed in the commit 5a5bd7f871.
This is broken in the releases 5.2.6 to 5.4.2. A workaround
for these releases is to pass EGREP='grep -E' as an argument
to configure in addition to --disable-threads.
The problem appeared when m4/ax_pthread.m4 was updated in
the commit 6629ed929c which
introduced the use of AC_EGREP_CPP. AC_EGREP_CPP calls
AC_REQUIRE([AC_PROG_EGREP]) to set the shell variable EGREP
but this was only executed if POSIX threads were enabled.
Libtool code also has AC_REQUIRE([AC_PROG_EGREP]) but Autoconf
omits it as AC_PROG_EGREP has already been required earlier.
Thus, if not using POSIX threads, the shell variable EGREP
would be undefined in the Libtool code in configure.
ax_pthread.m4 is fine. The bug was in configure.ac which called
AX_PTHREAD conditionally in an incorrect way. Using AS_CASE
ensures that all AC_REQUIREs get always run.
Thanks to Frank Busse for reporting the bug.
Fixes: https://github.com/tukaani-project/xz/issues/45
When the docs are installed, calling the directory "liblzma" is
confusing since multiple other files in the doc directory are for
liblzma. This should also make it more natural for distros when they
package the documentation.
The \mainpage command is used in the first block of comments in lzma.h.
This changes the previously nearly empty index.html to use the first
comment block in lzma.h for its contents.
lzma.h is no longer documented separately, but this is for the better
since lzma.h only defined a few macros that users do not need to use.
The individual API header files all have a disclaimer that they should
not be #included directly, so there should be no confusion on the fact
that lzma.h should be the only header used by applications.
Additionally, the note "See ../lzma.h for information about liblzma as
a whole." was removed since lzma.h is now the main page of the
generated HTML and does not have its own page anymore. So it would be
confusing in the HTML version and was only a "nice to have" when
browsing the source files.
Another command line option (--no-doxygen) was added to disable
creating the doxygen documenation in cases where it not wanted or
if the doxygen tool is not installed.
This is a helper script to generate the Doxygen documentation. It can be
run in 'liblzma' or 'internal' mode by setting the first argument. It
will default to 'liblzma' mode and only generate documentation for the
liblzma API header files.
The helper script will be run during the custom mydist hook when we
create releases. This hook already alters the source directory, so its
fine to do it here too. This way, we can include the Doxygen generated
files in the distrubtion and when installing.
In 'liblzma' mode, the JavaScript is stripped from the .html files and
the .js files are removed. This avoids license hassle from jQuery and
other libraries that Doxygen 1.9.6 puts into jquery.js in minified form.
Added a install-data-local target to install the Doxygen documentation
only when it has been generated. In order to correctly remove the docs,
a corresponding uninstall-local target was added.
If the doxygen docs exist in the source tree, they will also be included
in the distribution now too.
Instead of having Doxyfile.in configured by Autoconf, the Doxyfile
can have the tags that need to be configured piped into the doxygen
command through stdin with the overrides after Doxyfile's contents.
Going forward, the documentation should be generated in two different
modes: liblzma or internal.
liblzma is useful for most users. It is the documentation for just
the liblzma API header files. This is the default.
internal is for people who want to understand how xz and liblzma work.
It might be useful for people who want to contribute to the project.
Converts the existing lzma_index tests into tuktests and covers every
API function from index.h except for lzma_file_info_decoder, which can
be tested in the future.
Also remove unneeded "sandbox_allowed = false;" as this code
will never be run more than once (making it work with multiple
input files isn't trivial).
The warning causes the exit status to be 2, so this will cause problems
for many scripted use cases for xz. The sandbox usage is already very
limited already, so silently disabling this allows it to be more usable.
If a system has the Capsicum header files but does not actually
implement the system calls, then this would render xz unusable. Instead,
we can check if errno == ENOSYS and not issue a fatal error.
lzma_lzma_preset() does not guarentee that the lzma_options_lzma are
usable in an encoder even if it returns false (success). If liblzma
is built with default configurations, then the options will always be
usable. However if the match finders hc3, hc4, or bt4 are disabled, then
the options may not be usable depending on the preset level requested.
The documentation was updated to reflect this complexity, since this
behavior was unclear before.
The static global variables can be disabled if encoders and decoders
are not built. If they are not disabled and -Werror is used, it will
cause an usused warning as an error.
All functions now explicitly specify parameter and return values.
The notes and code annotations were moved before the parameter and
return value descriptions for consistency.
Also, the description above lzma_filter_encoder_is_supported() about
not being able to list available filters was removed since
lzma_str_list_filters() will do this.
In the C99 and C17 standards, section 6.5.6 paragraph 8 means that
adding 0 to a null pointer is undefined behavior. As of writing,
"clang -fsanitize=undefined" (Clang 15) diagnoses this. However,
I'm not aware of any compiler that would take advantage of this
when optimizing (Clang 15 included). It's good to avoid this anyway
since compilers might some day infer that pointer arithmetic implies
that the pointer is not NULL. That is, the following foo() would then
unconditionally return 0, even for foo(NULL, 0):
void bar(char *a, char *b);
int foo(char *a, size_t n)
{
bar(a, a + n);
return a == NULL;
}
In contrast to C, C++ explicitly allows null pointer + 0. So if
the above is compiled as C++ then there is no undefined behavior
in the foo(NULL, 0) call.
To me it seems that changing the C standard would be the sane
thing to do (just add one sentence) as it would ensure that a huge
amount of old code won't break in the future. Based on web searches
it seems that a large number of codebases (where null pointer + 0
occurs) are being fixed instead to be future-proof in case compilers
will some day optimize based on it (like making the above foo(NULL, 0)
return 0) which in the worst case will cause security bugs.
Some projects don't plan to change it. For example, gnulib and thus
many GNU tools currently require that null pointer + 0 is defined:
https://lists.gnu.org/archive/html/bug-gnulib/2021-11/msg00000.htmlhttps://www.gnu.org/software/gnulib/manual/html_node/Other-portability-assumptions.html
In XZ Utils null pointer + 0 issue should be fixed after this
commit. This adds a few if-statements and thus branches to avoid
null pointer + 0. These check for size > 0 instead of ptr != NULL
because this way bugs where size > 0 && ptr == NULL will likely
get caught quickly. None of them are in hot spots so it shouldn't
matter for performance.
A little less readable version would be replacing
ptr + offset
with
offset != 0 ? ptr + offset : ptr
or creating a macro for it:
#define my_ptr_add(ptr, offset) \
((offset) != 0 ? ((ptr) + (offset)) : (ptr))
Checking for offset != 0 instead of ptr != NULL allows GCC >= 8.1,
Clang >= 7, and Clang-based ICX to optimize it to the very same code
as ptr + offset. That is, it won't create a branch. So for hot code
this could be a good solution to avoid null pointer + 0. Unfortunately
other compilers like ICC 2021 or MSVC 19.33 (VS2022) will create a
branch from my_ptr_add().
Thanks to Marcin Kowalczyk for reporting the problem:
https://github.com/tukaani-project/xz/issues/36
Standardizing each function to always specify parameters and return
values. Also moved the parameters and return values to the end of each
function description.
On MicroBlaze, GCC 12 is broken in sense that
__has_attribute(__symver__) returns true but it still doesn't
support the __symver__ attribute even though the platform is ELF
and symbol versioning is supported if using the traditional
__asm__(".symver ...") method. Avoiding the traditional method is
good because it breaks LTO (-flto) builds with GCC.
See also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101766
For now the only extra symbols in liblzma_linux.map are the
compatibility symbols with the patch that spread from RHEL/CentOS 7.
These require the use of __symver__ attribute or __asm__(".symver ...")
in the C code. Compatibility with the patch from CentOS 7 doesn't
seem valuable on MicroBlaze so use liblzma_generic.map on MicroBlaze
instead. It doesn't require anything special in the C code and thus
no LTO issues either.
An alternative would be to detect support for __symver__
attribute in configure.ac and CMakeLists.txt and fall back
to __asm__(".symver ...") but then LTO would be silently broken
on MicroBlaze. It sounds likely that MicroBlaze is a special
case so let's treat it as a such because that is simpler. If
a similar issue exists on some other platform too then hopefully
someone will report it and this can be reconsidered.
(This doesn't do the same fix in CMakeLists.txt. Perhaps it should
but perhaps CMake build of liblzma doesn't matter much on MicroBlaze.
The problem breaks the build so it's easy to notice and can be fixed
later.)
Thanks to Vincent Fazio for reporting the problem and proposing
a patch (in the end that solution wasn't used):
https://github.com/tukaani-project/xz/pull/32
Use "member" to refer to struct members as that's the term used
by the C standard.
Use lzma_options_delta.dist and such in docs so that in Doxygen's
HTML output they will link to the doc of the struct member.
Clean up a few trailing white spaces too.
It gives C4146 here since unary minus with unsigned integer
is still unsigned (which is the intention here). Doing it
with substraction makes it clearer and avoids the warning.
Thanks to Nathan Moinvaziri for reporting this.
Standardizing each function to always specify parameters and return
values. Also moved the parameters and return values to the end of each
function description.
A few small things were reworded and long sentences broken up.
All functions now explicitly specify parameter and return values.
Also moved the note about SHA-256 functions not being exported to the
top of the file.
Now, the LZMA_VERSION_MAJOR, LZMA_VERSION_MINOR, and LZMA_VERSION_PATCH
macros do not need to be on consecutive lines in version.h. They can be
separated by more whitespace, comments, or even other content, as long
as they appear in the proper order (major, minor, patch).
The bug is only a problem in applications that do not properly terminate
the filters[] array with LZMA_VLI_UNKNOWN or have more than
LZMA_FILTERS_MAX filters. This bug does not affect xz.
Added a few sentences to the description for lzma_block_encoder() and
lzma_block_decoder() to highlight that the Block Header must be coded
before calling these functions.
Standardizing each function to always specify params and return values.
Output pointer parameters are also marked with doxygen style [out] to
make it clear. Any note sections were also moved above the parameter and
return sections for consistency.
The flag description for LZMA_STR_NO_VALIDATION was previously confusing
about the treatment for filters than cannot be used with .xz format
(lzma1) without using LZMA_STR_ALL_FILTERS. Now, it is clear that
LZMA_STR_NO_VALIDATION is not a super set of LZMA_STR_ALL_FILTERS.
The workflow action for our CI pipeline can only reference artifacts in
the source directory, so we should ignore these files if the ci_build.sh
is run locally.
This way, if xz is stopped the elapsed time and estimated time
remaining won't get confused by the amount of time spent in
the stopped state.
This raises SIGSTOP. It's not clear to me if this is the correct way.
POSIX and glibc docs say that SIGTSTP shouldn't stop the process if
it is orphaned but this commit doesn't attempt to handle that.
Search for SIGTSTP in section 2.4.3:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html
The previous documentation for lzma_str_to_filters() was technically
correct, but misleading. lzma_str_to_filters() returns NULL on success,
which is in practice always defined to 0. This is the same value as
LZMA_OK, but lzma_str_to_filters() does not return lzma_ret so we should
be more clear.
This reverts commit 82e3c968bf.
Macros in the reserved namespace (_foo or __foo) shouldn't be #defined
without a very good reason. Here the alternative would have been
to #define tuklib_has_warning(str) to an approriate value.
Also the tuklib_* files should stay namespace clean if possible.
__has_warning and other __has_foo macros are meant to become
compiler-agnostic so it's not good to check for __clang__ with it.
This also relied on tuklib_common.h for #defining __has_warning
which was confusing as #defining reserved macros is generally
not a good idea.
A few Doxygen tags were obsolete from 1.4.7. Version 1.8.17 released
in 2019, so this should be compatible with resonable modern distros.
The purpose of Doxygen these days is for docs on the website, so it
doesn't necessarily have to work for everyone. Just when the maintainers
want to update the docs.
Doxygen is now configurable in autotools only with
--enable-doxygen=[api|all]. The default is "api", which will only
generate HTML output for liblzma API functions. The LaTex documentation
output was also disabled.
tuklib_physmem depends on GetProcAddress() for both MSVC and MinGW-w64
to retrieve a function address. The proper way to do this is to cast the
return value to the type of function pointer retrieved. Unfortunately,
this causes a cast-function-type warning, so the best solution is to
simply ignore the warning.
clang supports the __has_warning macro to determine if the version of
clang compiling the code supports a given warning. If we do not define
it for other compilers, it may cause a preprocessor error.
The 32-bit build needs to be first so the configure cache only needs to
be reset one time. The 32-bit build sets the CFLAGS env variable, so any
build using that flag after will fail unless the cache is reset.
Calling coder_set_compression_settings() in list mode with verbose mode
on caused the filter chain and memory requirements to print. This was
unnecessary since the command results in an error and not consistent
with other formats like lzma and alone.
Disabling shared library generation and linking should help speed up the
runners. The shared library is still being tested in the 32 bit build
and the full feature.
Disabling nls is to check for any unexpected warnings or errors.
It's not that important. It can be annoying in builds that
disable many features since in those cases the tests programs
will correctly trigger this warning with Clang.
It doesn't warn on a 64-bit system because truncating
a ptrdiff_t (signed long) to uint32_t is diagnosed under
-Wconversion by GCC and -Wshorten-64-to-32 by Clang.
-Wstrict-aliasing was removed from the list since it is enabled
by -Wall already.
A normal build is clean with these on GNU/Linux x86-64 with
GCC 12.2.0 and Clang 14.0.6.
Explicitly casting the integer to lzma_check silences the warning.
Since such an invalid value is needed in multiple tests, a constant
INVALID_LZMA_CHECK_ID was added to tests.h.
The use of 0x1000 for lzma_block.check wasn't optimal as if
the underlying type is a char then 0x1000 will be truncated to 0.
However, in these test cases the value is ignored, thus even with
such truncation the test would have passed.
Note that assigning an unsigned int to lzma_check doesn't warn
on GNU/Linux x86-64 since the enum type is unsigned on that
platform. The enum can be signed on some other platform though
so it's best to use enumeration type lzma_check in these situations.
This is similar to 2ce4f36f17.
The actual initialization of the variables is done inside
mythread_sync() macro. Clang doesn't seem to see that
the initialization code inside the macro is always executed.
clang and gcc differ in how they handle -Wformat-nonliteral. gcc will
allow a non-literal format string as long as the function takes its
format arguments as a va_list.
This only occurs in test_filter_flags when the BCJ filters are not
configured and built. In this case, ARRAY_SIZE() returns 0 and causes a
type-limits warning with the loop variable since an unsigned number will
always be >= 0.
This affects only 32-bit x86 builds. x86-64 is OK as is.
I still cannot easily test this myself. The reporter has tested
this and it passes the tests included in the CMake build and
performance is good: raw CRC64 is 2-3 times faster than the
C version of the slice-by-four method. (Note that liblzma doesn't
include a MSVC-compatible version of the 32-bit x86 assembly code
for the slice-by-four method.)
Thanks to Iouri Kharon for figuring out a fix, testing, and
benchmarking.
This reverts commit 36edc65ab4.
It was reported that it wasn't a good enough fix and MSVC
still produced (different kind of) bad code when building
for 32-bit x86 if optimizations are enabled.
Thanks to Iouri Kharon.
On some platforms src/xz/suffix.c may need <strings.h> for
strcasecmp() but suffix.c includes the header when it needs it.
Unless there is an old system that otherwise supports enough C99
to build XZ Utils but doesn't have C89/C90-compatible <string.h>,
there should be no need to include <strings.h> in sysdefs.h.
SUSv2 and POSIX.1‐2017 declare only a few functions in <strings.h>.
Of these, strcasecmp() is used on some platforms in suffix.c.
Nothing else in the project needs <strings.h> (at least if
building on a modern system).
sysdefs.h currently includes <strings.h> if HAVE_STRINGS_H is
defined and suffix.c relied on this.
Note that dos/config.h doesn't #define HAVE_STRINGS_H even though
DJGPP does have strings.h. It isn't needed with DJGPP as strcasecmp()
is also in <string.h> in DJGPP.
It quite probably was never needed, that is, any system where memory.h
was required likely couldn't compile XZ Utils for other reasons anyway.
XZ Utils 5.2.6 and later source packages were generated using
Autoconf 2.71 which no longer defines HAVE_MEMORY_H. So the code
being removed is no longer used anyway.
At least on some systems, GNU windres needs --use-temp-file
in addition to the \x20 hack to avoid spaces in the command line
argument. Hovever, that \x20 syntax is broken with llvm-windres
version 15.0.0 (results in "XZx20Utils") but luckily it works
with a regular space. Thus it is best to limit the workarounds
to GNU toolchain on Windows.
Here are the list of the most significant issues addressed:
- Avoid using internal common.h header. It's not good to copy the
constants like this but common.h cannot be included for use outside
of liblzma. This is the quickest thing to do that could be fixed later.
- Omit the INIT_FILTER macro. Initialization should be done with just
regular designated initializers.
- Use start_offset = 257 for BCJ tests. It demonstrates that Filter
Flags encoder and decoder don't validate the options thoroughly.
257 is valid only for the x86 filter. This is a bit silly but
not a significant problem in practice because the encoder and
decoder initialization functions will catch bad alignment still.
Perhaps this should be fixed but it's not urgent and doesn't need
to be in 5.4.x.
- Various tweaks to comments such as filter id -> Filter ID
I haven't tested with MSVC myself and there doesn't seem to be
information about the problem online, so I'm relying on the bug report.
Thanks to Iouri Kharon for the bug report and the patch.
It's not needed in XZ Utils at least for now. It's good to support
it still because if such use is needed later, it wouldn't be
caught on GNU/Linux since malloc(0) from glibc returns non-NULL.
The command line tools cannot be built with MSVC for now but
they can be built with MinGW-w64.
Thanks to Iouri Kharon for the bug report and the original patch.
The old version used too many runners that resulted in unnecessary
dependency downloads. Now, the runners are reused for the different
configurations for each OS and build system.
The new PHASE argument can be build, test, or all. all is the default.
This way, the CI/CD script can differentiate between the build and test
phases to make it easier to track down errors when they happen.
common/index.h is needed by liblzma internally and tests. common.h will
include and define many things that are not needed by the tests. Also,
this prevents include order problems because common.h will redefine
LZMA_API resulting in a warning.
The shell parameter expansion using # and ## is not supported in
Solaris 10 Bourne shell (/bin/sh). Even though this is POSIX, it is not fully
portable, so we should avoid it.
5.5.0alpha won't be released, it's just to mark that
the branch is not for stable 5.4.x.
Once again there is no API/ABI stability for new features
in devel versions. The major soname won't be bumped even
if API/ABI of new features breaks between devel releases.
HAVE_DECL_PROGRAM_INVOCATION_NAME is renamed to
HAVE_PROGRAM_INVOCATION_NAME. Previously,
HAVE_DECL_PROGRAM_INVOCATION_NAME was always set when
building with autotools. CMake would only set this when it was 1, and the
dos/config.h did not define it. The new macro definition is consistent
across build systems.
Previously, <sys/time.h> was always included, even if mythread only used
clock_gettime. <time.h> is still needed even if clock_gettime is not used
though because struct timespec is needed for mythread_condtime.
Previously, if threading was enabled HAVE_DECL_CLOCK_MONOTONIC would always
be set to 0 or 1. However, this macro was needed in xz so if xz was not
built with threading and HAVE_DECL_CLOCK_MONOTONIC was not defined but
HAVE_CLOCK_GETTIME was, it caused a warning during build. Now,
HAVE_DECL_CLOCK_MONOTONIC has been renamed to HAVE_CLOCK_MONOTONIC and
will only be set if it is 1.
The CI/CD workflow will only execute on Ubuntu and MacOS latest version.
The workflow will attempt to build with autotools and CMake and execute
the tests. The workflow will run for all pull requests and pushes done
to the master branch.
Using return_if_error on lzma_lzma_lclppb_encode was improper because
return_if_error is expecting an lzma_ret value, but
lzma_lzma_lclppb_encode returns a boolean. This could result in
lzma_microlzma_encoder, which would be misleading for applications.