1068 Commits

Author SHA1 Message Date
Lasse Collin 4ae13cfe0d sysdefs.h: Update the comment about __USE_MINGW_ANSI_STDIO. 2023-10-31 18:44:59 +08:00
Lasse Collin 660c8c29e5 xz: Windows: Don't (de)compress to special files like "con" or "nul".
Before this commit, the following writes "foo" to the
console and deletes the input file:

    echo foo | xz > con_xz
    xz --suffix=_xz --decompress con_xz

It cannot happen without --suffix because names like con.xz
are also special and so attempting to decompress con.xz
(or compress con to con.xz) will already fail when opening
the input file.

A similar thing is possible when compressing. The following
writes to "nul" and the input file "n" is deleted.

    echo foo | xz > n
    xz --suffix=ul n

Now xz checks if the destination is a special file before
continuing. DOS/DJGPP version had a check for this but
Windows (and OS/2) didn't.
2023-10-31 18:44:59 +08:00
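The commit above does not show how xz detects a special destination. As a rough illustration only, the sketch below uses a purely name-based heuristic: it reports whether the last path component, ignoring any extension, is one of the classic DOS device names. The function names are hypothetical and this is not claimed to be the check xz performs.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: compare the first n characters of s, ignoring case,
// against an uppercase literal that must be exactly n characters long.
static bool
prefix_equals(const char *s, size_t n, const char *upper)
{
    if (strlen(upper) != n)
        return false;

    for (size_t i = 0; i < n; ++i)
        if (toupper((unsigned char)s[i]) != upper[i])
            return false;

    return true;
}

// Hypothetical name-based check (an assumption, not xz's method):
// true if the file name part of path, up to its first dot, is a
// reserved device name such as "con" or "nul".
bool
is_reserved_device_name(const char *path)
{
    static const char *const names[] = { "CON", "PRN", "AUX", "NUL" };

    assert(path != NULL);

    // Skip the drive and directory part of the path.
    const char *base = path;
    for (const char *p = path; *p != '\0'; ++p)
        if (*p == '/' || *p == '\\' || *p == ':')
            base = p + 1;

    // Names like con.xz are special too, so ignore the extension.
    const size_t len = strcspn(base, ".");

    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); ++i)
        if (prefix_equals(base, len, names[i]))
            return true;

    // COM1..COM9 and LPT1..LPT9
    if (len == 4 && base[3] >= '1' && base[3] <= '9')
        return prefix_equals(base, 3, "COM")
                || prefix_equals(base, 3, "LPT");

    return false;
}
```

With this heuristic, the destination names from the two examples ("con" and "nul") are flagged, while the input names "con_xz" and "n" are not.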
Lasse Collin e3478ae4f3 liblzma: Move a few __attribute__ uses in function declarations.
The API headers have many attributes but these were left
as is for now.
2023-10-31 01:03:25 +08:00
Lasse Collin b71b8922ef xz, xzdec, lzmainfo: Use tuklib_attr_noreturn.
For compatibility with C23's [[noreturn]], tuklib_attr_noreturn
must be at the beginning of declaration (before "extern" or
"static", and even before any GNU C's __attribute__).

This commit also moves all other function attributes to
the beginning of function declarations. "extern" is kept
at the beginning of a line so the attributes are listed on
separate lines before "extern" or "static".
2023-10-31 01:03:25 +08:00
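The placement rule described above can be sketched as follows. The macro name my_attr_noreturn and both functions are hypothetical stand-ins; the actual definition of tuklib_attr_noreturn is not shown in this log, so the version checks here are an assumption about how such a macro is typically written.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical macro, assumed to resemble what a noreturn wrapper does:
// prefer C23 [[noreturn]], then C11 _Noreturn, then the GNU C attribute.
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
#   define my_attr_noreturn [[noreturn]]
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#   define my_attr_noreturn _Noreturn
#elif defined(__GNUC__)
#   define my_attr_noreturn __attribute__((__noreturn__))
#else
#   define my_attr_noreturn
#endif

// The macro goes first, on its own line, before "static". This ordering
// is valid for all three spellings above; [[noreturn]] would not be
// accepted after "static".
my_attr_noreturn
static void
fatal(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    exit(EXIT_FAILURE);
}

// Hypothetical caller: the compiler knows fatal() does not return,
// so no "missing return" path exists after the call.
int
checked_div(int a, int b)
{
    if (b == 0)
        fatal("division by zero");

    return a / b;
}
```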
Lasse Collin 359e5c6cb1 Remove incorrect uses of __attribute__((__malloc__)).
xrealloc() is obviously incorrect; modern GCC docs even
mention realloc() as an example where this attribute
cannot be used.

liblzma's lzma_alloc() and lzma_alloc_zero() would be
correct uses most of the time but custom allocators
may use a memory pool or otherwise hold the pointer
so aliasing issues could happen in theory.

The xstrdup() case likely was correct but I removed it anyway.
Now there are no __malloc__ attributes left in the code.
The allocations aren't in hot paths so this should make
no practical difference.
2023-10-31 01:03:25 +08:00
Lasse Collin caf00e0988 liblzma: Mark crc64_clmul() with __attribute__((__no_sanitize_address__)).
Thanks to Agostino Sarubbo.
Fixes: https://github.com/tukaani-project/xz/issues/62
2023-10-31 01:03:25 +08:00
Lasse Collin 1f6e7c68fb xz: Refactor thousand separator detection and disable it on MSVC.
Now the two variations of the format strings are created with
a macro, and the whole detection code can be easily disabled
on platforms where thousand separator formatting is known to
not work (MSVC has no support, and on DJGPP 2.05 it can have
problems in some cases).
2023-10-31 01:03:25 +08:00
Lasse Collin ef71f83973 xz: Fix a too relaxed assertion and remove uses of SSIZE_MAX.
SSIZE_MAX isn't readily available on MSVC. Removing it means
that there is one less thing to worry about when porting to MSVC.
2023-10-31 01:03:25 +08:00
Jia Tan 773f1e8622 liblzma: Update assert in vli_ceil4().
The argument to vli_ceil4() should always guarantee the return value
is also a valid lzma_vli. Thus the highest three valid lzma_vli values
are invalid arguments. All uses of the function ensure this so the
assert is updated to match this.
2023-10-26 06:22:24 +08:00
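A minimal sketch of why the three highest values are invalid arguments, assuming the function rounds up to a multiple of four and that the largest valid value is UINT64_MAX / 2 (consistent with the statement elsewhere in this log that LZMA_VLI_MAX is half of UINT64_MAX). The names VLI_MAX and ceil4 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Assumed to mirror LZMA_VLI_MAX: the largest valid value.
#define VLI_MAX (UINT64_MAX / 2)

// Round up to the next multiple of four. For the result to stay
// at or below VLI_MAX, the argument must be at most VLI_MAX - 3:
// VLI_MAX - 2, VLI_MAX - 1, and VLI_MAX would all round up to
// VLI_MAX + 1.
uint64_t
ceil4(uint64_t vli)
{
    assert(vli <= VLI_MAX - 3);
    return (vli + 3) & ~UINT64_C(3);
}
```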
Jia Tan 68bda971bb liblzma: Add overflow check for Unpadded size in lzma_index_append().
This was not a security bug since there was no path to overflow
UINT64_MAX in lzma_index_append() or when it calls index_file_size().
The bug was discovered by a failing assert() in vli_ceil4() when called
from index_file_size() when unpadded_sum (the sum of the compressed size
of current Stream and the unpadded_size parameter) exceeds LZMA_VLI_MAX.

Previously, the unpadded_size parameter was checked to be not greater
than UNPADDED_SIZE_MAX, but no check was done once compressed_base was
added.

This could not have caused an integer overflow in index_file_size() when
called by lzma_index_append(). The calculation for file_size breaks down
into the sum of:

- Compressed base from all previous Streams
- 2 * LZMA_STREAM_HEADER_SIZE (size of the current Stream's header and
  footer)
- stream_padding (can be set by lzma_index_stream_padding())
- Compressed base from the current Stream
- Unpadded size (parameter to lzma_index_append())

The sum of everything except for Unpadded size must be less than
LZMA_VLI_MAX. This is guaranteed by overflow checks in the functions
that can set these values including lzma_index_stream_padding(),
lzma_index_append(), and lzma_index_cat(). The maximum value for
Unpadded size is enforced by lzma_index_append() to be less than or
equal to UNPADDED_SIZE_MAX. Thus, the sum cannot exceed UINT64_MAX since
LZMA_VLI_MAX is half of UINT64_MAX.

Thanks to Joona Kannisto for reporting this.
2023-10-26 06:22:24 +08:00
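The arithmetic in this commit message can be sketched as a small check. All names below are hypothetical, and the value used for the Unpadded size limit (the largest multiple of four that fits in a valid VLI) is an assumption; the log only names UNPADDED_SIZE_MAX without defining it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Assumed to mirror LZMA_VLI_MAX (half of UINT64_MAX).
#define VLI_MAX (UINT64_MAX / 2)

// Assumption: the Unpadded size limit is VLI_MAX rounded down
// to a multiple of four.
#define UNPADDED_MAX (VLI_MAX & ~UINT64_C(3))

// Hypothetical check: may a Block with the given unpadded_size be
// appended when compressed_base bytes are already in the Stream?
bool
append_size_ok(uint64_t compressed_base, uint64_t unpadded_size)
{
    // The old check: each value is validated on its own.
    if (compressed_base > VLI_MAX || unpadded_size > UNPADDED_MAX)
        return false;

    // Both operands are at most VLI_MAX, so the uint64_t sum cannot
    // wrap around (2 * VLI_MAX < UINT64_MAX). It can still exceed
    // the valid range, which is what the added check rejects.
    return compressed_base + unpadded_size <= UNPADDED_MAX;
}
```

The first condition alone accepts a large compressed_base combined with a small unpadded_size, which is the case that triggered the failing assert.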
Jamaika1 c0c0cd4a48 mythread.h: Fix typo error in Vista threads mythread_once().
The "once_" variable was accidentally referred to as just "once". This
prevented building with Vista threads when
HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR was not defined.
2023-10-26 06:22:24 +08:00
Lasse Collin a108ed5891 xz: Omit an empty paragraph on the man page. 2023-08-02 17:39:50 +03:00
Jia Tan 03c51c5c08 Bump version and soname for 5.4.4. 2023-08-02 20:32:20 +08:00
ChanTsune 4170a80785 mythread.h: Disable signal functions in builds targeting Wasm + WASI.
signal.h in WASI SDK doesn't currently provide sigprocmask()
or sigset_t. liblzma doesn't need them so this change makes
liblzma and xzdec build against WASI SDK. xz doesn't build yet
and the tests don't either as tuktest needs setjmp() which
isn't (yet?) implemented in WASI SDK.

Closes: https://github.com/tukaani-project/xz/pull/57
See also: https://github.com/tukaani-project/xz/pull/56

(The original commit was edited a little by Lasse Collin.)
2023-08-01 18:44:02 +03:00
Dimitri Papadopoulos Orfanos 0db6fbe0be Docs: Fix typos found by codespell 2023-08-01 18:44:02 +03:00
Jia Tan 19899340cf liblzma: Prevent an empty translation unit in Windows builds.
To work around Automake lacking Windows resource compiler support, an
empty source file is compiled to overwrite the resource files for static
library builds. Translation units without an external declaration are
not allowed by the C standard and result in a warning when used with
-Wempty-translation-unit (Clang) or -pedantic (GCC).
2023-08-01 18:41:42 +03:00
Jia Tan 8bc3146c6b xz: Update man page Authors and date. 2023-07-18 23:24:02 +08:00
Jia Tan c2905540ef xz: Slight reword in xz man page for consistency.
Changed will print => prints in xz --robot --version description to
match --robot --info-memory description.
2023-07-18 23:24:02 +08:00
Jia Tan 2600d33524 liblzma: Improve comment in string_conversion.c.
The comment used "flag" when referring to decoder options. Just
referring to them as options is more clear and consistent.
2023-07-18 23:24:02 +08:00
Jia Tan 98fc14541e liblzma: Reword lzma_str_list_filters() documentation.
Reword "options required" to "options read". The previous wording
may have suggested that the options listed were all required when
the filters are used for encoding or decoding. Now it should be
more clear that the options listed are the ones relevant for
encoding or decoding.
2023-07-18 23:21:23 +08:00
Lasse Collin 1ac79b4cba xz: Translate the second "%s: " in message.c since French needs "%s : ".
This string is used to print a filename when using "xz -v" and
stderr isn't a terminal.
2023-07-18 17:41:55 +03:00
Lasse Collin 97851be2c6 xz: Make "%s: %s" translatable because French needs "%s : %s". 2023-07-18 14:37:07 +03:00
Lasse Collin b406828a6d liblzma: Tweak #if condition in memcmplen.h.
Maybe ICC always #defines _MSC_VER on Windows but now
it's very clear which code will get used.
2023-07-18 14:03:08 +03:00
Lasse Collin ef4a07ad94 liblzma: Omit unnecessary parenthesis in a preprocessor directive. 2023-07-18 14:03:08 +03:00
Jia Tan 64ee0caaea liblzma: Prevent warning for MSYS2 Windows build.
In lzma_memcmplen(), the <intrin.h> header file is only included if
_MSC_VER and _M_X64 are both defined but _BitScanForward64() was
previously used if _M_X64 was defined. GCC for MSYS2 defines _M_X64 but
not _MSC_VER so _BitScanForward64() was used without including
<intrin.h>.

Now, lzma_memcmplen() will use __builtin_ctzll() for MSYS2 GCC builds as
expected.
2023-07-18 14:03:08 +03:00
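The mismatch described above (the header included under one condition, the intrinsic used under a looser one) can be avoided by keeping both conditions identical. The following is a hedged sketch with a hypothetical function name, not the code from memcmplen.h.

```c
#include <assert.h>
#include <stdint.h>

// Include <intrin.h> under exactly the condition that the MSVC
// intrinsic is used below.
#if defined(_MSC_VER) && defined(_M_X64) \
        && !defined(__GNUC__) && !defined(__clang__)
#   include <intrin.h>
#   define CTZ64_USE_MSVC 1
#endif

// Count trailing zero bits. The argument must be nonzero.
uint32_t
ctz64(uint64_t x)
{
    assert(x != 0);

#if defined(__GNUC__) || defined(__clang__)
    // GCC (including MSYS2 GCC, which defines _M_X64 but not
    // _MSC_VER) and Clang take this path.
    return (uint32_t)__builtin_ctzll(x);
#elif defined(CTZ64_USE_MSVC)
    unsigned long index;
    _BitScanForward64(&index, x);
    return (uint32_t)index;
#else
    // Portable fallback for other compilers.
    uint32_t n = 0;
    while ((x & 1) == 0) {
        x >>= 1;
        ++n;
    }
    return n;
#endif
}
```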
Jia Tan c972d44103 xz: Fix typo in man page.
The Memory limit information section described three output
columns when it actually has six. This was reworded to
"multiple" to make it more future proof.
2023-07-18 13:27:46 +03:00
Jia Tan 1155471651 liblzma: Prevent uninitialized warning in mt stream encoder.
This change only impacts the compiler warning since it was impossible
for the wait_abs struct in stream_encode_mt() to be used before it was
initialized since mythread_condtime_set() will always be called before
mythread_cond_timedwait().

Since the mythread.h code is different between the POSIX and
Windows versions, this warning was only present on Windows builds.

Thanks to Arthur S for reporting the warning and providing an initial
patch.
2023-07-18 13:20:16 +03:00
Jia Tan 4f57a9c991 liblzma: Adds lzma_nothrow to MicroLZMA API functions.
None of the liblzma functions may throw an exception, so this
attribute should be applied to all liblzma API functions.
2023-07-18 12:48:53 +03:00
Jia Tan 0cee63c3c6 Bump version and soname for 5.4.3. 2023-05-04 22:02:29 +08:00
Lasse Collin e9b9ea9531 tuklib_integer.h: Fix a recent copypaste error in Clang detection.
Wrong line was changed in 7062348bf3.
Also, this has >= instead of == since ints larger than 32 bits would
work too even if not relevant in practice.
2023-05-03 22:55:54 +03:00
Jia Tan 9e343a46cf Windows: Include <intrin.h> when needed.
Legacy Windows did not need to #include <intrin.h> to use the MSVC
intrinsics. Newer versions likely just issue a warning, but the MSVC
documentation says to include the header file for the intrinsics we use.

GCC and Clang can "pretend" to be MSVC on Windows, so extra checks are
needed in tuklib_integer.h to only include <intrin.h> when it is
actually needed.
2023-04-25 20:19:32 +08:00
Jia Tan 12321a9390 tuklib_integer: Use __builtin_clz() with Clang.
Clang has support for __builtin_clz(), but previously Clang would
fall back to either the MSVC intrinsic or the regular C code. This was
discovered due to a bug where a new version of Clang required the
<intrin.h> header file in order to use the MSVC intrinsics.

Thanks to Anton Kochkov for notifying us about the bug.
2023-04-25 20:19:28 +08:00
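A sketch of the selection order this commit describes: prefer the builtin on GCC and Clang, use the MSVC intrinsic only for real MSVC, and otherwise fall back to plain C. The function and macro names are hypothetical, and the check that unsigned int is exactly 32 bits is a simplifying assumption made for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

// Only real MSVC needs <intrin.h>; GCC and Clang use the builtin
// even when they define _MSC_VER.
#if defined(_MSC_VER) && !defined(__GNUC__) && !defined(__clang__)
#   include <intrin.h>
#   define CLZ32_USE_MSVC 1
#endif

// Count leading zero bits in a 32-bit value. The argument must be
// nonzero.
uint32_t
clz32(uint32_t x)
{
    assert(x != 0);

#if (defined(__GNUC__) || defined(__clang__)) && UINT_MAX == UINT32_MAX
    // __builtin_clz() takes unsigned int, so this path requires
    // that unsigned int is exactly 32 bits wide.
    return (uint32_t)__builtin_clz(x);
#elif defined(CLZ32_USE_MSVC)
    unsigned long index;
    _BitScanReverse(&index, x);
    return 31U - (uint32_t)index;
#else
    // Regular C fallback.
    uint32_t n = 0;
    while ((x & UINT32_C(0x80000000)) == 0) {
        x <<= 1;
        ++n;
    }
    return n;
#endif
}
```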
Lasse Collin d1f0e01c39 liblzma: Update project maintainers in lzma.h.
AUTHORS was updated earlier, lzma.h was simply forgotten.
2023-04-25 20:19:21 +08:00
Jia Tan 8204c5d130 liblzma: Cleans up old commented out code. 2023-04-25 20:19:10 +08:00
Jia Tan c99d697df8 Build: Removes redundant check for LZMA1 filter support. 2023-04-25 20:18:18 +08:00
Lasse Collin 0673c9ec98 liblzma: Silence -Wsign-conversion in SSE2 code in memcmplen.h.
Thanks to Christian Hesse for reporting the issue.
Fixes: https://github.com/tukaani-project/xz/issues/44
2023-03-19 22:46:26 +02:00
Jia Tan 6ca8046ecb Bump version and soname for 5.4.2. 2023-03-18 23:22:06 +08:00
Lasse Collin 97679d25ce Change a few HTTP URLs to HTTPS.
The xz man page timestamp was intentionally left unchanged.
2023-03-18 22:02:40 +08:00
Jia Tan 94097157fa liblzma: Remove note from lzma_options_bcj about the ARM64 exception.
This was left in by mistake since an early version of the ARM64 filter
used a different struct for its options.
2023-03-17 20:19:10 +08:00
Jia Tan 7e2fa48bb7 liblzma: Set lzma.h as the main page for Doxygen documentation.
The \mainpage command is used in the first block of comments in lzma.h.
This changes the previously nearly empty index.html to use the first
comment block in lzma.h for its contents.

lzma.h is no longer documented separately, but this is for the better
since lzma.h only defined a few macros that users do not need to use.
The individual API header files all have a disclaimer that they should
not be #included directly, so there should be no confusion on the fact
that lzma.h should be the only header used by applications.

Additionally, the note "See ../lzma.h for information about liblzma as
a whole." was removed since lzma.h is now the main page of the
generated HTML and does not have its own page anymore. So it would be
confusing in the HTML version and was only a "nice to have" when
browsing the source files.
2023-03-17 20:18:52 +08:00
Lasse Collin fd56d53533 xz: Make Capsicum sandbox more strict with stdin and stdout. 2023-03-11 19:34:39 +02:00
Lasse Collin d1bdaaebc6 xz: Don't fail if Capsicum is enabled but kernel doesn't support it.
(This commit combines related commits from the master branch.)

If Capsicum support is missing from the kernel or xz is being run
in an emulator that lacks Capsicum support, the syscalls will fail
and set errno to ENOSYS. Previously xz would display an error and
exit, making xz unusable. Now it will check for ENOSYS and run
without sandbox support. Other tools like ssh behave similarly.

Displaying a warning for missing Capsicum support was considered
but such extra output would quickly become annoying. It would also
break test_scripts.sh in "make check".

Also move cap_enter() to be the first step instead of the last one.
This matches the example in the cap_rights_limit(2) man page. With
the current code it shouldn't make any practical difference though.

Thanks to Xin Li for the bug report, suggesting a fix, and testing:
https://github.com/tukaani-project/xz/pull/43

Thanks to Jia Tan for most of the original commits.
2023-03-11 19:31:40 +02:00
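The ENOSYS handling described above can be sketched without depending on FreeBSD headers by passing the sandboxing call as a function pointer. Everything named here is hypothetical: enter_fn stands in for a wrapper around a call such as cap_enter(), and the three fake_* functions exist only to demonstrate the outcomes.

```c
#include <assert.h>
#include <errno.h>

// Hypothetical: a sandboxing call that returns 0 on success, or -1
// with errno set on failure.
typedef int (*sandbox_enter_fn)(void);

enum sandbox_result {
    SANDBOX_ENABLED,      // The sandbox is active.
    SANDBOX_UNSUPPORTED,  // Kernel lacks support; continue without it.
    SANDBOX_ERROR         // A real failure; the caller should stop.
};

enum sandbox_result
try_enable_sandbox(sandbox_enter_fn enter_fn)
{
    assert(enter_fn != 0);

    if (enter_fn() == 0)
        return SANDBOX_ENABLED;

    // ENOSYS means the syscall doesn't exist in this kernel or
    // emulator. Treat it as "no sandbox" instead of as an error.
    if (errno == ENOSYS)
        return SANDBOX_UNSUPPORTED;

    return SANDBOX_ERROR;
}

// Stand-ins for demonstration only.
int fake_enter_ok(void) { return 0; }
int fake_enter_enosys(void) { errno = ENOSYS; return -1; }
int fake_enter_eperm(void) { errno = EPERM; return -1; }
```

A caller would continue silently on SANDBOX_UNSUPPORTED, which matches the decision above not to print a warning.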
Jia Tan b82d4831e3 liblzma: Improve documentation for version.h.
Specified parameter and return values for API functions and documented
a few more of the macros.
2023-03-07 23:57:39 +08:00
Jia Tan 2caba3efe3 liblzma: Clarify lzma_lzma_preset() documentation in lzma12.h.
lzma_lzma_preset() does not guarantee that the lzma_options_lzma are
usable in an encoder even if it returns false (success). If liblzma
is built with default configurations, then the options will always be
usable. However if the match finders hc3, hc4, or bt4 are disabled, then
the options may not be usable depending on the preset level requested.

The documentation was updated to reflect this complexity, since this
behavior was unclear before.
2023-03-07 23:25:17 +08:00
Jia Tan 4042dbf03a liblzma: Replace '\n' -> newline in filter.h documentation.
The '\n' renders as a newline when the comments are converted to html
by Doxygen.
2023-03-07 23:24:46 +08:00
Jia Tan 3971f5c502 liblzma: Shorten return description for two functions in filter.h.
Shorten the description for lzma_raw_encoder_memusage() and
lzma_raw_decoder_memusage().
2023-03-07 23:24:42 +08:00
Jia Tan 5e61b39432 liblzma: Reword a few lines in filter.h 2023-03-07 23:24:38 +08:00
Jia Tan 8a53533869 liblzma: Improve documentation in filter.h.
All functions now explicitly specify parameter and return values.
The notes and code annotations were moved before the parameter and
return value descriptions for consistency.

Also, the description above lzma_filter_encoder_is_supported() about
not being able to list available filters was removed since
lzma_str_list_filters() will do this.
2023-03-07 23:24:32 +08:00
Lasse Collin dfc9a54082 liblzma: Avoid null pointer + 0 (undefined behavior in C).
In the C99 and C17 standards, section 6.5.6 paragraph 8 means that
adding 0 to a null pointer is undefined behavior. As of writing,
"clang -fsanitize=undefined" (Clang 15) diagnoses this. However,
I'm not aware of any compiler that would take advantage of this
when optimizing (Clang 15 included). It's good to avoid this anyway
since compilers might some day infer that pointer arithmetic implies
that the pointer is not NULL. That is, the following foo() would then
unconditionally return 0, even for foo(NULL, 0):

    void bar(char *a, char *b);

    int foo(char *a, size_t n)
    {
        bar(a, a + n);
        return a == NULL;
    }

In contrast to C, C++ explicitly allows null pointer + 0. So if
the above is compiled as C++ then there is no undefined behavior
in the foo(NULL, 0) call.

To me it seems that changing the C standard would be the sane
thing to do (just add one sentence) as it would ensure that a huge
amount of old code won't break in the future. Based on web searches
it seems that a large number of codebases (where null pointer + 0
occurs) are being fixed instead to be future-proof in case compilers
will some day optimize based on it (like making the above foo(NULL, 0)
return 0) which in the worst case will cause security bugs.

Some projects don't plan to change it. For example, gnulib and thus
many GNU tools currently require that null pointer + 0 is defined:

    https://lists.gnu.org/archive/html/bug-gnulib/2021-11/msg00000.html

    https://www.gnu.org/software/gnulib/manual/html_node/Other-portability-assumptions.html

In XZ Utils the null pointer + 0 issue should be fixed after this
commit. This adds a few if-statements and thus branches to avoid
null pointer + 0. These check for size > 0 instead of ptr != NULL
because this way bugs where size > 0 && ptr == NULL will likely
get caught quickly. None of them are in hot spots so it shouldn't
matter for performance.

A little less readable version would be replacing

    ptr + offset

with

    offset != 0 ? ptr + offset : ptr

or creating a macro for it:

    #define my_ptr_add(ptr, offset) \
            ((offset) != 0 ? ((ptr) + (offset)) : (ptr))

Checking for offset != 0 instead of ptr != NULL allows GCC >= 8.1,
Clang >= 7, and Clang-based ICX to optimize it to the very same code
as ptr + offset. That is, it won't create a branch. So for hot code
this could be a good solution to avoid null pointer + 0. Unfortunately
other compilers like ICC 2021 or MSVC 19.33 (VS2022) will create a
branch from my_ptr_add().

Thanks to Marcin Kowalczyk for reporting the problem:
https://github.com/tukaani-project/xz/issues/36
2023-03-07 23:24:15 +08:00
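The "check for size > 0" pattern from this commit message can be illustrated with a small buffer helper. The function name and signature are hypothetical and only show the shape of the fix, not any specific change in liblzma.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper: append in_size bytes from in to out at
// position *out_pos and advance the position.
void
append_bytes(uint8_t *out, size_t *out_pos,
        const uint8_t *in, size_t in_size)
{
    // Test the size instead of the pointers. With in_size == 0 the
    // pointers may be NULL and no pointer arithmetic is done at all.
    // With in_size > 0 a NULL pointer is a bug that should be
    // noticed, not silently skipped.
    if (in_size > 0) {
        assert(out != NULL && in != NULL);
        memcpy(out + *out_pos, in, in_size);
        *out_pos += in_size;
    }
}
```

Calling append_bytes(NULL, &pos, NULL, 0) is then well defined in C because "out + *out_pos" is never evaluated.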
Jia Tan f6dce49cb6 liblzma: Adjust container.h for consistency with filter.h. 2023-03-07 23:24:09 +08:00