Compare commits

...

14 Commits

Author SHA1 Message Date
Jia Tan 07779fa4e2 Tests: Add new test for xz -r, --recursive option. 2024-01-29 21:50:14 +08:00
Jia Tan 7ca735f2f0 xz: Update the man page for the -r, --recursive option. 2024-01-29 21:50:08 +08:00
Jia Tan e43178da0e xz: Add -r,--recursive to --help and --long-help. 2024-01-29 21:40:53 +08:00
Jia Tan 6b4b815b94 xz: Disable sandbox when recursive mode is used.
The sandbox is very restrictive when one file is being encoded/decoded
to standard out. In recursive mode, processing a directory requires
opening sub-files and sub-directories which would not be allowed under
the sandbox.
2024-01-29 21:40:53 +08:00
Jia Tan 340505c033 xz: Hide the number of input files with recursive mode.
In recursive mode we don't know how many files to process at the
beginning. So just like when using --files or --files0, the number of
total files will not be shown.
2024-01-29 21:40:53 +08:00
Jia Tan 9ffdb5f006 xz: Parse directories in recursive mode.
This directory parsing method prioritizes lower memory usage and file
descriptor utilization at the cost of more complicated code and a higher
number of small allocations. This method makes no recursive calls and
instead keeps a queue of directories to parse. Only one directory file
descriptor is ever needed at one time.

The directory_iterator abstracts the implementation of the directory
parsing to allow for an easy interface for both POSIX and MSVC.

Currently, MSVC builds suffer from MAX_PATH being limited to 260 by
default. This restricts the usefulness of recursive mode on Windows. A
user can edit a registry config in Windows 10, Version 1607 and later to
remove this low path limit. Alternatively, we can prefix the absolute
path with "\\?\" to also remove the restriction. Note, this restriction
also applies to the compatibility functions so MSVC builds cannot read
or write to files with paths longer than 260 characters.
2024-01-29 21:40:53 +08:00
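
The queue-based traversal described in this commit message can be sketched for POSIX as follows. This is a minimal illustration and not the code from the commit: the names `dir_node`, `dir_queue`, `queue_push()`, `queue_pop()`, and `walk()` are hypothetical, plain `malloc()` stands in for xz's `xmalloc()`, and error reporting is omitted. Directories are queued and regular files are handled immediately, so only one `DIR` handle is open at a time and no recursive calls are made.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical singly linked FIFO of directory paths. */
typedef struct dir_node {
	char *path;
	struct dir_node *next;
} dir_node;

typedef struct {
	dir_node *head;
	dir_node *tail;
} dir_queue;

/* Append a copy of path. Returns 0 on success, -1 if out of memory. */
static int
queue_push(dir_queue *q, const char *path)
{
	dir_node *n = malloc(sizeof(*n));
	if (n == NULL)
		return -1;

	const size_t size = strlen(path) + 1;
	n->path = malloc(size);
	if (n->path == NULL) {
		free(n);
		return -1;
	}

	memcpy(n->path, path, size);
	n->next = NULL;

	if (q->tail != NULL)
		q->tail->next = n;
	else
		q->head = n;

	q->tail = n;
	return 0;
}

/* Remove the oldest path. The caller frees it. NULL if the queue is empty. */
static char *
queue_pop(dir_queue *q)
{
	dir_node *n = q->head;
	if (n == NULL)
		return NULL;

	char *path = n->path;
	q->head = n->next;
	if (q->head == NULL)
		q->tail = NULL;

	free(n);
	return path;
}

/* Visit every regular file under root without recursion.
 * visit may be NULL. Returns the number of regular files found. */
static size_t
walk(const char *root, void (*visit)(const char *path))
{
	dir_queue q = { NULL, NULL };
	size_t count = 0;

	if (queue_push(&q, root) != 0)
		return 0;

	char *dir_path;
	while ((dir_path = queue_pop(&q)) != NULL) {
		DIR *dir = opendir(dir_path);
		struct dirent *e;

		while (dir != NULL && (e = readdir(dir)) != NULL) {
			if (strcmp(e->d_name, ".") == 0
					|| strcmp(e->d_name, "..") == 0)
				continue;

			/* +2 for the path separator and the terminator. */
			const size_t size = strlen(dir_path)
					+ strlen(e->d_name) + 2;
			char *child = malloc(size);
			if (child == NULL)
				break;

			snprintf(child, size, "%s/%s", dir_path, e->d_name);

			/* lstat() does not follow symlinks, so a symlink
			 * to a directory is never queued. */
			struct stat st;
			if (lstat(child, &st) == 0) {
				if (S_ISDIR(st.st_mode)) {
					(void)queue_push(&q, child);
				} else if (S_ISREG(st.st_mode)) {
					if (visit != NULL)
						visit(child);
					++count;
				}
			}

			free(child);
		}

		if (dir != NULL)
			closedir(dir);

		free(dir_path);
	}

	return count;
}
```

Calling `walk(".", NULL)` would count the regular files below the current directory. The queue grows only with directories that have been discovered but not yet opened, which is the memory trade-off the commit message describes.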
Jia Tan f8c199bcdc xz: Restrict when recursive mode can be used.
If we are not compiling with dirent.h or MSVC, then we cannot use
recursive mode. Unfortunately, there is no great portable way to parse
directory contents.

There are _find_next() functions available for DOS-like platforms, but
the Windows versions of these functions are different. Since we do not have a
good way to test these functions, support will not be added at this
time.
2024-01-29 21:40:53 +08:00
Jia Tan fe1af552d3 xz: Allow directories in io_open_src() in recursive mode.
If the directory is a symlink, it is skipped to prevent a loop in the
directory structure that would cause infinite recursion.
2024-01-29 21:40:53 +08:00
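
On POSIX systems, the symlink check described here comes down to using `lstat()`, which reports on the link itself instead of its target. The sketch below shows the idea in isolation. The helper name `is_real_directory()` is hypothetical and does not appear in the commit.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Returns 1 if path is a directory and not a symlink, 0 if it is
 * anything else (including a symlink that points to a directory),
 * and -1 if lstat() fails. */
static int
is_real_directory(const char *path)
{
	struct stat st;

	/* lstat() does not follow a symlink in the final path
	 * component, so a link to a directory has S_ISLNK set
	 * and S_ISDIR clear. */
	if (lstat(path, &st) != 0)
		return -1;

	return S_ISDIR(st.st_mode) ? 1 : 0;
}
```

A traversal that only descends when this returns 1 cannot be led into a cycle through a symlinked directory, which is the loop the commit message is guarding against.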
Jia Tan e08d65acaf Build: Check for dirent.h.
For both CMake and Autotools, define HAVE_DIRENT_H if the header file
is found.
2024-01-29 21:40:53 +08:00
Jia Tan 8d07e9bb7c xz: Enable -r, --recursive option. 2024-01-29 21:40:53 +08:00
Jia Tan b10b2e4a8f xz: Change the way coder_run() and list_run() are called in main().
Previously, a function pointer was used to determine if coder_run() or
list_run() should be called in the main entry processing loop. This was
replaced by an extra function call to process_entry().

coder_run() and list_run() were changed to accept a file_pair * argument
instead of a filename. The common repeated code was moved to
process_entry() instead.
2024-01-29 21:40:53 +08:00
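
The shape of this refactoring can be illustrated with a small stand-alone sketch. All names below (`run_mode`, `entry`, `open_entry()`, `run_coder()`, `run_list()`, `process()`) are hypothetical stand-ins for the real `file_pair`, `io_open_src()`, `coder_run()`, `list_file()`, and `process_entry()`. The point is only that the entry is opened once in a common place and then handed to whichever worker matches the mode, instead of each worker opening the file itself.

```c
#include <stddef.h>

/* Hypothetical mode flag and a stand-in for file_pair. */
typedef enum { RUN_CODE, RUN_LIST } run_mode;
typedef struct { const char *name; } entry;

/* Counters so the sketch has an observable effect. */
static int coded = 0;
static int listed = 0;

static void run_coder(entry *e) { (void)e; ++coded; }
static void run_list(entry *e) { (void)e; ++listed; }

/* Stand-in for opening the source: fails on NULL or empty names. */
static entry *
open_entry(entry *storage, const char *name)
{
	if (name == NULL || name[0] == '\0')
		return NULL;

	storage->name = name;
	return storage;
}

/* Common entry point: open once, then dispatch on the mode. */
static void
process(run_mode mode, const char *name)
{
	entry storage;
	entry *e = open_entry(&storage, name);
	if (e == NULL)
		return;

	if (mode == RUN_LIST)
		run_list(e);
	else
		run_coder(e);
}
```

Because the open step now lives in one function, a later change such as detecting that the entry is a directory needs to be made in one place only.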
Jia Tan a3bac71fe3 xz: Reorder #include order in private.h. 2024-01-29 21:40:53 +08:00
Jia Tan 882aad963e xz: Move some list_file() checks to args_parse().
The checks enforce that list mode will only run on .xz files. The
opt_format is only set during argument parsing and will not change
after. So we only need to check this once instead of every call to
list_file(). Additionally, this will cause the error to be detected
slightly earlier.
2024-01-29 21:40:53 +08:00
Jia Tan d6d1e40f19 xz: Add a function to print Windows specific error messages.
Native Windows C API functions do not set errno; instead, callers have to
call GetLastError(). There is no easy way to convert this error
code into a helpful message, so this creates a wrapper around the
slightly complicated FormatMessage() function.

The new message_windows_error() function calls message_error() under the
hood, so it will set the exit status to 1.
2024-01-29 21:40:53 +08:00
17 changed files with 717 additions and 55 deletions

@ -1006,6 +1006,15 @@ calculation if supported by the system" ON)
endif() endif()
endif() endif()
# MSVC shouldn't have this header file anyway but this won't waste time
# checking.
if(NOT MSVC)
# dirent.h
check_include_file(dirent.h HAVE_DIRENT_H)
tuklib_add_definition_if(liblzma HAVE_DIRENT_H)
endif()
# Support -fvisiblity=hidden when building shared liblzma. # Support -fvisiblity=hidden when building shared liblzma.
# These lines do nothing on Windows (even under Cygwin). # These lines do nothing on Windows (even under Cygwin).
# HAVE_VISIBILITY should always be defined to 0 or 1. # HAVE_VISIBILITY should always be defined to 0 or 1.

@ -793,6 +793,8 @@ AC_CHECK_HEADERS([fcntl.h limits.h sys/time.h],
# cpuid.h may be used for detecting x86 processor features at runtime. # cpuid.h may be used for detecting x86 processor features at runtime.
AC_CHECK_HEADERS([immintrin.h cpuid.h]) AC_CHECK_HEADERS([immintrin.h cpuid.h])
# dirent.h allows for directory parsing in xz.
AC_CHECK_HEADERS([dirent.h])
############################################################################### ###############################################################################
# Checks for typedefs, structures, and compiler characteristics. # Checks for typedefs, structures, and compiler characteristics.

@ -24,6 +24,7 @@ bool opt_force = false;
bool opt_keep_original = false; bool opt_keep_original = false;
bool opt_robot = false; bool opt_robot = false;
bool opt_ignore_check = false; bool opt_ignore_check = false;
bool opt_recursive = false;
// We don't modify or free() this, but we need to assign it in some // We don't modify or free() this, but we need to assign it in some
// non-const pointers. // non-const pointers.
@ -230,7 +231,7 @@ parse_real(args_info *args, int argc, char **argv)
{ "single-stream", no_argument, NULL, OPT_SINGLE_STREAM }, { "single-stream", no_argument, NULL, OPT_SINGLE_STREAM },
{ "no-sparse", no_argument, NULL, OPT_NO_SPARSE }, { "no-sparse", no_argument, NULL, OPT_NO_SPARSE },
{ "suffix", required_argument, NULL, 'S' }, { "suffix", required_argument, NULL, 'S' },
// { "recursive", no_argument, NULL, 'r' }, // TODO { "recursive", no_argument, NULL, 'r' },
{ "files", optional_argument, NULL, OPT_FILES }, { "files", optional_argument, NULL, OPT_FILES },
{ "files0", optional_argument, NULL, OPT_FILES0 }, { "files0", optional_argument, NULL, OPT_FILES0 },
@ -334,6 +335,11 @@ parse_real(args_info *args, int argc, char **argv)
suffix_set(optarg); suffix_set(optarg);
break; break;
// --recursive
case 'r':
opt_recursive = true;
break;
case 'T': { case 'T': {
// Since xz 5.4.0: Ignore leading '+' first. // Since xz 5.4.0: Ignore leading '+' first.
const char *s = optarg; const char *s = optarg;
@ -786,6 +792,20 @@ args_parse(args_info *args, int argc, char **argv)
if (opt_mode != MODE_COMPRESS) if (opt_mode != MODE_COMPRESS)
message_fatal(_("Decompression support was disabled " message_fatal(_("Decompression support was disabled "
"at build time")); "at build time"));
#else
// List mode is only available when decoders are enabled and is
// only valid with .xz files.
if (opt_mode == MODE_LIST) {
if (opt_format != FORMAT_XZ && opt_format != FORMAT_AUTO)
message_fatal(_("--list works only on .xz files "
"(--format=xz or --format=auto)"));
// Unset opt_stdout so that io_open_src() won't accept
// special files.
opt_stdout = false;
// Set opt_force so that io_open_src() will follow symlinks.
opt_force = true;
}
#endif #endif
#ifdef HAVE_LZIP_DECODER #ifdef HAVE_LZIP_DECODER
@ -794,6 +814,11 @@ args_parse(args_info *args, int argc, char **argv)
"is not supported")); "is not supported"));
#endif #endif
#if !defined(_MSC_VER) && !defined(HAVE_DIRENT_H)
if (opt_recursive)
message_fatal("Recursive mode is not supported");
#endif
// Never remove the source file when the destination is not on disk. // Never remove the source file when the destination is not on disk.
// In test mode the data is written nowhere, but setting opt_stdout // In test mode the data is written nowhere, but setting opt_stdout
// will make the rest of the code behave well. // will make the rest of the code behave well.

@ -35,7 +35,7 @@ typedef struct {
extern bool opt_stdout; extern bool opt_stdout;
extern bool opt_force; extern bool opt_force;
extern bool opt_keep_original; extern bool opt_keep_original;
// extern bool opt_recursive; extern bool opt_recursive;
extern bool opt_robot; extern bool opt_robot;
extern bool opt_ignore_check; extern bool opt_ignore_check;

@ -1429,16 +1429,8 @@ coder_passthru(file_pair *pair)
extern void extern void
coder_run(const char *filename) coder_run(file_pair *pair)
{ {
// Set and possibly print the filename for the progress message.
message_filename(filename);
// Try to open the input file.
file_pair *pair = io_open_src(filename);
if (pair == NULL)
return;
// Assume that something goes wrong. // Assume that something goes wrong.
bool success = false; bool success = false;

@ -83,7 +83,7 @@ extern void coder_add_filter(lzma_vli id, void *options);
extern void coder_set_compression_settings(void); extern void coder_set_compression_settings(void);
/// Compress or decompress the given file /// Compress or decompress the given file
extern void coder_run(const char *filename); extern void coder_run(file_pair *pair);
#ifndef NDEBUG #ifndef NDEBUG
/// Free the memory allocated for the coder and kill the worker threads. /// Free the memory allocated for the coder and kill the worker threads.

@ -21,6 +21,10 @@
static bool warn_fchown; static bool warn_fchown;
#endif #endif
#ifdef HAVE_DIRENT_H
# include <dirent.h>
#endif
#if defined(HAVE_FUTIMES) || defined(HAVE_FUTIMESAT) || defined(HAVE_UTIMES) #if defined(HAVE_FUTIMES) || defined(HAVE_FUTIMESAT) || defined(HAVE_UTIMES)
# include <sys/time.h> # include <sys/time.h>
#elif defined(HAVE__FUTIME) #elif defined(HAVE__FUTIME)
@ -613,6 +617,48 @@ io_copy_attrs(const file_pair *pair)
} }
#if defined(_MSC_VER) || (defined(_WIN32) && defined(HAVE_DIRENT_H))
/// \brief Tells whether the path is a directory and should be parsed.
///
/// On Windows, open() will return EACCES if the path is a directory. This
/// function determines whether the directory should be processed or
/// prints a better error message.
static bool
should_parse_dir_windows(const char *path)
{
DWORD file_attr = GetFileAttributes(path);
// If there is an error, it won't change errno. If we wanted to
// know more information about the error we could use
// message_windows_error() to show detailed error description.
// Instead we can let the code fall through since the errno from
// the original _open() call is likely descriptive enough.
if (file_attr != INVALID_FILE_ATTRIBUTES) {
if (file_attr & FILE_ATTRIBUTE_DIRECTORY) {
// FILE_ATTRIBUTE_REPARSE_POINT means the directory is a
// symlink or another kind of reparse point (e.g. a junction).
// We do not want to recurse into either of these,
// especially a symlink to a directory since this
// could lead to an infinite directory processing loop.
if (opt_recursive && (file_attr
& FILE_ATTRIBUTE_REPARSE_POINT))
message_warning(_("%s: Is a symlink to a "
"directory, skipping"), path);
else if (opt_recursive)
return true;
else
message_warning(_("%s: Is a directory, skipping"),
path);
}
} else {
message_error("%s: %s", path, strerror(errno));
}
return false;
}
#endif
/// Opens the source file. Returns false on success, true on error. /// Opens the source file. Returns false on success, true on error.
static bool static bool
io_open_src_real(file_pair *pair) io_open_src_real(file_pair *pair)
@ -751,15 +797,28 @@ io_open_src_real(file_pair *pair)
if (was_symlink) if (was_symlink)
message_warning(_("%s: Is a symbolic link, " message_warning(_("%s: Is a symbolic link, "
"skipping"), pair->src_name); "skipping"), pair->src_name);
else else
#endif #endif
{
#ifdef _WIN32
// The _open() function with MSVC will fail with
// EACCES if the path is a directory. We can give a
// more accurate error message in this case or, if
// in recursive mode, we can process the directory.
if (errno == EACCES) {
pair->is_directory = should_parse_dir_windows(
pair->src_name);
return !pair->is_directory;
}
#else
// Something else than O_NOFOLLOW failing // Something else than O_NOFOLLOW failing
// (assuming that the race conditions didn't // (assuming that the race conditions didn't
// confuse us). // confuse us).
message_error(_("%s: %s"), pair->src_name, message_error(_("%s: %s"), pair->src_name,
strerror(errno)); strerror(errno));
#endif
}
return true; return true;
} }
@ -778,11 +837,42 @@ io_open_src_real(file_pair *pair)
goto error_msg; goto error_msg;
#endif #endif
#ifdef HAVE_DIRENT_H
// MSVC cannot open() directories, so this check is
// skipped in that case.
if (S_ISDIR(pair->src_st.st_mode)) { if (S_ISDIR(pair->src_st.st_mode)) {
message_warning(_("%s: Is a directory, skipping"), if (!opt_recursive) {
pair->src_name); message_warning(_("%s: Is a directory, skipping"),
goto error; pair->src_name);
goto error;
}
// Do not allow symlinks with recursive mode because this
// could lead to a loop in the file system and thus infinite
// recursion. If a symlink is detected, skip it.
// S_ISLNK and lstat() are not available with MSVC so these
// need to be in an #ifdef
if (follow_symlinks) {
#ifdef _WIN32
if (!should_parse_dir_windows(pair->src_name))
goto error;
#else
if (lstat(pair->src_name, &pair->src_st) != 0)
goto error_msg;
if (S_ISLNK(pair->src_st.st_mode)) {
message_warning(_("%s: Is a symlink to a "
"directory, skipping"), pair->src_name);
goto error;
}
#endif
}
(void)close(pair->src_fd);
pair->is_directory = true;
return false;
} }
#endif
if (reg_files_only && !S_ISREG(pair->src_st.st_mode)) { if (reg_files_only && !S_ISREG(pair->src_st.st_mode)) {
message_warning(_("%s: Not a regular file, skipping"), message_warning(_("%s: Not a regular file, skipping"),
@ -880,6 +970,9 @@ io_open_src(const char *src_name)
.flush_needed = false, .flush_needed = false,
.dest_try_sparse = false, .dest_try_sparse = false,
.dest_pending_sparse = 0, .dest_pending_sparse = 0,
#if defined(_MSC_VER) || defined(HAVE_DIRENT_H)
.is_directory = false,
#endif
}; };
// Block the signals, for which we have a custom signal handler, so // Block the signals, for which we have a custom signal handler, so
@ -1482,3 +1575,161 @@ io_write(file_pair *pair, const io_buf *buf, size_t size)
return io_write_buf(pair, buf->u8, size); return io_write_buf(pair, buf->u8, size);
} }
#if defined(_MSC_VER) || defined(HAVE_DIRENT_H)
struct directory_iter_s {
#if defined(_MSC_VER)
HANDLE dir;
// The path must be saved because the call to
// directory_iterator_init() does not actually open
// the directory HANDLE. There is not a way to open
// the directory without reading the first entry.
// Instead, the search path is prepared in
// directory_iterator_init() so the first call to
// directory_iter_next() will be able to use the saved
// path.
char *path;
// Windows uses FindFirstFile() to do the first search and
// open the HANDLE to the directory. After that, FindNextFile()
// must be used to continue the search. So this flag marks if
// FindFirstFile() or FindNextFile() should be used.
bool first;
#elif defined(HAVE_DIRENT_H)
DIR *dir;
#endif
};
extern directory_iter *
directory_iterator_init(const char *path)
{
directory_iter *iter = xmalloc(sizeof(directory_iter));
#ifdef _MSC_VER
iter->first = true;
const size_t path_len = strlen(path);
char* path_search = xmalloc(path_len + 3);
memcpy(path_search, path, path_len);
// The Windows directory search functions take a wildcard pattern
// instead of just the directory name. Since we want all files in
// the directory, we need to append the wildcard character (*) to
// the end of the path.
//
// Note: It does not matter if the path parameter ends with the
// path separator. The search path is not displayed and the
// proper path name extension is handled elsewhere.
path_search[path_len] = PATH_SEP;
path_search[path_len + 1] = '*';
path_search[path_len + 2] = '\0';
iter->path = path_search;
#else
// On some platforms, opendir() can be interrupted by a signal, so it
// is safest to block signals here.
signals_block();
iter->dir = opendir(path);
signals_unblock();
if (iter->dir == NULL) {
free(iter);
message_error(_("%s: Error opening the directory: %s"),
path, strerror(errno));
return NULL;
}
#endif
return iter;
}
extern bool
directory_iter_next(directory_iter *iter, char *entry, size_t *entry_len)
{
bool next = true;
char *next_entry;
#ifdef _MSC_VER
WIN32_FIND_DATA dir_entry;
if (iter->first) {
iter->dir = FindFirstFile(iter->path, &dir_entry);
// The existence of the directory is checked in
// io_open_src_real(), so it's most likely that this
// is an empty directory.
if (iter->dir == INVALID_HANDLE_VALUE)
next = false;
iter->first = false;
}
else {
next = FindNextFile(iter->dir, &dir_entry);
}
next_entry = dir_entry.cFileName;
#else
// The only way to check if an error occurred is by saving the
// old errno and comparing it to the errno after readdir()
// completes. readdir() will return NULL on error and if the
// directory has been parsed to completion.
int old_errno = errno;
struct dirent *dir_entry = readdir(iter->dir);
if (dir_entry == NULL) {
// readdir() is not supposed to change the errno based on
// the POSIX standard. However the implementation used by
// MinGW-w64 will set errno to 0 on success. So if the errno
// was previously set it will falsely indicate an error.
if(old_errno != errno && errno != 0)
message_error(_("Error reading directory entry: %s"),
strerror(errno));
next = false;
}
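// dir_entry is NULL at the end of the directory and on error,
// so it must not be dereferenced in that case.
next_entry = dir_entry != NULL ? dir_entry->d_name : NULL;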
#endif
if (next) {
const size_t next_entry_len = strlen(next_entry);
if (*entry_len <= next_entry_len) {
message_error(_("Unexpected directory entry "
"length."));
*entry_len = 0;
return true;
}
// Copy NULL terminator
memcpy(entry, next_entry, next_entry_len + 1);
*entry_len = next_entry_len;
}
return next;
}
extern void
directory_iter_close(directory_iter *iter)
{
if (iter != NULL) {
#ifdef _MSC_VER
if (iter->dir != INVALID_HANDLE_VALUE
&& !FindClose(iter->dir)) {
DWORD err = GetLastError();
message_windows_error("Error closing directory", err);
}
free(iter->path);
#else
if(closedir(iter->dir))
message_error(_("Error closing directory: %s"),
strerror(errno));
#endif
free(iter);
}
}
#endif //defined(_MSC_VER) || defined(HAVE_DIRENT_H)

@ -26,6 +26,9 @@
# define stat _stat64 # define stat _stat64
# define fstat _fstat64 # define fstat _fstat64
# define off_t __int64 # define off_t __int64
# define PATH_SEP '\\'
#else
# define PATH_SEP '/'
#endif #endif
@ -70,6 +73,12 @@ typedef struct {
/// a sparse file. /// a sparse file.
bool dest_try_sparse; bool dest_try_sparse;
#if defined(_MSC_VER) || defined(HAVE_DIRENT_H)
/// If true, this entry is a directory, not a file. This can only
/// be set if the --recursive option is used.
bool is_directory;
#endif
/// This is used only if dest_try_sparse is true. This holds the /// This is used only if dest_try_sparse is true. This holds the
/// number of zero bytes we haven't written out, because we plan /// number of zero bytes we haven't written out, because we plan
/// to make that byte range a sparse chunk. /// to make that byte range a sparse chunk.
@ -187,3 +196,44 @@ extern bool io_pread(file_pair *pair, io_buf *buf, size_t size, uint64_t pos);
/// \return On success, zero is returned. On error, -1 is returned /// \return On success, zero is returned. On error, -1 is returned
/// and error message printed. /// and error message printed.
extern bool io_write(file_pair *pair, const io_buf *buf, size_t size); extern bool io_write(file_pair *pair, const io_buf *buf, size_t size);
/// Opaque struct representing a directory iterator. This should be used
/// with directory_iterator_init(), directory_iter_next(), and
/// directory_iter_close().
typedef struct directory_iter_s directory_iter;
/// @brief Creates a Directory Iterator
///
/// This will create and initialize a directory_iter structure.
/// The pointer should not be freed and should instead be passed
/// to directory_iter_close() when it is no longer needed.
///
/// @param path String path to a directory
///
/// @return On success, a pointer to the directory iterator.
/// On error, NULL.
extern directory_iter * directory_iterator_init(const char* path);
/// @brief Iterate to the next directory entry
///
/// @param iter Pointer to the iterator
/// @param entry Buffer to receive the next directory entry
/// @param entry_len Set this to the size of the entry buffer. On
/// success this is set to the string length of
/// the entry that was copied into entry (does not
/// count the NULL terminator).
///
/// @return Returns true if there may be more entries.
/// Returns false otherwise.
extern bool directory_iter_next(directory_iter *iter, char *entry,
size_t *entry_len);
/// @brief Close the Directory Iterator
///
/// This cleans up the iterator by closing files and freeing
/// all needed memory.
///
/// @param iter Pointer to the iterator to close
extern void directory_iter_close(directory_iter *iter);

@ -1274,30 +1274,10 @@ list_totals(void)
extern void extern void
list_file(const char *filename) list_file(file_pair *pair)
{ {
if (opt_format != FORMAT_XZ && opt_format != FORMAT_AUTO)
message_fatal(_("--list works only on .xz files "
"(--format=xz or --format=auto)"));
message_filename(filename);
if (filename == stdin_filename) {
message_error(_("--list does not support reading from "
"standard input"));
return;
}
init_field_widths(); init_field_widths();
// Unset opt_stdout so that io_open_src() won't accept special files.
// Set opt_force so that io_open_src() will follow symlinks.
opt_stdout = false;
opt_force = true;
file_pair *pair = io_open_src(filename);
if (pair == NULL)
return;
xz_file_info xfi = XZ_FILE_INFO_INIT; xz_file_info xfi = XZ_FILE_INFO_INIT;
if (!parse_indexes(&xfi, pair)) { if (!parse_indexes(&xfi, pair)) {
bool fail; bool fail;

@ -11,7 +11,7 @@
/////////////////////////////////////////////////////////////////////////////// ///////////////////////////////////////////////////////////////////////////////
/// \brief List information about the given .xz file /// \brief List information about the given .xz file
extern void list_file(const char *filename); extern void list_file(file_pair *pair);
/// \brief Show the totals after all files have been listed /// \brief Show the totals after all files have been listed

@ -19,6 +19,24 @@
# include <sys/prctl.h> # include <sys/prctl.h>
#endif #endif
/// The directory_list type is used in recursive mode to keep track of all
/// the directories that need processing. It is used as a queue to process
/// directories in the order they are discovered. Files, on the other hand,
/// are processed right away to reduce the size of the queue and hence the
/// amount of memory needed to be allocated at any one time.
typedef struct directory_list_s {
/// Path to the directory. This is used as a pointer since it is
/// likely that most directories do not need the full possible file
/// path length allowed by systems. This saves memory in cases where
/// many directories need to be on the queue at the same time.
char *dir_path;
/// Pointer to the next directory in the queue. This is only a
/// singly linked list since we only ever need to process the queue
/// in one direction.
struct directory_list_s *next;
} directory_list;
/// Exit status to use. This can be changed with set_exit_status(). /// Exit status to use. This can be changed with set_exit_status().
static enum exit_status_type exit_status = E_SUCCESS; static enum exit_status_type exit_status = E_SUCCESS;
@ -146,6 +164,191 @@ read_name(const args_info *args)
} }
static void
process_entry(const char *path)
{
#ifdef HAVE_DECODERS
if (opt_mode == MODE_LIST && path == stdin_filename) {
message_error(_("--list does not support reading from "
"standard input"));
return;
}
#endif
// Open the entry
file_pair *pair = io_open_src(path);
if (pair == NULL)
return;
#if defined(_MSC_VER) || defined(HAVE_DIRENT_H)
// io_open_src() will return NULL if the path points to a directory
// and we aren't in recursive mode. So there is no need to check
// for recursive mode here.
if (pair->is_directory) {
// Create the queue of directories to process. The first
// item in the queue will be the base entry. The first item
// is dynamically allocated to simplify the memory freeing
// code later on.
directory_list *dir_list = xmalloc(sizeof(directory_list));
dir_list->dir_path = xstrdup(path);
// Strip any trailing path separators at the end of the
// directory. This makes the path compatible with Windows
// MSVC search functions and makes the output look nicer.
for (size_t i = strlen(path) - 1; dir_list->dir_path[i]
== PATH_SEP && i > 1; i--) {
dir_list->dir_path[i] = '\0';
}
dir_list->next = NULL;
// The current pointer represents the directory we are
// currently processing. To start, it is initialized as the
// base entry.
directory_list *current = dir_list;
// The pointer to the last item in the queue is used to
// append new directories.
directory_list *last = dir_list;
do {
directory_list* next;
// The iterator initialization will return NULL and
// print an error message if there is any kind of
// problem. In this case, we can simply continue on
// to the next directory to process.
directory_iter *iter = directory_iterator_init(
current->dir_path);
// The error message is printed during
// directory_iterator_init(), so no need to print
// anything before proceeding to the next iteration.
if (iter == NULL)
goto next_iteration;
const size_t dir_path_len = strlen(current->dir_path);
// Set ENTRY_LEN_MAX depending on the system. On
// POSIX systems, NAME_MAX will be defined in
// <limits.h>. On Windows, the directory parsing
// functions have buffers of size MAX_PATH.
#ifdef TUKLIB_DOSLIKE
# define ENTRY_LEN_MAX MAX_PATH
#else
# define ENTRY_LEN_MAX NAME_MAX
#endif
char entry[ENTRY_LEN_MAX + 1];
size_t entry_len;
// The entry_len must be reset each iteration because
// directory_iter_next() will only write to the entry
// buffer if it can write the entire entry name. If the
// value is not reset each time, it will limit the
// next entry size based on the last entry's size.
while ((entry_len = ENTRY_LEN_MAX)
&& directory_iter_next(iter, entry,
&entry_len)) {
// Extend current directory path with
// new entry.
if (entry_len == 0)
continue;
// Check for '.' and '..' since there is no
// point in processing them.
if (entry[0] == '.' && ((entry[1] == '.'
&& entry[2] == '\0')
|| entry[1] == '\0'))
continue;
// The total entry size needs the "+2" to
// make room for the directory path separator
// and the NULL terminator.
const size_t total_size = entry_len + dir_path_len + 2;
char *entry_path = xmalloc(total_size);
memcpy(entry_path, current->dir_path, dir_path_len);
char *entry_copy_start = entry_path + dir_path_len;
entry_path[dir_path_len] = PATH_SEP;
entry_copy_start++;
memcpy(entry_copy_start, entry, entry_len + 1);
// Try to open the next entry. If it is a file
// it will be processed immediately. If it is a
// directory it will be added to the queue to
// be processed later. Processing files right
// away reduces the amount of memory needed
// for queue nodes and stored file paths.
// Exploring directories only increases the
// amount of memory needed so it's better to
// prioritize processing files as early as
// possible.
pair = io_open_src(entry_path);
if (pair == NULL) {
free(entry_path);
continue;
}
if (pair->is_directory) {
directory_list *next_dir = xmalloc(
sizeof(directory_list));
next_dir->dir_path = entry_path;
next_dir->next = NULL;
last->next = next_dir;
last = next_dir;
} else if (entry[0] == '.'
&& opt_mode == MODE_COMPRESS
&& !opt_keep_original) {
message_warning(_("%s: Hidden file "
"skipped during recursive "
"compression mode. Use --keep "
"to process these files.\n"),
entry_path);
free(entry_path);
} else {
message_filename(entry_path);
#ifdef HAVE_DECODERS
if (opt_mode == MODE_LIST)
list_file(pair);
else
#endif
coder_run(pair);
free(entry_path);
}
}
directory_iter_close(iter);
next_iteration:
next = current->next;
free(current->dir_path);
free(current);
current = next;
} while (current != NULL);
return;
}
#endif // defined(_MSC_VER) || defined(HAVE_DIRENT_H)
// Set and possibly print the filename for the progress message.
message_filename(path);
#ifdef HAVE_DECODERS
if (opt_mode == MODE_LIST)
list_file(pair);
else
#endif
coder_run(pair);
}
int int
main(int argc, char **argv) main(int argc, char **argv)
{ {
@ -209,7 +412,7 @@ main(int argc, char **argv)
// Tell the message handling code how many input files there are if // Tell the message handling code how many input files there are if
// we know it. This way the progress indicator can show it. // we know it. This way the progress indicator can show it.
if (args.files_name != NULL) if (args.files_name != NULL || opt_recursive)
message_set_files(0); message_set_files(0);
else else
message_set_files(args.arg_count); message_set_files(args.arg_count);
@ -252,18 +455,11 @@ main(int argc, char **argv)
// TODO: Make sandboxing work for other situations too. // TODO: Make sandboxing work for other situations too.
if (args.files_name == NULL && args.arg_count == 1 if (args.files_name == NULL && args.arg_count == 1
&& (opt_stdout || strcmp("-", args.arg_names[0]) == 0 && (opt_stdout || strcmp("-", args.arg_names[0]) == 0
|| opt_mode == MODE_LIST)) || opt_mode == MODE_LIST)
&& !opt_recursive)
io_allow_sandbox(); io_allow_sandbox();
#endif #endif
// coder_run() handles compression, decompression, and testing.
// list_file() is for --list.
void (*run)(const char *filename) = &coder_run;
#ifdef HAVE_DECODERS
if (opt_mode == MODE_LIST)
run = &list_file;
#endif
// Process the files given on the command line. Note that if no names // Process the files given on the command line. Note that if no names
// were given, args_parse() gave us a fake "-" filename. // were given, args_parse() gave us a fake "-" filename.
for (unsigned i = 0; i < args.arg_count && !user_abort; ++i) { for (unsigned i = 0; i < args.arg_count && !user_abort; ++i) {
@ -298,7 +494,7 @@ main(int argc, char **argv)
} }
// Do the actual compression or decompression. // Do the actual compression or decompression.
run(args.arg_names[i]); process_entry(args.arg_names[i]);
} }
// If --files or --files0 was used, process the filenames from the // If --files or --files0 was used, process the filenames from the
@ -314,7 +510,7 @@ main(int argc, char **argv)
// read_name() doesn't return empty names. // read_name() doesn't return empty names.
assert(name[0] != '\0'); assert(name[0] != '\0');
run(name); process_entry(name);
} }
if (args.files_name != stdin_filename) if (args.files_name != stdin_filename)

@ -806,6 +806,28 @@ message_signal_handler(void)
} }
#ifdef _MSC_VER
extern void
message_windows_error(const char* message, DWORD error_code)
{
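// Initialize to NULL so that LocalFree() below is safe even if
// FormatMessage() fails and never allocates the buffer.
char *error_message = NULL;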
if (FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER
| FORMAT_MESSAGE_FROM_SYSTEM
| FORMAT_MESSAGE_IGNORE_INSERTS,
NULL, error_code, MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
(LPTSTR)&error_message, 0, NULL)) {
message_error("%s: %s", message, error_message);
}
else {
message_error("%s\n", message);
}
LocalFree(error_message);
}
#endif
extern const char * extern const char *
message_strm(lzma_ret code) message_strm(lzma_ret code)
{ {
@ -995,12 +1017,18 @@ message_help(bool long_help)
" ignore possible remaining input data")); " ignore possible remaining input data"));
puts(_(
" --no-sparse do not create sparse files when decompressing\n"
-" -S, --suffix=.SUF use the suffix '.SUF' on compressed files\n"
+" -S, --suffix=.SUF use the suffix '.SUF' on compressed files"));
}
puts(_(
" -r, --recursive operate recursively on directories"));
if (long_help)
puts(_(
" --files[=FILE] read filenames to process from FILE; if FILE is\n"
" omitted, filenames are read from the standard input;\n"
" filenames must be terminated with the newline character\n"
" --files0[=FILE] like --files but use the null character as terminator"));
}
if (long_help) {
puts(_("\n Basic file format and compression options:\n"));


@ -84,6 +84,20 @@ tuklib_attr_noreturn
extern void message_signal_handler(void);
#ifdef _MSC_VER
/// \brief Print an error message using a Windows specific error code
///
/// The function uses message_error() internally, so it will set the
/// exit code to 1 after printing.
///
/// \param message Message describing where the error occurred
/// \param error_code Error number from GetLastError()
extern void
message_windows_error(const char* message, DWORD error_code);
#endif
/// Convert lzma_ret to a string.
extern const char *message_strm(lzma_ret code);


@ -70,11 +70,11 @@
#include "main.h"
#include "mytime.h"
#include "file_io.h"
#include "coder.h"
#include "message.h"
#include "args.h"
#include "hardware.h"
#include "file_io.h"
#include "options.h"
#include "signals.h"
#include "suffix.h"


@ -6,7 +6,7 @@
.\" This file has been put into the public domain. .\" This file has been put into the public domain.
.\" You can do whatever you want with this file. .\" You can do whatever you want with this file.
.\" .\"
-.TH XZ 1 "2024-01-23" "Tukaani" "XZ Utils"
+.TH XZ 1 "2024-01-29" "Tukaani" "XZ Utils"
.
.SH NAME
xz, unxz, xzcat, lzma, unlzma, lzcat \- Compress or decompress .xz and .lzma files
@ -205,6 +205,16 @@ This has only limited use since when standard error
is a terminal, using
.B \-\-verbose
will display an automatically updating progress indicator.
.PP
Recursive mode
.RB ( \-\-recursive )
allows directories to be processed.
If a directory or any sub-directory is a symlink, it will be skipped.
When compressing, files discovered in sub-directories are skipped
if their names begin with '.', unless
.B \-\-keep
was specified.
.
.SS "Memory usage"
The memory usage of
@ -529,6 +539,18 @@ the suffix must always be specified unless
writing to standard output,
because there is no default suffix for raw streams.
.TP
.BR \-r ", " \-\-recursive
Operate recursively on directories.
If a directory is specified as a target file in any operation mode,
all files in that directory and in its sub-directories are processed.
When traversing recursively in compression mode,
all files and directories whose names begin with
.B .
are skipped unless
.B \-\-keep
is used.
.TP
\fB\-\-files\fR[\fB=\fIfile\fR]
Read the filenames to process from
.IR file ;
@ -542,6 +564,10 @@ is taken as a regular filename; it doesn't mean standard input.
If filenames are given also as command line arguments, they are
processed before the filenames read from
.IR file .
In
.B \-\-recursive
mode,
.I file
may also name directories, which are traversed recursively.
.TP
\fB\-\-files0\fR[\fB=\fIfile\fR]
This is identical to \fB\-\-files\fR[\fB=\fIfile\fR] except
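The skipping rule that the man page text above documents for \-\-recursive can be read as a small predicate. The following is a minimal sketch of that rule only, not code from this change: the function name and its parameters are hypothetical, and always ignoring "." and ".." is an assumption about how a directory walk would treat those two entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

// Hypothetical helper (not part of xz): decide whether a directory entry
// found during a recursive walk should be skipped.
static bool
skip_in_recursive_mode(const char *name, bool compressing, bool keep)
{
	assert(name != NULL);

	// Assumption: the entries for the directory itself and its parent
	// are never processed, regardless of mode.
	if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
		return true;

	// Documented rule: names beginning with '.' are skipped only when
	// compressing and only if --keep was not given.
	return compressing && !keep && name[0] == '.';
}
```

Under this reading, a name such as ".cache" is skipped by a plain recursive compression, but it is processed when \-\-keep is given or when decompressing.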


@ -11,6 +11,7 @@ EXTRA_DIST = \
tuktest.h \
tests.h \
test_files.sh \
test_recursive.sh \
test_compress.sh \
test_compress_prepared_bcj_sparc \
test_compress_prepared_bcj_x86 \
@ -62,6 +63,7 @@ TESTS = \
test_lzip_decoder \
test_vli \
test_files.sh \
test_recursive.sh \
test_suffix.sh \
test_compress_prepared_bcj_sparc \
test_compress_prepared_bcj_x86 \

tests/test_recursive.sh Executable file

@ -0,0 +1,87 @@
#!/bin/sh
###############################################################################
#
# Author: Jia Tan
#
# This file has been put into the public domain.
# You can do whatever you want with this file.
#
###############################################################################
# If xz wasn't built, this test is skipped.
XZ="../src/xz/xz"
if test -x "$XZ" ; then
:
else
echo "xz was not built, skipping this test $XZ"
exit 77
fi
# If decompression support is missing, this test is skipped.
if grep 'define HAVE_DECODERS' ../config.h > /dev/null ; then
:
else
echo "Decompression support is disabled, skipping this test."
exit 77
fi
# Set up the nested directory structure, but first
# delete it if it already exists.
rm -rf xz_recursive_level_one
mkdir -p xz_recursive_level_one/level_two/level_three
FILES="file_one "\
"file_two "\
"level_two/file_one "\
"level_two/file_two "\
"level_two/level_three/file_one "\
"level_two/level_three/file_two "
for FILE in $FILES
do
cp "$srcdir/files/good-0-empty.xz" "xz_recursive_level_one/$FILE.xz"
done
# Decompress with -r, --recursive option
if "$XZ" -dr xz_recursive_level_one; then
:
else
echo "Recursive decompression failed: $*"
exit 1
fi
# Verify the files were all decompressed.
for FILE in $FILES
do
if test -e "xz_recursive_level_one/$FILE"; then
:
else
echo "File not decompressed: xz_recursive_level_one/$FILE.xz"
exit 1
fi
# Remove decompressed files to prevent warnings in symlink test.
rm "xz_recursive_level_one/$FILE"
done
# Create a symlink to a directory to create a loop in the file system
# to test that xz will not have infinite recursion. Creating the symlink
# may fail, for instance on MSYS2 where the default behavior is to create
# a copy of the target instead of an actual symlink.
if ln -s ../ xz_recursive_level_one/level_two/loop_link; then
# The symlink should cause a warning and skip that directory.
"$XZ" -drf xz_recursive_level_one
if test $? != 2 ; then
echo "Recursive decompression did not give warning with symlink: $*"
exit 1
fi
else
echo "Symlink could not be created, skipping this test"
# Do not leave the test directory behind when skipping.
rm -rf xz_recursive_level_one
exit 77
fi
# Clean up nested directory
rm -rf xz_recursive_level_one
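
The symlink loop test above relies on the traversal never following a directory symlink. As an illustration of why a queue-based walk terminates in that case, here is a minimal POSIX sketch. It is not the directory_iterator code from this change: the names dir_node and count_regular_files are hypothetical, and it only counts regular files instead of compressing or decompressing them.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

// Hypothetical type: one pending directory in a FIFO queue.
struct dir_node {
	char *path;
	struct dir_node *next;
};

// Walk root without recursive calls and return the number of regular
// files found, or -1 if memory allocation fails. lstat() is used so that
// symlinks are seen as symlinks and are never queued as directories.
static long
count_regular_files(const char *root)
{
	assert(root != NULL);

	struct dir_node *head = malloc(sizeof(*head));
	if (head == NULL)
		return -1;
	head->path = strdup(root);
	head->next = NULL;
	if (head->path == NULL) {
		free(head);
		return -1;
	}

	struct dir_node *tail = head;
	long count = 0;
	bool failed = false;

	while (head != NULL) {
		// Only one directory stream is open at any time.
		DIR *dir = failed ? NULL : opendir(head->path);
		if (dir != NULL) {
			struct dirent *entry;
			while ((entry = readdir(dir)) != NULL) {
				if (strcmp(entry->d_name, ".") == 0
						|| strcmp(entry->d_name, "..") == 0)
					continue;

				size_t len = strlen(head->path)
						+ strlen(entry->d_name) + 2;
				char *path = malloc(len);
				if (path == NULL) {
					failed = true;
					break;
				}
				snprintf(path, len, "%s/%s", head->path, entry->d_name);

				struct stat st;
				if (lstat(path, &st) == 0 && S_ISDIR(st.st_mode)) {
					// Real directory: append it to the queue.
					struct dir_node *node = malloc(sizeof(*node));
					if (node == NULL) {
						free(path);
						failed = true;
						break;
					}
					node->path = path;
					node->next = NULL;
					tail->next = node;
					tail = node;
					continue;
				}

				if (lstat(path, &st) == 0 && S_ISREG(st.st_mode))
					++count;
				free(path);
			}
			closedir(dir);
		}

		// Pop the directory that was just processed.
		struct dir_node *done = head;
		head = head->next;
		free(done->path);
		free(done);
	}

	return failed ? -1 : count;
}
```

Because a link such as loop_link is reported by lstat() as a symlink rather than a directory, it is never added to the queue, so the walk visits each real directory once and then stops.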