How to read 1TB zipped file in minimum time

Question

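I am trying to read a compressed file, using the command tar tf abc.tar.xz. Because the file is 1TB in size, this takes a long time. I am not very familiar with bash scripting. I have also tried other commands, such as zcat 3532642.tar.gz | more and tar tf 3532642.tar.xz |grep --regex="folder1/folder2/folder3/folder4/" and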

tar tvf 3532642.tar.xz --to-command \
'grep --label="$TAR_FILENAME" -H folder1/folder2/folder3/folder4/ ; true'

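But I found no significant difference between them in the time they take to read the file's contents.

Does anyone know how I can process such a huge amount of compressed data in minimum time? Any help would be highly appreciated!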

Answer 1

As rrauenza mentions, pigz may not work for the xz format, but there is a similar tool, pixz, a parallel, indexing version of xz for compression and decompression.

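As its man page makes clear, pigz compresses using threads to take advantage of multiple processors and cores. (Decompression in pigz is not parallelized in the same way: the man page notes that it uses one main thread plus helper threads for reading, writing, and check calculation.)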
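For the gzip case, here is a minimal sketch of listing an archive through pigz. The function name list_tar_gz is made up for illustration, and the fallback to plain gzip is an assumption about what you would want when pigz is not installed.

```shell
# List the members of a gzip-compressed tarball.
# Prefers pigz for decompression when it is installed; otherwise uses gzip.
# list_tar_gz is a hypothetical helper name, not part of either tool.
list_tar_gz() {
    if command -v pigz >/dev/null 2>&1; then
        pigz -dc "$1" | tar -tf -
    else
        gzip -dc "$1" | tar -tf -
    fi
}
```

It could be used as list_tar_gz 3532642.tar.gz | grep 'folder1/folder2/folder3/folder4/'.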

Like pigz, pixz also provides an option to specify the number of threads that can run in parallel across multiple cores for best performance.

-p --processes n
Allow up to n processes (default is the number of online processors)

Alternatively, you can get the number of cores manually with the bash command getconf _NPROCESSORS_ONLN and pass that value to -p.
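A rough sketch of how that could look for the .tar.xz from the question, assuming pixz is installed. The helper names ncores and list_tar_xz are made up for illustration. Also note that parallel decompression generally only pays off when the .xz file was written in multiple blocks (as pixz itself does), so the gain on an archive produced by plain xz may be small.

```shell
# Print the number of online processors, falling back to 1 if it cannot
# be determined. ncores is a hypothetical helper name.
ncores() {
    local n
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    case $n in
        ''|*[!0-9]*) n=1 ;;   # not a plain number: use 1
    esac
    printf '%s\n' "$n"
}

# List the members of an xz-compressed tarball, decompressing with pixz.
# Assumes pixz is on PATH: -d decompresses, -p sets the number of cores.
list_tar_xz() {
    pixz -d -p "$(ncores)" < "$1" | tar -tf -
}
```

For example: list_tar_xz 3532642.tar.xz | grep 'folder1/folder2/folder3/folder4/'.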

More details, including how to download and install it, are on pixz's GitHub page.

(or)

Use a tar-only solution, which works only if the file name is known beforehand:

tar -zxOf <file-containing-tar> <file-name_inside-tar>

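The options are as follows (for an .xz archive such as the one in the question, GNU tar's -J, --xz takes the place of -z):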

   -f, --file=ARCHIVE
          use archive file or device ARCHIVE

   -z, --gzip
          filter the archive through gzip

   -x, --extract, --get
          extract files from an archive

   -O, --to-stdout
          extract files to standard output

This may not be as efficient as pigz, but it still gets the job done.
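To make the tar-only approach concrete, here is a small self-contained sketch that builds a throwaway .tar.gz and streams one known member to stdout. The directory and file names are invented demo data. Note the argument order: the archive comes right after -f, and the member name comes last.

```shell
# Build a tiny gzip-compressed tarball in a temporary directory (demo data only).
workdir=$(mktemp -d)
mkdir -p "$workdir/folder1/folder2"
printf 'hello from inside the archive\n' > "$workdir/folder1/folder2/notes.txt"
tar -czf "$workdir/demo.tar.gz" -C "$workdir" folder1

# Stream one known member to stdout without writing it to disk.
# -f takes the archive as its argument; the member path follows.
member_text=$(tar -zxOf "$workdir/demo.tar.gz" folder1/folder2/notes.txt)
printf '%s\n' "$member_text"

# Remove the demo files.
rm -rf "$workdir"
```

On a very large archive tar still has to decompress and scan the stream until it reaches the requested member, so this saves disk writes rather than decompression time.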

bash

grep

tar

zcat