Why does UnZip extract the last concatenated ZIP?

Question

I find the following behaviour unexpected:

$ mkdir tmp && cd tmp/
$ for example in a b c ; do echo $example > $example.txt ; done
$ for file in `ls *` ; do zip $file.zip $file ; done
$ cat a.txt.zip b.txt.zip c.txt.zip > concatenated.zip
$ unzip concatenated.zip -d output
$ ls output/
c.txt                                     # unexpected

p7zip, on the other hand, does this:

$ rm -r output/
$ 7z x concatenated.zip -ooutput/
$ ls output/
a.txt

Why does UnZip extract the last concatenated ZIP? Does it traverse backwards from EOF until it finds the PK file signature?

Answer 1

Does it traverse backwards from EOF until it finds the PK file signature?

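Yes. Here is what unzip does:

Look for the "end of central directory record" (EOCD) at the end of the zip file
Read that record and note the "offset of start of central directory"
Read the central directory (it contains a list of every entry in the archive)
Read each entry and follow its "relative offset of local header"
Read the local header together with its data and decompress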
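
The first three steps can be sketched in Python. This is only an illustration under simplifying assumptions, not unzip's actual code: it ignores ZIP64 and multi-disk archives, and it finds the EOCD with a plain backward search for the signature, which a signature-like byte sequence in an archive comment could fool. The names `make_zip` and `list_last_archive` are made up for this example.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"           # end of central directory record
EOCD_FMT = "<4sHHHHIIH"            # 22 bytes, without the trailing comment
CDH_SIG = b"PK\x01\x02"            # central directory file header
CDH_FMT = "<4sHHHHHHIIIHHHHHII"    # 46 bytes, without name/extra/comment


def make_zip(name, content):
    """Build a one-entry zip archive in memory (helper for the demo)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, content)
    return buf.getvalue()


def list_last_archive(data):
    """Return (entry names, extra byte count) as seen from the last EOCD."""
    eocd_pos = data.rfind(EOCD_SIG)  # search backwards from EOF
    if eocd_pos < 0:
        raise ValueError("no end of central directory record found")
    fields = struct.unpack_from(EOCD_FMT, data, eocd_pos)
    total_entries, cd_size, cd_offset = fields[4], fields[5], fields[6]

    # The central directory sits directly before the EOCD. If the stored
    # offset disagrees with that position, bytes were prepended.
    cd_start = eocd_pos - cd_size
    extra = cd_start - cd_offset

    names = []
    pos = cd_start
    for _ in range(total_entries):
        header = struct.unpack_from(CDH_FMT, data, pos)
        if header[0] != CDH_SIG:
            raise ValueError("bad central directory header")
        name_len, extra_len, comment_len = header[10], header[11], header[12]
        start = pos + struct.calcsize(CDH_FMT)
        names.append(data[start:start + name_len].decode("utf-8", "replace"))
        pos = start + name_len + extra_len + comment_len
    return names, extra


if __name__ == "__main__":
    blob = b"".join(make_zip(n + ".txt", n + "\n") for n in "abc")
    print(list_last_archive(blob))
```

Only the entries of the last central directory are returned, and the second value corresponds to the "extra bytes" count that unzip warns about.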

In your case, unzip finds only the last EOCD, and its offsets are wrong (you added bytes in front of it). That is why unzip tells you:

warning [concatenated.zip]:  324 extra bytes at beginning or within zipfile
  (attempting to process anyway)

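It finds the central directory of c.txt.zip, sees only one entry (c.txt) and extracts only one file.

Given the structure of a zip file, I think this is the logical way to do it. Self-extracting zip files rely on this: the file begins with a binary that extracts the archive and ends with the actual zip content (see unzipsfx and zip -A).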

7z seems to try from the beginning of the file, and to fall back to the end if the file does not start like a zip file:

# not a.txt.zip, but a.txt
$ cat a.txt b.txt.zip c.txt.zip > prepended.zip
# fix offset
$ zip -A prepended.zip

$ unzip -l prepended.zip 
Archive:  prepended.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        2  2016-11-22 20:29   c.txt
---------                     -------
        2                     1 file


$ 7z l prepended.zip 
[...]
Path = prepended.zip
Warning: The archive is open with offset
Type = zip
Physical Size = 326
Embedded Stub Size = 164

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2016-11-22 20:29:05 .....            2            2  c.txt
------------------- ----- ------------ ------------  ------------------------
2016-11-22 20:29:05                  2            2  1 files

Note the use of zip -A to fix the offsets:

The -A option tells zip to adjust the entry offsets stored in the archive to take into account this "preamble" data.

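I don't know what you are trying to achieve, but concatenating zip files is probably not the easiest way to do it (getting them unzipped again won't be easy).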
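
If you do end up with such a file, one possible way to recover every member is to cut the data after each EOCD and let Python's `zipfile` module read each prefix, since it tolerates data prepended to an archive. This is a rough sketch, not a robust tool: it assumes the `PK\x05\x06` signature never appears by chance inside compressed data or comments, it ignores ZIP64, and `make_zip` and `split_concatenated` are names made up for this example.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"  # end of central directory record
EOCD_LEN = 22             # fixed part of the EOCD, without the comment


def make_zip(name, content):
    """Build a one-entry zip archive in memory (helper for the demo)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, content)
    return buf.getvalue()


def split_concatenated(data):
    """Return one {name: bytes} dict per archive found in data."""
    members = []
    pos = data.find(EOCD_SIG)
    while pos != -1:
        # The comment length is the last 2-byte field of the EOCD.
        comment_len = struct.unpack_from("<H", data, pos + 20)[0]
        end = pos + EOCD_LEN + comment_len
        # Everything up to this EOCD is "an archive with prepended data",
        # which zipfile reads by correcting the stored offsets itself.
        with zipfile.ZipFile(io.BytesIO(data[:end])) as zf:
            members.append({name: zf.read(name) for name in zf.namelist()})
        pos = data.find(EOCD_SIG, end)
    return members


if __name__ == "__main__":
    blob = b"".join(make_zip(n + ".txt", n + "\n") for n in "abc")
    for archive in split_concatenated(blob):
        print(sorted(archive))
```

Each prefix ends exactly at an EOCD, so every archive is read the same way unzip reads the last one.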
