Why do certain zip files have an unknown file content size?
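Background
I stumbled upon this issue
Analysis
According to the Javadoc for ZipEntry, requesting the size of a zip file entry sometimes simply returns -1.
However, running the command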
$ unzip -l b17c024e-89f1-42f7-a546-91d46610cedb.epub
Archive: b17c024e-89f1-42f7-a546-91d46610cedb.epub
Length Date Time Name
-------- ---- ---- ----
20 01-27-12 11:17 mimetype
2378 04-20-12 10:12 OEBPS/hayat-ghayr.html
6436 02-06-12 11:06 OEBPS/content.opf
112579 01-27-12 11:25 OEBPS/images/978-614-425-313-7-hayat-ghayr-cover.png
182575 01-27-12 11:25 OEBPS/images/978-614-425-313-7-hayat_fmt.png
7757 01-27-12 11:21 OEBPS/template.css
5643 01-27-12 11:18 OEBPS/hayat-ghayr-2.html
20144 01-27-12 11:17 OEBPS/hayat-ghayr-1.html
65543 01-27-12 11:17 OEBPS/hayat-ghayr-3.html
59434 01-27-12 11:17 OEBPS/hayat-ghayr-4.html
66768 01-27-12 11:17 OEBPS/hayat-ghayr-5.html
49117 01-27-12 11:17 OEBPS/hayat-ghayr-6.html
65346 01-27-12 11:17 OEBPS/hayat-ghayr-7.html
74196 01-27-12 11:17 OEBPS/hayat-ghayr-8.html
73998 01-27-12 11:17 OEBPS/hayat-ghayr-9.html
61031 01-27-12 11:17 OEBPS/hayat-ghayr-10.html
68297 01-27-12 11:17 OEBPS/hayat-ghayr-11.html
72084 01-27-12 11:17 OEBPS/hayat-ghayr-12.html
2386 01-27-12 11:17 OEBPS/hayat-ghayr-13.html
61132 01-27-12 11:17 OEBPS/hayat-ghayr-14.html
46320 01-27-12 11:17 OEBPS/hayat-ghayr-15.html
32673 01-27-12 11:17 OEBPS/hayat-ghayr-16.html
88584 01-27-12 11:17 OEBPS/hayat-ghayr-17.html
56474 01-27-12 11:17 OEBPS/hayat-ghayr-18.html
52840 01-27-12 11:17 OEBPS/hayat-ghayr-19.html
80022 01-27-12 11:17 OEBPS/hayat-ghayr-20.html
50781 01-27-12 11:17 OEBPS/hayat-ghayr-21.html
2765 01-27-12 11:17 OEBPS/hayat-ghayr-22.html
265 01-27-12 11:17 META-INF/container.xml
54942 01-27-12 11:17 OEBPS/images/277.png
5549 01-27-12 11:17 OEBPS/toc.ncx
1072 03-23-12 13:28 iTunesMetadata.plist
-------- -------
1529151 32 files
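shows that every chapter has a content length.
Moreover, if we unzip the same file and zip it again with stronger compression, the Java ZipFile returns the proper content sizes.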
Question
Is this a problem with the zip library or with the original compression? How can we tell?
Follow-up question
See
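ZIP stores metadata in several different places in the archive (the "local file header", the "central directory", and sometimes a "data descriptor"). Only the "local file header" sits in front of the file content; the "central directory" sits at the end of the archive. Only the "central directory" contains the whole truth, and it is perfectly valid not to specify any sizes in the "local file header".
See sections 4.4.8/4.4.9 of https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT, which discuss the size fields: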
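If bit 3 of the general purpose bit flag is set,
these fields are set to zero in the local header and the
correct values are put in the data descriptor and
in the central directory.
The "data descriptor" immediately follows the entry's compressed content, so when reading from a non-seekable stream it is not available until the entry's actual content has been read.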
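When using ZipArchiveInputStream you get the ZipEntry as soon as the "local file header" has been read (because the underlying stream may not be seekable), so the size information may be missing. ZipFile uses a RandomAccessFile behind the scenes and can read the "central directory", as can unzip and friends, so they know more than ZipArchiveInputStream does.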
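To make this concrete, here is a minimal sketch that should reproduce the effect with nothing but the JDK. It rests on a few assumptions: it uses java.util.zip.ZipInputStream as a stand-in for Commons Compress's ZipArchiveInputStream (so it runs without extra dependencies), the class name ZipSizeDemo and its helper methods are made up for illustration, and the archive is a tiny one-entry file written with ZipOutputStream rather than the EPUB from the question. The sketch checks bit 3 in the first local file header, asks the stream reader for the size before and after draining the entry, and then asks ZipFile, which reads the central directory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; the names are illustrative, not from any library.
class ZipSizeDemo {

    // Writes a one-entry archive to memory. The output is streamed, so the
    // writer does not know the sizes yet when it emits the local file header.
    static byte[] streamedZip(String name, String content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(name));
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // True if bit 3 of the general purpose bit flag is set in the first
    // local file header (signature at offset 0, flags at offset 6).
    static boolean usesDataDescriptor(byte[] archive) {
        ByteBuffer header = ByteBuffer.wrap(archive).order(ByteOrder.LITTLE_ENDIAN);
        if (header.getInt(0) != 0x04034b50) {
            throw new IllegalArgumentException("no local file header at offset 0");
        }
        int flags = header.getShort(6) & 0xFFFF;
        return (flags & 0x08) != 0;
    }

    // Returns { size right after the local header, size after draining the entry }.
    static long[] sizesSeenByStream(byte[] archive) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry = zip.getNextEntry();
            long before = entry.getSize();
            byte[] buffer = new byte[4096];
            while (zip.read(buffer) != -1) {
                // discard the content; we only want to reach the data descriptor
            }
            long after = entry.getSize();
            return new long[] { before, after };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // ZipFile needs random access, so the bytes go to a temporary file first.
    static long sizeFromCentralDirectory(byte[] archive, String name) {
        try {
            Path tmp = Files.createTempFile("zip-size-demo", ".zip");
            try {
                Files.write(tmp, archive);
                try (ZipFile zipFile = new ZipFile(tmp.toFile())) {
                    return zipFile.getEntry(name).getSize();
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] archive = streamedZip("mimetype", "application/epub+zip");
        long[] seen = sizesSeenByStream(archive);
        System.out.println("bit 3 set in local header: " + usesDataDescriptor(archive));
        System.out.println("stream, before reading:    " + seen[0]);
        System.out.println("stream, after reading:     " + seen[1]);
        System.out.println("ZipFile:                   " + sizeFromCentralDirectory(archive, "mimetype"));
    }
}
```

If the same flag check reports bit 3 as set on the first entry of the EPUB, that would suggest the archive was simply written in streaming mode and that the -1 is expected rather than a library bug. That the stream reader fills in the size once the entry has been read to the end is how the OpenJDK implementation behaves as far as I know; I would not rely on it as a documented guarantee.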