python gzip.open: zlib.error: Error -3 while decompressing data: too many length or distance symbols
I am trying to decompress a huge gz file (the Wikidata JSON dump latest-all.json.gz, 104 GB compressed) with Python and gzip.open.
It works fine for a while, but after reading 39.7 million lines it raises this error:
zlib.error: Error -3 while decompressing data: too many length or distance symbols
My function for decompressing and reading the file looks like this:
import gzip
import json
...

def wikidata(filename):
    with gzip.open(filename, mode='rt') as f:
        f.read(2)  # skip the first two characters: "{\n"
        for line in f:
            try:
                yield json.loads(line.rstrip(',\n'))
            except json.decoder.JSONDecodeError:
                continue
The full error is:
Traceback (most recent call last):
  File "parse.py", line 95, in <module>
    for line in lines:
  File "parse.py", line 21, in wikidata
    for line in f:
  File "/usr/lib/python3.8/gzip.py", line 305, in read1
    return self._buffer.read1(size)
  File "/usr/lib/python3.8/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/usr/lib/python3.8/gzip.py", line 487, in read
    uncompress = self._decompressor.decompress(buf, size)
zlib.error: Error -3 while decompressing data: too many length or distance symbols
What could be causing this, and how can I fix it?
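It means the compressed data is corrupted at that point, or a short distance before it. zlib reports this error when it reads a deflate block header that is not valid, which is not something your reading code can cause. The only way to fix it is to replace the input with an uncorrupted gzip file.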
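Before re-running a long parse on a fresh download, it may be worth checking that the whole file decompresses cleanly. Below is a minimal sketch of such a check. The function name check_gzip and its return shape are my own and not part of the gzip module. It only streams through the file and discards the output, roughly what gzip -t does on the command line:

```python
import gzip
import zlib


def check_gzip(path, chunk_size=1024 * 1024):
    """Stream through a gzip file and report whether it decompresses cleanly.

    Returns (ok, bytes_read, error). bytes_read is the number of
    decompressed bytes obtained before the end of the file or the failure.
    check_gzip is a hypothetical helper, not a standard library function.
    """
    total = 0
    try:
        with gzip.open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except (OSError, EOFError, zlib.error) as exc:
        # OSError covers a bad header or a failed CRC check, EOFError a
        # truncated file, zlib.error an invalid deflate stream.
        return False, total, exc
    return True, total, None
```

Calling check_gzip("latest-all.json.gz") on a damaged download should return False together with the exception. If the publisher provides checksums for the dump, comparing the file's hash against them is an alternative check that does not require decompressing anything.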