Read file saved using gzip: why is the file handle a string?

I have a list of lists, say:

sentcorpus = [["hello", "how", "are", "you", "?"], ["hello", "I", "'m", "fine"]]

I want to save it in gzip format:

import gzip
import json
with gzip.open('corpus.json.gz', 'wb') as fileh:
    fileh.write(json.dumps(sentcorpus).encode("utf8"))

I then expected that reading it back like this would work:

with gzip.open('wbec_corpus.json.gz', 'rb') as fileh:
    sentcorpus = json.load(fileh.read().decode("utf8"))

But it doesn't:

AttributeError: 'str' object has no attribute 'read'

Instead, this works:

with gzip.open('wbec_corpus.json.gz', 'rb') as fileh:
    sentcorpus = json.load(fileh)

Why is fileh a string and not a file handle?

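The file object is not the problem: fileh is a file handle, and it is the json library that raises the error. The 'str' in the message is the value returned by fileh.read().decode("utf8"), which was passed to json.load. To understand why that fails, compare json.load and json.loads: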

json.load(fp, *, ...)

Deserialize fp (a .read()-supporting text file or binary file containing a JSON document) to a Python object using this conversion table.

json.loads(s, *, ...)

Deserialize s (a str, bytes or bytearray instance containing a JSON document) to a Python object using this conversion table.

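In short, json.load does not want the result of .read(); it wants the file object itself, and calls .read() on it for you. json.loads, on the other hand, does want the content: a string (or bytes), such as the result of file.read().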
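To make the difference concrete, here is a minimal sketch that needs no file on disk. It uses io.StringIO as a stand-in for an open file, and the sample text and variable names are only for illustration:

```python
import io
import json

# Illustrative JSON text, standing in for the decoded file contents.
text = '[["hello", "how"], ["fine"]]'

# json.loads takes the JSON text itself (str, bytes or bytearray).
from_string = json.loads(text)

# json.load takes an object with a .read() method and calls .read() on it.
from_file = json.load(io.StringIO(text))

# Passing a str to json.load reproduces the error from the question,
# because json.load tries to call .read() on the string.
try:
    json.load(text)
except AttributeError as exc:
    error_message = str(exc)

print(from_string == from_file)
print(error_message)
```

The AttributeError is therefore about the string that was handed to json.load, not about fileh.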

So either of the following two lines will work:

sentcorpus = json.loads(fileh.read().decode("utf8"))
sentcorpus = json.load(fileh)
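For completeness, here is a sketch of the full round trip with both variants. The temporary directory and the variable names via_loads, via_load and via_text are assumptions made for the example. The last part shows gzip's text mode ("wt"/"rt") as one possible alternative that avoids the manual encode/decode; it is not part of the original question.

```python
import gzip
import json
import os
import tempfile

sentcorpus = [["hello", "how", "are", "you", "?"], ["hello", "I", "'m", "fine"]]

# Hypothetical scratch location so the example does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "corpus.json.gz")

# Write the JSON text as UTF-8 bytes, as in the question.
with gzip.open(path, "wb") as fileh:
    fileh.write(json.dumps(sentcorpus).encode("utf8"))

# Variant 1: read and decode the bytes yourself, then parse the string.
with gzip.open(path, "rb") as fileh:
    via_loads = json.loads(fileh.read().decode("utf8"))

# Variant 2: hand the open file object to json.load, which reads it itself.
with gzip.open(path, "rb") as fileh:
    via_load = json.load(fileh)

# Alternative (an assumption, not from the question): open in text mode and
# let json.dump / json.load work directly on the file object.
with gzip.open(path, "wt", encoding="utf8") as fileh:
    json.dump(sentcorpus, fileh)
with gzip.open(path, "rt", encoding="utf8") as fileh:
    via_text = json.load(fileh)

print(via_loads == via_load == via_text == sentcorpus)
```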