Read file saved using gzip: why is the file handle a string?

Question

I have a list of lists, say:

sentcorpus = [["hello", "how", "are", "you", "?"], ["hello", "I", "'m", "fine"]]

I want to save it in gzip format:

import gzip
import json
with gzip.open('corpus.json.gz', 'wb') as fileh:
    fileh.write(json.dumps(sentcorpus).encode("utf8"))

I would then expect reading it back like this to work:

with gzip.open('corpus.json.gz', 'rb') as fileh:
    sentcorpus = json.load(fileh.read().decode("utf8"))

But it does not:

AttributeError: 'str' object has no attribute 'read'

Instead, this works:

with gzip.open('corpus.json.gz', 'rb') as fileh:
    sentcorpus = json.load(fileh)

Why is fileh a string and not a file handle?

Answer 1

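The file object is not the problem: fileh is a genuine file handle, and the error is raised inside the json library. The 'str' in the message is the result of fileh.read().decode("utf8"), on which json.load then tries to call .read(). To see why, compare json.load and json.loads: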

json.load(fp, *, ...)

Deserialize fp (a .read()-supporting text file or binary file containing a JSON document) to a Python object using this conversion table.

json.loads(s, *, ...)

Deserialize s (a str, bytes or bytearray instance containing a JSON document) to a Python object using this conversion table.

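In short, json.load does not want the result of .read(); it wants the file object itself and calls .read() on it. json.loads, on the other hand, does want a string (or bytes), such as the result of file.read().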
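A minimal sketch of that distinction, assuming an in-memory io.StringIO as a stand-in for an open file. The variable names here are illustrative and do not come from the question:

```python
import io
import json

text = '[["hello", "how"], ["fine"]]'

# json.loads takes the JSON document itself (str, bytes or bytearray).
from_string = json.loads(text)

# json.load takes an object with a .read() method and calls it for you.
from_file = json.load(io.StringIO(text))

# Passing an already-read string to json.load reproduces the error from the
# question: json.load tries to call .read() on the str.
try:
    json.load(text)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(from_string == from_file)
print(error_message)
```

The AttributeError names 'str' because that is the type of the argument json.load received, not the type of fileh.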

So either of the following two lines will work:

sentcorpus = json.loads(fileh.read().decode("utf8"))
sentcorpus = json.load(fileh)
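For completeness, here is a hedged end-to-end sketch of the round trip with both read variants. It assumes a scratch directory from tempfile (a hypothetical location, so no real files are touched) and writes in gzip text mode ("wt"), which is one possible alternative to encoding the bytes by hand as the question does:

```python
import gzip
import json
import os
import tempfile

sentcorpus = [["hello", "how", "are", "you", "?"], ["hello", "I", "'m", "fine"]]

# Hypothetical scratch location, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "corpus.json.gz")

# Text mode ("wt") lets json.dump write str; gzip.open handles the encoding.
with gzip.open(path, "wt", encoding="utf8") as fileh:
    json.dump(sentcorpus, fileh)

# Variant 1: hand the file object to json.load, which calls .read() itself.
with gzip.open(path, "rt", encoding="utf8") as fileh:
    loaded_via_load = json.load(fileh)

# Variant 2: read and decode the bytes yourself, then parse with json.loads.
with gzip.open(path, "rb") as fileh:
    loaded_via_loads = json.loads(fileh.read().decode("utf8"))

print(loaded_via_load == sentcorpus, loaded_via_loads == sentcorpus)
```

Both variants return the original list of lists; which one to use is mostly a matter of whether you already hold the decoded text or only the open file.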
