Why can't I load this JSON file in Python?

Question

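Solution, in case anyone finds this through a Google search:
The problem is not in the code itself but in the download through Firefox. Apparently (see https://bugzilla.mozilla.org/show_bug.cgi?id=1470011) some servers gzip the file twice. The downloaded file should be named file.json.gz.gz, but one .gz is missing. You need to extract it twice to get at the contents.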
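A minimal sketch of that workaround in Python, assuming the file may be wrapped in one or more gzip layers. The helper names `gunzip_all` and `load_json_bytes` and the layer limit are illustrative assumptions, not part of any library. The sketch relies only on the fact that a gzip stream starts with the bytes 0x1f 0x8b, which is also where the 0x8b in the error messages below comes from.

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def gunzip_all(data, max_layers=5):
    """Strip gzip layers until the payload no longer looks like gzip.

    Hypothetical helper; max_layers is an arbitrary safety limit.
    """
    layers = 0
    while data.startswith(GZIP_MAGIC):
        if layers == max_layers:
            raise ValueError("too many nested gzip layers")
        data = gzip.decompress(data)
        layers += 1
    return data


def load_json_bytes(data):
    """Decompress as often as needed, then parse the result as UTF-8 JSON."""
    return json.loads(gunzip_all(data).decode("utf-8"))
```

To use it, read the downloaded file in binary mode (open(path, 'rb')) and pass the bytes to load_json_bytes. This should work whether the file was compressed once, twice, or not at all.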

I am trying to sort through some information in this file: https://dl.vndb.org/dump/vndb-tags-latest.json.gz. I am also new to working with JSON, and I could not find anything that helped me.

The problem is that I cannot load it into Python. Extracting the .gz file with 7zip and then trying to load the file with json.load(open('vndb-tags-2020-12-31.json', encoding='utf-8')) returns the error

>>> UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte.

Without the utf-8 argument I get

>>> UnicodeDecodeError: 'cp932' codec can't decode byte 0x8b in position 1: illegal multibyte sequence

instead. I run into the same problem when I try to decompress the file on the fly using the gzip package:

import gzip
import json

with gzip.open('vndb-tags-2020-12-31.json.gz') as fd:
    json.load(fd)
>>> UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

I assume I need a different encoding option, but utf-16 and utf-32 do not work, and I cannot find anything on the help page https://vndb.org/d14
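One quick way to tell a still-compressed file from mis-encoded text is to look at its first two bytes: a gzip stream always begins with 0x1f 0x8b. The 0x1f decodes as an ordinary control character, so the decoder fails on 0x8b at position 1, which matches the errors above. A small sketch, where `looks_like_gzip` is a hypothetical helper name:

```python
def looks_like_gzip(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == b"\x1f\x8b"
```

If this returns True for a file that was already extracted once, the file most likely needs to be decompressed again, and no change of text encoding will help.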

Answer 1

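You can download, extract, and load your JSON data step by step:

Send a request to the target URL and capture the data as a bytes object.
Use the io module to wrap that data in an in-memory file object.
Pass the io object to gzip and decompress it into the JSON data.
Pass the JSON string to json.loads and keep the result as a Python dictionary.

Try this: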

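import gzip
import io
import json

import requests

url = 'https://dl.vndb.org/dump/vndb-tags-latest.json.gz'
# Download the archive and wrap the raw bytes in an in-memory file object.
file_object = io.BytesIO(requests.get(url).content)

# Decompress the gzip stream; read() returns the JSON document as bytes.
with gzip.open(file_object, 'r') as gzip_file:
    reserve_data = gzip_file.read()

# Parse the JSON, then pretty-print it.
load_json = json.loads(reserve_data)
beautiful_json = json.dumps(load_json, sort_keys=True, indent=4)
print(beautiful_json)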

For larger files, it is better to save the gzip file to disk and then load it from disk:

import gzip
import json

import requests

target_url = 'https://dl.vndb.org/dump/vndb-tags-latest.json.gz'
downloaded_gzip_file = requests.get(target_url).content

# Write the compressed bytes to disk unchanged.
with open("my_json_file.gz", "wb") as gz_file:
    gz_file.write(downloaded_gzip_file)

# Reopen the file through gzip so json.load reads the decompressed data.
with gzip.open("my_json_file.gz") as gz_file:
    load_json_data = json.load(gz_file)

beautiful_json = json.dumps(load_json_data, sort_keys=True, indent=4)
print(beautiful_json)

python

json

character-encoding