Unable to decode unicode for Stack Exchange API
I was looking at this codegolf problem and decided to try taking the Python solution and using urllib instead. I modified some sample code for manipulating JSON with urllib:
import urllib.request
import json
res = urllib.request.urlopen('http://api.stackexchange.com/questions?sort=hot&site=codegolf')
res_body = res.read()
j = json.loads(res_body.decode("utf-8"))
This gives:
➜ codegolf python clickbait.py
Traceback (most recent call last):
  File "clickbait.py", line 7, in <module>
    j = json.loads(res_body.decode("utf-8"))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
If you go to http://api.stackexchange.com/questions?sort=hot&site=codegolf and click under "Headers", it says charset=utf-8. Why is urlopen giving me these strange results?
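res_body is compressed. The byte 0x8b at position 1 in the error is the second byte of the gzip magic number (0x1f 0x8b), so the body is a gzip stream rather than UTF-8 text. As far as I know, decompressing the response is not something urllib handles by default.
If you decompress the response from the API server, you will have your data.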
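import urllib.request
import zlib
import json

with urllib.request.urlopen(
    'http://api.stackexchange.com/questions?sort=hot&site=codegolf'
) as res:
    # 16 + zlib.MAX_WBITS tells zlib to expect a gzip header and trailer
    decompressed_data = zlib.decompress(res.read(), 16 + zlib.MAX_WBITS)

# The encoding keyword of json.loads was removed in Python 3.9, so decode explicitly
j = json.loads(decompressed_data.decode('utf-8'))
print(j)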
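If you would rather not hard-code the decompression step, the sketch below decompresses only when the body is actually gzip. It uses the standard gzip module instead of zlib. The helper name parse_json_body and the GZIP_MAGIC constant are made up for illustration, and the demo runs offline on a locally compressed payload rather than a real API response.

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def parse_json_body(raw, content_encoding=None):
    """Decode a JSON response body, gunzipping it first when needed.

    Hypothetical helper: `raw` is the bytes from res.read() and
    `content_encoding` is the value of the Content-Encoding header, if any.
    """
    # Trust the header when given; otherwise sniff the gzip magic bytes.
    is_gzip = (content_encoding or "").lower() == "gzip" or raw[:2] == GZIP_MAGIC
    if is_gzip:
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


# Offline demonstration with a locally compressed payload.
sample = gzip.compress(json.dumps({"items": [{"title": "example"}]}).encode("utf-8"))
print(sample[:2] == GZIP_MAGIC)  # byte 1 is 0x8b, the byte the UTF-8 decoder rejected
print(parse_json_body(sample))
```

With a real response you would presumably call parse_json_body(res.read(), res.headers.get("Content-Encoding")) inside the with block.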