Urllib request throws a decode error when parsing from url

Question

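I'm trying to parse JSON-formatted data from a URL: http://ws-old.parlament.ch/sessions?format=json. My browser handles the JSON data just fine, but the request always throws the following error:

JSONDecodeError: Expecting value: line 3 column 1 (char 4)

I'm using Python 3.5. Here is my code: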

import json
import urllib.request

connection = urllib.request.urlopen('http://ws-old.parlament.ch/affairs/20080062?format=json')

js = connection.read()

info = json.loads(js.decode("utf-8"))
print(info)

Answer 1

The site uses User-Agent filtering to serve JSON only to known browsers. Fortunately, the filter is easy to fool; just set the User-Agent header to Mozilla:

import json
import urllib.request

# Send a browser-like User-Agent so the server returns JSON
request = urllib.request.Request(
    'http://ws-old.parlament.ch/affairs/20080062?format=json',
    headers={'User-Agent': 'Mozilla'})

connection = urllib.request.urlopen(request)
js = connection.read()

info = json.loads(js.decode("utf-8"))
print(info)
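If you want to see what the server actually sent when parsing fails, a small wrapper can help. This is only a sketch: the helper names build_request, parse_json_body and fetch_json are my own and not part of urllib or json, and the 80-character preview length is an arbitrary choice.

```python
import json
import urllib.request


def build_request(url, user_agent="Mozilla"):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def parse_json_body(raw, content_type=""):
    """Decode bytes as UTF-8 JSON; on failure, report what was received."""
    text = raw.decode("utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # Include the Content-Type and the start of the body in the message,
        # so a non-JSON reply (for example an HTML page) is easy to spot.
        raise ValueError(
            "response is not JSON (Content-Type: %r); body starts with %r"
            % (content_type, text[:80])
        ) from exc


def fetch_json(url, user_agent="Mozilla"):
    """Fetch url with the given User-Agent and parse the body as JSON."""
    with urllib.request.urlopen(build_request(url, user_agent)) as response:
        content_type = response.headers.get("Content-Type", "")
        return parse_json_body(response.read(), content_type)
```

With that in place, info = fetch_json('http://ws-old.parlament.ch/affairs/20080062?format=json') should behave like the snippet above, while a non-JSON reply produces an error message that shows the start of the body instead of a bare JSONDecodeError.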
