Using chunks in JSON with requests to get large data into Python

Question

I am trying to pull a large amount of data into Python through an API, but I cannot get all of it. The request only returns the first 1000 rows.

import pandas as pd
import requests

r = requests.get("https://data.cityofchicago.org/resource/6zsd-86xi.json")

data = r.json()  # parsed JSON: a list of row dicts
df = pd.DataFrame(data)
df.drop(df.columns[[0,1,2,3,4,5,6,7]], axis=1, inplace=True)  # dropping some columns
df.shape

The output is

(1000,22)

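The site holds nearly 6 million data points, yet only 1000 are retrieved. How do I get around this? Is chunking the right approach? Can someone help me with the code?

Thanks.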

Answer 1

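You need to paginate through the results to get the entire dataset. Most APIs limit the number of results returned in a single request; on Socrata endpoints the default $limit is 1000, which is why you see exactly 1000 rows. According to the Socrata docs, you need to add the $limit and $offset parameters to the request URL.

For example, for the first page of results you would start with: https://data.cityofchicago.org/resource/6zsd-86xi.json?$limit=1000&$offset=0

Then for the next page you would just increase the offset by the page size: https://data.cityofchicago.org/resource/6zsd-86xi.json?$limit=1000&$offset=1000

Keep incrementing until you have the whole dataset, that is, until a request returns fewer rows than the limit.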
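The loop described above could be sketched roughly as follows. This is only a minimal sketch, not a definitive implementation: the names fetch_all and socrata_fetcher are hypothetical helpers invented for illustration, the 60-second timeout is an arbitrary choice, and the $order=:id parameter is an addition (not part of the URLs above) intended to keep the row order stable between pages.

```python
def fetch_all(fetch_page, page_size=1000):
    """Collect rows page by page until a short or empty page is returned.

    fetch_page is any callable taking (limit, offset) and returning a list.
    """
    rows = []
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        rows.extend(page)
        if len(page) < page_size:  # fewer rows than requested: last page
            break
        offset += page_size  # move on to the next page
    return rows


def socrata_fetcher(url):
    """Hypothetical helper: build a fetch_page(limit, offset) callable for url."""
    import requests  # imported lazily so fetch_all can be exercised offline

    def fetch_page(limit, offset):
        # $order=:id is an assumption added here for stable paging
        params = {"$limit": limit, "$offset": offset, "$order": ":id"}
        r = requests.get(url, params=params, timeout=60)
        r.raise_for_status()  # fail loudly on HTTP errors
        return r.json()

    return fetch_page
```

With these helpers, something like rows = fetch_all(socrata_fetcher("https://data.cityofchicago.org/resource/6zsd-86xi.json")) followed by df = pd.DataFrame(rows) would replace the single requests.get call in the question. With roughly 6 million rows this means several thousand requests at a page size of 1000, so a larger page_size is probably worth trying if the API accepts it.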

python

api

json

chunking

python-requests