How to scrape a non-RESTful API using Python requests?

I am trying to scrape this website. From the Network tab of Google Chrome Developer Tools I found that a request named hospitals, sent to the URL https://tncovidbeds.tnega.org/api/hospitals, returns the response I need.

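I tried to recreate the same request in my Python code, using the same headers and payload, but the response I get is different from the one the website receives.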

Here is my Python code:

import requests

url = r'https://tncovidbeds.tnega.org/api/hospitals'

d = {
"searchString":"",
"sortCondition":{"Name":1},
"pageNumber":1,
"pageLimit":10,
"SortValue":"Availability",
"Districts":["5ea0abd3d43ec2250a483a4f"],
"BrowserId":"b4c5b065a84c7d2b60e8b23d415b2c3a",
"IsGovernmentHospital":"true",
"IsPrivateHospital":"true",
"FacilityTypes":["CHO","CHC","CCC"]
}

h = {
"authority": "tncovidbeds.tnega.org",
"method": "POST",
"path":"/api/hospitals",
"scheme": "https",
"accept": "application/json, text/plain, */*",
"accept-encoding": "gzip, deflate, br",
"accept-language": "en-US,en;q=0.9",
"cache-control": "no-cache",
"content-length": "280",
"content-type": "application/json;charset=UTF-8",
"cookie": "_ga=GA1.2.1066740172.1620653373; _gid=GA1.2.1460220464.1620653373",
"origin": "https://tncovidbeds.tnega.org",
"pragma": "no-cache",
"sec-ch-ua": '" Not A;Brand";v="99", "Chromium";v="90", "Google Chrome";v="90"',
"sec-ch-ua-mobile": "?0",
"sec-fetch-dest": "empty",
"sec-fetch-mode": "cors",
"sec-fetch-site": "same-origin",
"token": "null",
}

res = requests.post(url, data=d, headers=h)
print(res.json())

The output I get is:

{
'result': None,
 'exception': '',
 'pagination': None,
 'statusCode': '500',
 'errors': [],
 'warnings': []
}

The response I need, which is the one shown in the Chrome Network tab, is:

{
"result": A BIG LIST OF JSON OBJECTS,
"exception":null,
"pagination":{"pageNumber":1,"pageLimit":10,"skipCount":0,"totalCount":155},
"statusCode":"200",
"errors":[],
"warnings":[]}

Can you give me a solution?

Thanks in advance.

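As I can see from your browser request, the content-type must be application/json;charset=UTF-8. When you pass the payload as the data parameter, requests form-encodes the body (application/x-www-form-urlencoded) instead of serializing it as JSON, so the server cannot parse it. To fix this, pass the payload as the json parameter. It will automatically set the correct content-type.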

requests.post(url, json=d)
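To see the difference without touching the server, you can build both requests and inspect them before sending. This is only a sketch: it uses a shortened version of your payload, and the names payload, as_form and as_json are mine, not anything from the site.

```python
import json
import requests

url = "https://tncovidbeds.tnega.org/api/hospitals"

# Shortened payload for illustration (a subset of the keys in the question).
payload = {
    "searchString": "",
    "sortCondition": {"Name": 1},
    "pageNumber": 1,
    "pageLimit": 10,
}

# prepare() builds the request without sending it, so nothing hits the network.
as_form = requests.Request("POST", url, data=payload).prepare()
as_json = requests.Request("POST", url, json=payload).prepare()

# data= form-encodes the dict; the nested sortCondition dict is not kept as JSON.
print(as_form.headers["Content-Type"])  # application/x-www-form-urlencoded
print(as_form.body)

# json= serializes the whole dict as JSON and sets the matching header.
print(as_json.headers["Content-Type"])  # application/json
print(json.loads(as_json.body) == payload)  # True
```

Note that with data= the nested values are flattened into form fields, so even with a hand-written content-type header the body would still not be the JSON document the API expects.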

Also, in your case you don't need to supply any extra headers for the request to work.
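Putting it together, a minimal version of your script could look like the sketch below. The payload is copied unchanged from the question; the function name fetch_hospitals, the constants URL and PAYLOAD, the 30-second timeout and the error handling are my own additions, so treat them as assumptions rather than anything the API requires.

```python
import requests

URL = "https://tncovidbeds.tnega.org/api/hospitals"

# Payload copied from the question, unchanged.
PAYLOAD = {
    "searchString": "",
    "sortCondition": {"Name": 1},
    "pageNumber": 1,
    "pageLimit": 10,
    "SortValue": "Availability",
    "Districts": ["5ea0abd3d43ec2250a483a4f"],
    "BrowserId": "b4c5b065a84c7d2b60e8b23d415b2c3a",
    "IsGovernmentHospital": "true",
    "IsPrivateHospital": "true",
    "FacilityTypes": ["CHO", "CHC", "CCC"],
}


def fetch_hospitals(payload=PAYLOAD, timeout=30):
    """POST the payload as JSON (no custom headers) and return the decoded body."""
    # json= serializes the dict and sets Content-Type: application/json.
    res = requests.post(URL, json=payload, timeout=timeout)
    res.raise_for_status()  # raise on HTTP-level errors
    body = res.json()
    # The API also reports a status inside the body ('500' in the question's output).
    if body.get("statusCode") != "200":
        raise RuntimeError(f"API returned statusCode {body.get('statusCode')!r}")
    return body
```

Calling fetch_hospitals() and reading the "result" key of the returned dict should then give you the list of hospitals, provided the server still accepts this payload.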