How to web-scrape changing HTML?

Question

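I'm learning web scraping and would like to know how to scrape a site whose HTML changes after a button is pressed. I can do this with Selenium, but it is slow. How can I do it with requests?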

Example: I start from https://www.collegeswimming.com/swimmer/356597/ and want to scrape the table that appears when you click the "Fastest" button on the page. Note that the table does not exist in the HTML source until you press "Fastest", and after you press it the URL is unchanged: it is still https://www.collegeswimming.com/swimmer/356597/.

I used Inspect Element and watched the Network tab to see which request is made when I click the "Fastest" button. The request goes to https://www.collegeswimming.com/swimmer/356597/times/fastest/. Notice that you cannot navigate to this URL directly, as it just leads back to the original https://www.collegeswimming.com/swimmer/356597. I tried using requests like this:

import requests
r = requests.get("https://www.collegeswimming.com/swimmer/356597/times/fastest")
print(r.text)
print(r.content)
r.json()

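Sadly, none of these work. The response I'm looking for is the one shown after clicking "Fastest", which can be viewed under Inspect Element -> Network -> fastest/ -> Response, but the response I get from the code above is just the HTML of the original page, https://www.collegeswimming.com/swimmer/356597.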

Thanks a lot for your help!

Answer 1

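If you look at the request headers for this request in the Network tab, you should see an X-Requested-With header with the value XMLHttpRequest, which indicates that this is an AJAX call. You can add this header to your own request like this: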

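import requests

# Trailing slash included, matching the URL shown in the Network tab
url = "https://www.collegeswimming.com/swimmer/356597/times/fastest/"
# Mark the request as AJAX so the server answers as it does for the button click
r = requests.get(url, headers={"X-Requested-With": "XMLHttpRequest"})
print(r.text)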
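Once the request returns the right body, the next step is usually to pull the table rows out of it. The following is only a minimal sketch, assuming the endpoint answers with an HTML fragment that contains ordinary table markup (tr, td, th); that has not been verified here. The names AJAX_HEADERS, TableRowParser, parse_table_rows, and fetch_fragment are hypothetical helpers made up for illustration, and the parser uses the standard library's html.parser rather than any particular scraping library.

```python
from html.parser import HTMLParser

# Header that marks the request as AJAX, as described in the answer above.
AJAX_HEADERS = {"X-Requested-With": "XMLHttpRequest"}


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text chunks of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the chunks and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table_rows(html):
    """Return all table rows found in an HTML string as lists of cell text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_fragment(url, timeout=10):
    """Fetch a URL the way the page's own AJAX call does and return the body."""
    # Imported here so the parsing helpers above work without requests installed.
    import requests

    response = requests.get(url, headers=AJAX_HEADERS, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


# Possible usage (performs a real network request):
# url = "https://www.collegeswimming.com/swimmer/356597/times/fastest/"
# for row in parse_table_rows(fetch_fragment(url)):
#     print(row)
```

If the rows come back empty, it is worth printing the raw text first to check whether the server returned the fragment or fell back to the full page.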

Tags: python, innerhtml, request