How to scrape data that only becomes available after a click event

I can pull down an HTML page, but I'm not sure how to access the text data hidden behind a button click, because that data is not in the page source.

from requests import get

URL = 'https://melvyl.on.worldcat.org/oclc/1076548274'
step1 = get(URL)

print(step1.text)
# how do I navigate to `Check Availability`?

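I want to get the data that is displayed interactively when you click Check Availability next to UC Berkeley Libraries. This opens a box containing the call number I'm looking for (e.g. "DT157.675 .M37 2019").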

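If you monitor the network traffic [in your browser go to More tools > Developer tools > Network, or in Chrome press Ctrl + Shift + I, then select Network and filter by XHR], you will see that when you click Check Availability, the browser sends a GET request to another URL to fetch the data. You can request that URL directly: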

from requests import get
from bs4 import BeautifulSoup

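# Call the AJAX endpoint that the browser requests (a GET, as seen in the Network tab)
id_ = 5689  # registryId taken from the request URL in the Network tab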
URL = f'https://melvyl.on.worldcat.org/ajax/availabilityFulfillment/oclc/1076548274/registryId/{id_}'
params = {'editionclusteroclcnumbers': 1076548274}

response = get(URL, params=params)

soup = BeautifulSoup(response.text, 'html.parser')
class_name = "availability_call_number_cell availability_left_hand_cell"
results = soup.find('td', class_=class_name).get_text(strip=True)

print(results)
#'DT157.675 .M37 2019'
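
Note that `soup.find(...)` returns `None` when the cell is missing (for example, if the library has no holdings or the markup changes), so the chained `.get_text()` would raise an `AttributeError`. Below is a minimal sketch of a more defensive parse. The helper name `extract_call_number` and the sample HTML fragment are hypothetical, made up for illustration; only the class name comes from the code above.

```python
from bs4 import BeautifulSoup


def extract_call_number(html):
    """Return the call number found in an availability fragment, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    # A CSS selector matches on the class itself, so it still works if the
    # order of the classes in the attribute changes.
    cell = soup.select_one('td.availability_call_number_cell')
    return cell.get_text(strip=True) if cell is not None else None


# Hypothetical fragment, shaped like the cell the code above looks for.
sample = '''
<table><tr>
  <td class="availability_call_number_cell availability_left_hand_cell">
    DT157.675 .M37 2019
  </td>
</tr></table>
'''

print(extract_call_number(sample))                  # DT157.675 .M37 2019
print(extract_call_number('<p>no table here</p>'))  # None
```

With this, you could pass `response.text` to `extract_call_number` and check for `None` before using the result.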

Example

Trying different libraries, it seems the only thing that changes is id_. Once you know the ids, you can collect all the data in a loop:


from requests import get
from bs4 import BeautifulSoup

# First, collect the ids of all libraries from the "Check Availability" buttons
URL = 'https://melvyl.on.worldcat.org/ajax/availabilityFulfillment/oclc/1076548274'
# Use plain commas: requests URL-encodes params itself, so a literal '%2C' would be double-encoded
params = {'editionClusterOclcNumbers': '1076548274,1130899029,1126209791'}
response = get(URL, params=params)
soup = BeautifulSoup(response.text, 'html.parser')
id_s = [item['id'].split('_')[-2] for item in soup.find_all("button", {"title":"Check Availability"})]

# get data for all ids
data = []
class_name = "availability_call_number_cell availability_left_hand_cell"
for id_ in id_s:
    URL = f'https://melvyl.on.worldcat.org/ajax/availabilityFulfillment/oclc/1076548274/registryId/{id_}'
    params = {'editionclusteroclcnumbers': 1076548274}

    response = get(URL, params=params)
    soup = BeautifulSoup(response.text, 'html.parser')

    # Extract the call number cell for this library
    data.append(soup.find('td', class_=class_name).get_text(strip=True))

print(data)
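
One request failing, or one library having no call number cell, would stop the loop above. A rough sketch of a more tolerant loop follows. It keeps the results keyed by id and records `None` for ids that fail. The names `collect_call_numbers`, `lookup` and `fake_lookup` are hypothetical; `lookup` stands in for any function that takes an id and returns the call number, such as the request-and-parse steps from the loop above.

```python
import time


def collect_call_numbers(ids, lookup, delay=0.0):
    """Map each id to lookup(id), or to None when the lookup fails."""
    results = {}
    for id_ in ids:
        try:
            results[id_] = lookup(id_)
        except (OSError, AttributeError) as exc:
            # requests' exceptions derive from OSError; AttributeError is what
            # calling .get_text() raises when soup.find(...) returned None.
            print(f'skipping {id_}: {exc}')
            results[id_] = None
        time.sleep(delay)  # optional pause between requests
    return results


# Offline stand-in for the real request, so the sketch runs without network access.
def fake_lookup(id_):
    known = {'5689': 'DT157.675 .M37 2019'}
    if id_ not in known:
        raise AttributeError('no call number cell')
    return known[id_]


print(collect_call_numbers(['5689', '0000'], fake_lookup))
```

Keeping a dictionary rather than a list makes it easy to see which library each call number belongs to.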