How to scrape data that only appears after a click event
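I can pull down an HTML page, but I'm not sure how to get at the text that only appears after clicking a button, because that data isn't in the page source.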
from requests import get
URL = 'https://melvyl.on.worldcat.org/oclc/1076548274'
step1 = get(URL)
print(step1.text)
# how do I navigate to `Check Availability`?
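I want to get the data that is shown interactively when you click Check Availability next to the UC Berkeley library. This opens a box containing the call number I'm looking for (e.g. "DT157.675 .M37 2019").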
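If you monitor the network traffic [in your browser go to More tools > Developer tools > Network, or press Ctrl + Shift + I in Chrome, select the Network tab and filter by XHR], you will see that when you click Check Availability, the browser sends a GET request to a different URL to fetch the data: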
from requests import get
from bs4 import BeautifulSoup
# Endpoint found by monitoring the XHR (GET) requests
id_ = 5689
URL = f'https://melvyl.on.worldcat.org/ajax/availabilityFulfillment/oclc/1076548274/registryId/{id_}'
params = {'editionclusteroclcnumbers': 1076548274}
response = get(URL, params=params)
soup = BeautifulSoup(response.text, 'html.parser')
class_name = "availability_call_number_cell availability_left_hand_cell"
results = soup.find('td', class_=class_name).get_text(strip=True)
print(results)
#'DT157.675 .M37 2019'
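Because the click only triggers that GET request, its URL can be rebuilt for any registry id without a browser. A minimal sketch, assuming the path pattern seen for this one record also holds for other records (not confirmed); `build_availability_url` and `BASE` are hypothetical names. It also illustrates a pitfall for the multi-edition request in the next step: `requests` builds the query string for `params` with `urllib.parse.urlencode`, which escapes commas on its own, so a value written with a literal `%2C` ends up double-encoded as `%252C`.

```python
from urllib.parse import urlencode

# Path observed in the browser's XHR traffic for OCLC record 1076548274.
BASE = "https://melvyl.on.worldcat.org/ajax/availabilityFulfillment/oclc"


def build_availability_url(oclc, registry_id=None, edition_numbers=()):
    """Hypothetical helper that rebuilds the availability request URL.

    Assumes the /oclc/<oclc>[/registryId/<id>] pattern seen for one record
    also holds for other records; WorldCat does not document it.
    """
    url = f"{BASE}/{oclc}"
    if registry_id is not None:
        url += f"/registryId/{registry_id}"
    if edition_numbers:
        # Join with plain commas and let urlencode escape each ',' as '%2C'.
        value = ",".join(str(n) for n in edition_numbers)
        url += "?" + urlencode({"editionClusterOclcNumbers": value})
    return url


print(build_availability_url(1076548274, 5689))
# https://melvyl.on.worldcat.org/ajax/availabilityFulfillment/oclc/1076548274/registryId/5689
print(build_availability_url(1076548274, edition_numbers=[1076548274, 1130899029]))
# https://melvyl.on.worldcat.org/ajax/availabilityFulfillment/oclc/1076548274?editionClusterOclcNumbers=1076548274%2C1130899029

# A value that is already percent-encoded gets escaped a second time:
print(urlencode({"editionClusterOclcNumbers": "1076548274%2C1130899029"}))
# editionClusterOclcNumbers=1076548274%252C1130899029
```

In practice this means passing the OCLC numbers to `params` with plain commas and leaving the escaping to `requests`.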
Trying different locations (libraries), it seems the only thing that changes is id_. If you know the ids, you can collect all the data in a loop:
from requests import get
from bs4 import BeautifulSoup
# Endpoint found by monitoring the XHR (GET) requests
# Let's get all ids
URL = 'https://melvyl.on.worldcat.org/ajax/availabilityFulfillment/oclc/1076548274'
# Pass plain commas: requests percent-encodes them as %2C itself
params = {'editionClusterOclcNumbers': '1076548274,1130899029,1126209791'}
response = get(URL, params=params)
soup = BeautifulSoup(response.text, 'html.parser')
id_s = [item['id'].split('_')[-2] for item in soup.find_all("button", {"title": "Check Availability"})]
# get data for all ids
data = []
class_name = "availability_call_number_cell availability_left_hand_cell"
for id_ in id_s:
    URL = f'https://melvyl.on.worldcat.org/ajax/availabilityFulfillment/oclc/1076548274/registryId/{id_}'
    params = {'editionclusteroclcnumbers': 1076548274}
    response = get(URL, params=params)
    soup = BeautifulSoup(response.text, 'html.parser')
    cell = soup.find('td', class_=class_name)
    # A library may not show a call number; skip it instead of crashing
    if cell is not None:
        data.append(cell.get_text(strip=True))
print(data)
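If a library's response has no call-number cell, soup.find(...) returns None, and calling .get_text() on it raises AttributeError. A sketch of a more defensive version that also separates fetching from parsing, under stated assumptions: the button id layout (registry id in the second-to-last `_`-separated field) is inferred from the answer's `split('_')[-2]`, the HTML fragments are invented stand-ins for WorldCat responses, and `extract_registry_ids`, `collect_call_numbers` and `fetch` are hypothetical names. Matching with the CSS selector `td.availability_call_number_cell` also tolerates the classes appearing in a different order, whereas `class_="a b"` only matches that exact attribute string.

```python
import time

from bs4 import BeautifulSoup

# Matches the cell by one class, whatever other classes it carries.
CELL_SELECTOR = "td.availability_call_number_cell"


def extract_registry_ids(html):
    """Hypothetical helper: registry ids from the 'Check Availability' buttons.

    Assumes the id sits in the second-to-last '_'-separated field of each
    button id, as the answer's split('_')[-2] implies.
    """
    soup = BeautifulSoup(html, "html.parser")
    buttons = soup.find_all("button", attrs={"title": "Check Availability", "id": True})
    return [button["id"].split("_")[-2] for button in buttons]


def collect_call_numbers(registry_ids, fetch, delay=0.0):
    """Map each registry id to its call number, or None when none is shown.

    fetch(registry_id) must return the availability HTML for that id;
    passing it in keeps the loop free of network code.
    """
    results = {}
    for i, registry_id in enumerate(registry_ids):
        if i and delay:
            time.sleep(delay)  # pause between requests, not before the first
        soup = BeautifulSoup(fetch(registry_id), "html.parser")
        cell = soup.select_one(CELL_SELECTOR)
        results[registry_id] = cell.get_text(strip=True) if cell is not None else None
    return results


# Invented offline fragments standing in for the real WorldCat responses.
listing = (
    '<button id="avail_5689_btn" title="Check Availability"></button>'
    '<button id="avail_1234_btn" title="Check Availability"></button>'
)
responses = {
    "5689": '<td class="availability_left_hand_cell availability_call_number_cell">'
            "DT157.675 .M37 2019</td>",
    "1234": "<p>No call number listed</p>",
}

ids = extract_registry_ids(listing)
print(ids)  # ['5689', '1234']
print(collect_call_numbers(ids, responses.__getitem__))
# {'5689': 'DT157.675 .M37 2019', '1234': None}
```

Against the live site, `fetch` would wrap the `get(URL, params=params)` call from the loop and return `response.text`; a non-zero `delay` spaces the requests out.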