Pagination without a different URL for each page
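I am scraping a web page (using Python with the requests and requests-html modules), and I need to go through every page of a list of items.
As a human user, I click "2" to go to the second page, or click "->" to move from the current page to the next one.
When I inspect those elements, each one is a <div> tag, for example:
<div class="pagination__Page..."> 2 </div>
or
<div class="pagination__Page..."> -> </div>
Both have an event handler attached, so clicking one takes me to the next page.
I have already tried the for-loop pagination suggested in the requests-HTML documentation, but it does not work here because the r.html object has no link associated with the next page, nor with any other page of the list.
When I click these divs on the site, the URL does not change at all.
Inspecting the event handler (for the "2" case), it calls a JS function like: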
function() {
return a({
pageNum: e
})
}
Inspecting the event handler (for the "->" case), it calls a JS function like:
function() {
return a({
direction: "right"
})
}
I would like to get the same result as clicking, but I do not know how.
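You will have to use the dev tools to get the exact query parameters (in particular rqid), but this should get you going. It returns the full list, so there is no need to go page by page: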
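import requests
import pandas as pd  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize

# JSON endpoint that the page itself calls (visible in the dev tools Network tab)
url = 'https://www.flightstats.com/v2/api-next/flight-tracker/arr/ORY/2019/4/29/6'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}
# Copy these from the request shown in dev tools; rqid in particular may differ
query = {
    'carrierCode': '',
    'numHours': '6',
    'rqid': '7tl8o43bkps',
}

response = requests.get(url, headers=headers, params=query)
response.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
jsonData = response.json()
# Flatten the nested flight records into one row per flight
df = pd.json_normalize(jsonData['data']['flights'])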
Output of print(df):
airport.city ... url
0 Cayenne ... /flight-tracker/TX/571?year=2019&month=4&date=...
1 Saint Denis de la Reunion ... /flight-tracker/AF/671?year=2019&month=4&date=...
2 Pointe-a-Pitre ... /flight-tracker/SS/3541?year=2019&month=4&date...
3 Pointe-a-Pitre ... /flight-tracker/TX/541?year=2019&month=4&date=...
4 Moscow ... /flight-tracker/S7/4021?year=2019&month=4&date...
5 Moscow ... /flight-tracker/ZI/516?year=2019&month=4&date=...
6 Cayenne ... /flight-tracker/AF/853?year=2019&month=4&date=...
7 Cayenne ... /flight-tracker/KL/2245?year=2019&month=4&date...
8 Toulouse ... /flight-tracker/AF/6101?year=2019&month=4&date...
9 Pointe-a-Pitre ... /flight-tracker/KL/2261?year=2019&month=4&date...
10 Toulouse ... /flight-tracker/HOP/5101?year=2019&month=4&dat...
11 Pointe-a-Pitre ... /flight-tracker/AF/793?year=2019&month=4&date=...
12 Beirut ... /flight-tracker/SS/6628?year=2019&month=4&date...
13 Beirut ... /flight-tracker/ZI/628?year=2019&month=4&date=...
14 Montpellier ... /flight-tracker/AF/7541?year=2019&month=4&date...
15 Geneva ... /flight-tracker/U2/1399?year=2019&month=4&date...
16 Montpellier ... /flight-tracker/HOP/5541?year=2019&month=4&dat...
17 Ajaccio ... /flight-tracker/AF/4442?year=2019&month=4&date...
18 Bastia ... /flight-tracker/HOP/7780?year=2019&month=4&dat...
19 Ajaccio ... /flight-tracker/HOP/7770?year=2019&month=4&dat...
20 Ajaccio ... /flight-tracker/XK/770?year=2019&month=4&date=...
21 Bastia ... /flight-tracker/XK/780?year=2019&month=4&date=...
22 Bastia ... /flight-tracker/AF/4458?year=2019&month=4&date...
23 Marseille ... /flight-tracker/HOP/5001?year=2019&month=4&dat...
24 Marseille ... /flight-tracker/AF/6001?year=2019&month=4&date...
25 Clermont-Ferrand ... /flight-tracker/AF/7433?year=2019&month=4&date...
26 Clermont-Ferrand ... /flight-tracker/HOP/5433?year=2019&month=4&dat...
27 Bordeaux ... /flight-tracker/HOP/5253?year=2019&month=4&dat...
28 Bordeaux ... /flight-tracker/AF/6253?year=2019&month=4&date...
29 Nice ... /flight-tracker/HOP/5203?year=2019&month=4&dat...
.. ... ... ...
192 Marseille ... /flight-tracker/HOP/5009?year=2019&month=4&dat...
193 Sevilla ... /flight-tracker/TO/3201?year=2019&month=4&date...
194 Bordeaux ... /flight-tracker/AF/6277?year=2019&month=4&date...
195 Toulouse ... /flight-tracker/U2/4026?year=2019&month=4&date...
196 Toulouse ... /flight-tracker/HOP/5117?year=2019&month=4&dat...
197 Toulouse ... /flight-tracker/AF/6117?year=2019&month=4&date...
198 Rome ... /flight-tracker/IB/5193?year=2019&month=4&date...
199 Rome ... /flight-tracker/VY/6251?year=2019&month=4&date...
200 Bordeaux ... /flight-tracker/HOP/5277?year=2019&month=4&dat...
201 Faro ... /flight-tracker/U2/4278?year=2019&month=4&date...
202 Campinas ... /flight-tracker/AD/8900?year=2019&month=4&date...
203 Casablanca ... /flight-tracker/AT/760?year=2019&month=4&date=...
204 Campinas ... /flight-tracker/ZI/36?year=2019&month=4&date=2...
205 Rome ... /flight-tracker/U2/4242?year=2019&month=4&date...
206 Ajaccio ... /flight-tracker/XK/772?year=2019&month=4&date=...
207 Ajaccio ... /flight-tracker/AF/4445?year=2019&month=4&date...
208 Ajaccio ... /flight-tracker/HOP/7772?year=2019&month=4&dat...
209 Madrid ... /flight-tracker/AV/6049?year=2019&month=4&date...
210 Madrid ... /flight-tracker/AA/8758?year=2019&month=4&date...
211 Madrid ... /flight-tracker/IB/3436?year=2019&month=4&date...
212 Setif ... /flight-tracker/AH/1108?year=2019&month=4&date...
213 Berlin ... /flight-tracker/ZI/608?year=2019&month=4&date=...
214 Berlin ... /flight-tracker/SS/6608?year=2019&month=4&date...
215 Toulon ... /flight-tracker/AF/7513?year=2019&month=4&date...
216 Toulon ... /flight-tracker/HOP/5513?year=2019&month=4&dat...
217 Perpignan ... /flight-tracker/AF/7465?year=2019&month=4&date...
218 Perpignan ... /flight-tracker/HOP/5465?year=2019&month=4&dat...
219 Rodez ... /flight-tracker/BE/7682?year=2019&month=4&date...
220 Nantes ... /flight-tracker/AF/7383?year=2019&month=4&date...
221 Nantes ... /flight-tracker/HOP/5383?year=2019&month=4&dat...
[222 rows x 13 columns]
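If you still want the data split into pages the way the site shows it, you can do that locally once the full list is in hand. The following is only a sketch of that idea, not the site's actual logic: page_slice and step are hypothetical helpers that loosely mirror the pageNum and direction arguments seen in the JS handlers, and the page size of 30 is an assumption, since the real page size is not stated anywhere above.

```python
from math import ceil


def page_slice(items, page_num, page_size):
    """Return the items that would appear on 1-based page `page_num`."""
    if page_num < 1:
        raise ValueError("page_num starts at 1")
    start = (page_num - 1) * page_size
    return items[start:start + page_size]


def step(page_num, direction, total_items, page_size):
    """Move one page left or right, clamped to the valid page range."""
    last_page = max(1, ceil(total_items / page_size))
    delta = {"left": -1, "right": 1}[direction]
    return min(max(page_num + delta, 1), last_page)


# Stand-in for the 222 rows returned above; real code would use the
# list in jsonData['data']['flights'] instead.
flights = list(range(222))
PAGE_SIZE = 30  # assumption: the site's real page size is unknown

second_page = page_slice(flights, 2, PAGE_SIZE)          # like clicking "2"
next_page = step(2, "right", len(flights), PAGE_SIZE)    # like clicking "->"
```

With the DataFrame, df.iloc[start:start + page_size] gives the same kind of slice.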