Not able to fetch bs4 table contents in Python
I want to fetch all the user handles present on this link: https://practice.geeksforgeeks.org/leaderboard/
Here is the code I tried:
import requests
from bs4 import BeautifulSoup

URL = 'https://practice.geeksforgeeks.org/leaderboard/'

def getdata(url):
    r = requests.get(url)
    return r.text

htmldata = getdata(URL)
soup = BeautifulSoup(htmldata, 'html.parser')
table = soup.find_all('table', {"id": "leaderboardTable"})
print(table[0].find_all('tbody')[1])
print(table[0].find_all('tbody')[1].tr)
Output:
<tbody id="overall_ranking">
</tbody>
None
The code is fetching the table, but when I try to print the tr or td tags present in the table, it shows None. I also tried another approach using pandas, and the same thing happened.
I just want all the user handles present in this table (https://practice.geeksforgeeks.org/leaderboard/).
Any solution to this problem would be appreciated.
The URL is dynamic and BeautifulSoup can't render JavaScript, but the data is generated from an API, which means the website is using an API. You can query that API directly:
import requests

api_url = 'https://practiceapi.geeksforgeeks.org/api/v1/leaderboard/ranking/?ranking_type=overall&page={page}'

for page in range(1, 11):
    data = requests.get(api_url.format(page=page)).json()
    for handle in data:
        print(handle['user_handle'])
Output:
Ibrahim Nash
blackshadows
mb1973
Quandray
akhayrutdinov
saiujwal13083
shivendr7
kirtidee18
mantu_singh
cfwong8
harshvardhancse1934
sgupta9519
sanjay05
samiranroy0407
Maverick_H
sreerammuthyam999
gfgaccount
sushant_a
verma_ji
balkar81199
marius_valentin_dragoi
ishu2001mitra
_tony_stark_01
ta7anas17113011638
yups0608
himanshujainmalpura
yujjwal9700
parthabhunia_04
KshamaGupta
the_coder95
ayush_gupta4
khushbooguptaciv18
aditya dhiman
dilipsuthar00786
adityajain9560
dharmsharma0811
Aegon_Targeryan
1032180422
mangeshagarwal1974
naveedaamir484
raj_271
Pulkit__Sharma__
aroranayan999
surbhi_7
ruchika1004ajmera
cs845418
shadymasum
lonewolf13325
user_1_4_13_19_22
SubhankarMajumdar
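The loop above assumes each page of the API returns a JSON array of records carrying a user_handle field. The extraction step can be sketched offline against a made-up sample response of that shape (the sample data here is invented for illustration, not taken from the live API):

```python
import json

# A made-up sample mimicking the shape the loop above expects:
# a JSON array of ranking records, each with a 'user_handle' field.
sample = json.loads(
    '[{"user_handle": "Ibrahim Nash", "rank": 1},'
    ' {"user_handle": "blackshadows", "rank": 2}]'
)

# Same extraction as the inner loop, collected into a list instead of printed.
handles = [record['user_handle'] for record in sample]
print(handles)  # ['Ibrahim Nash', 'blackshadows']
```

Collecting into a list like this also makes it easy to feed the handles into pandas afterwards, which the question mentioned trying.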
You can get it using Selenium:
from selenium import webdriver

driver = webdriver.Chrome(executable_path="<webdriver path>")
driver.get("https://practice.geeksforgeeks.org/leaderboard/")
user_names = driver.find_elements(by="xpath", value="//tbody[@id='overall_ranking']/tr/td/a")
user_names = [name.text for name in user_names]
driver.quit()
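The XPath above selects the anchor inside each cell of the ranking body's rows. Its selection logic can be checked offline, without a browser, against a minimal made-up HTML fragment using the standard library's ElementTree, which supports this XPath subset:

```python
import xml.etree.ElementTree as ET

# A minimal, invented fragment mirroring the structure the XPath targets
# (the real page is larger; only the relevant nesting is reproduced).
html = """
<table id="leaderboardTable">
  <tbody id="overall_ranking">
    <tr><td><a>Ibrahim Nash</a></td></tr>
    <tr><td><a>blackshadows</a></td></tr>
  </tbody>
</table>
"""

root = ET.fromstring(html)
# Same path expression the Selenium call uses, relative to the document root.
user_names = [a.text for a in root.findall(".//tbody[@id='overall_ranking']/tr/td/a")]
print(user_names)  # ['Ibrahim Nash', 'blackshadows']
```

On the live page the tbody is filled in by JavaScript, which is exactly why the requests + BeautifulSoup attempt saw it empty; Selenium works because it runs the page in a real browser first.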