Can I extract the HTML contents of a DIV that is in a data-module-group?
I am trying to scrape the table(s) inside the data-* div on the following page:
https://www.fiba.basketball/eurobasket/2022/qualifiers/game/2202/Turkey-Croatia#|tab=boxscore
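Every approach I have tried with BeautifulSoup returns blank. I assume this is because the HTML content is tied to the data module, and I am not sure whether that can be accessed.
The inspected page element is: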
<div data-module-group="game-boxscore" id="gamepage_boxscore">
<div data-module-name="GAME_BOX_SCORE" data-module-type="live" id="jet-game-boxscore"
</div><div class="tab_ajax_content" data-module-type="esi"><!-- GamePageBoxScoreModuleModel d45eda71-7aac-4017-b055-e70f29b1d352 19.05.2022 16:08:58.198 on vmssprodI000009
@moduleident(/en/Module/d45eda71-7aac-4017-b055-e70f29b1d352/adad72a7-4113-4201-ae49-5b994ca0158b)@ -->
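The table itself is inside this DIV. If anyone could point me in the right direction, or even confirm whether this is possible, it would be much appreciated.
Adam
These tables are generated by an ajax call, so the initial page HTML does not contain them and you need to request the data from that source instead. The ajax URL can be found in the page behind the link you provided.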
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.fiba.basketball/eurobasket/2022/qualifiers/game/2202/Turkey-Croatia'

# The initial page only carries the path of the ajax endpoint for the boxscore tab.
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
boxscore = soup.find('li', {'data-tab-content': 'boxscore'})['data-ajax-url']
ajaxUrl = f'https://www.fiba.basketball{boxscore}'

# Fetch the boxscore fragment itself and read the team names from it.
response = requests.get(ajaxUrl)
soup = BeautifulSoup(response.text, 'html.parser')
teamNames = [x.text.strip() for x in soup.find_all('header', {'class': 'team_caption'})]

# Parse every table in the fragment, label each with its team, and stack them.
dfs = pd.read_html(ajaxUrl)
boxscore_df = pd.DataFrame()
for idx, df in enumerate(dfs):
    df['Team'] = teamNames[idx]
    boxscore_df = pd.concat([boxscore_df, df], axis=0).reset_index(drop=True)
Output:
print(boxscore_df.to_string())
# Players Min Pts FG 2Pts 3Pts FT OREB DREB REB AST PF TO STL BLK +/- EFF Team
0 0 Shane Larkin 27:10 16 5/8 62.5% 1/2 50% 4/6 66.7% 2/3 66.7% 0 1 1 7 2 1 2 2 9 23 Turkey
1 2 Sehmus Hazer 16:16 9 3/5 60% 2/4 50% 1/1 100% 2/2 100% 0 1 1 1 1 1 1 0 -2 9 Turkey
2 9 Samet Geyik 12:17 0 0/1 0% 0/0 - 0/1 0% 0/0 - 0 4 4 0 2 0 1 1 -3 5 Turkey
3 10 Melih Mahmutoglu 23:35 12 6/12 50% 6/7 85.7% 0/5 0% 0/0 - 0 0 0 2 0 0 0 0 5 8 Turkey
4 13 Berkan Durmaz 05:31 0 0/0 - 0/0 - 0/0 - 0/0 - 1 1 2 0 1 1 1 0 2 2 Turkey
5 14 Furkan Haltali 05:06 2 1/2 50% 1/2 50% 0/0 - 0/0 - 0 0 0 0 2 1 0 0 1 0 Turkey
6 17 Berk Ugurlu 16:06 3 1/4 25% 0/1 0% 1/3 33.3% 0/0 - 0 0 0 1 1 1 1 0 -4 1 Turkey
7 18 Dogus Özdemiroglu 04:28 2 1/2 50% 1/2 50% 0/0 - 0/0 - 0 0 0 0 1 0 1 0 0 2 Turkey
8 19 Bugrahan Tuncer 23:54 17 7/10 70% 4/5 80% 3/5 60% 0/0 - 0 5 5 7 3 4 1 1 10 24 Turkey
9 21 Sertac Sanli 28:05 17 8/10 80% 7/7 100% 1/3 33.3% 0/0 - 3 3 6 2 2 0 0 1 0 24 Turkey
10 23 Alperen Sengun 25:07 6 1/2 50% 1/2 50% 0/0 - 4/6 66.7% 3 2 5 1 2 2 0 2 7 9 Turkey
11 61 Goksenin Koksal 12:25 0 0/1 0% 0/0 - 0/1 0% 0/0 - 0 1 1 0 1 0 0 0 5 0 Turkey
12 Team/Coaches Team/Coaches NaN NaN NaN NaN NaN NaN 0 1 1 NaN 0 0 NaN NaN NaN NaN Turkey
13 Totals Totals 200 84 33/57 57.9% 23/32 71.9% 10/25 40% 8/11 72.7% 7 19 26 21 18 11 8 7 6 107 Turkey
14 2 Goran Filipovic 14:47 12 4/6 66.7% 3/5 60% 1/1 100% 3/3 100% 0 3 3 2 1 2 0 0 1 13 Croatia
15 7 Jakov Mustapic 20:01 5 2/3 66.7% 1/1 100% 1/2 50% 0/0 - 0 3 3 1 3 1 0 0 -11 7 Croatia
16 8 Roko Prkacin 15:39 4 2/5 40% 2/3 66.7% 0/2 0% 0/0 - 0 0 0 5 3 1 0 0 -3 5 Croatia
17 11 Tomislav Gabric 20:24 13 4/5 80% 1/1 100% 3/4 75% 2/2 100% 2 2 4 0 1 0 0 0 -6 16 Croatia
18 12 Pavle Marcinkovic 18:22 2 1/1 100% 1/1 100% 0/0 - 0/0 - 0 1 1 2 1 1 3 0 3 7 Croatia
19 13 Antonio Vrankovic 15:47 10 5/5 100% 5/5 100% 0/0 - 0/0 - 0 0 0 2 2 1 0 0 1 11 Croatia
20 15 Miro Bilan 24:13 12 6/10 60% 6/9 66.7% 0/1 0% 0/0 - 1 2 3 2 3 2 0 0 -7 11 Croatia
21 23 Mateo Dreznjak 21:13 6 3/5 60% 3/4 75% 0/1 0% 0/0 - 2 0 2 0 2 0 0 0 2 6 Croatia
22 30 Dominik Mavra 25:13 4 1/9 11.1% 1/4 25% 0/5 0% 2/2 100% 0 1 1 3 2 0 0 0 -7 0 Croatia
23 33 Zeljko Sakic 24:21 10 3/8 37.5% 2/4 50% 1/4 25% 3/4 75% 2 3 5 3 2 3 0 0 -3 9 Croatia
24 34 Marin Maric Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Croatia
25 41 Antonio Jordano Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Did not play - Coach decision Croatia
26 Team/Coaches Team/Coaches NaN NaN NaN NaN NaN NaN 1 4 5 NaN 0 1 NaN NaN NaN NaN Croatia
27 Totals Totals 200 78 31/57 54.4% 25/37 67.6% 6/20 30% 10/11 90.9% 8 19 27 20 20 12 3 0 -6 85 Croatia
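To see why BeautifulSoup came back empty on the original page, and how the ajax URL is discovered, here is a minimal offline sketch using only the standard library. The `SAMPLE_HTML` string, the `/example/...` paths, and the names `TabScanner` and `find_ajax_url` are all hypothetical stand-ins made up for illustration; the real page markup and endpoint path will differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical stand-in for the initial page HTML: each tab carries an ajax
# path, and the content div holds no <table> until that path is fetched.
SAMPLE_HTML = """
<ul>
  <li data-tab-content="playbyplay" data-ajax-url="/example/playbyplay">Play by Play</li>
  <li data-tab-content="boxscore" data-ajax-url="/example/boxscore">Boxscore</li>
</ul>
<div data-module-group="game-boxscore" id="gamepage_boxscore">
  <div class="tab_ajax_content" data-module-type="esi"><!-- placeholder --></div>
</div>
"""


class TabScanner(HTMLParser):
    """Collect the data-ajax-url of every tab and count <table> tags."""

    def __init__(self):
        super().__init__()
        self.ajax_urls = {}   # tab name -> relative ajax path
        self.table_count = 0  # number of <table> start tags seen

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "table":
            self.table_count += 1
        if tag == "li" and "data-ajax-url" in attr_map:
            tab_name = attr_map.get("data-tab-content")
            self.ajax_urls[tab_name] = attr_map["data-ajax-url"]


def find_ajax_url(html, tab, base_url):
    """Return (absolute ajax URL or None, number of tables in the html)."""
    scanner = TabScanner()
    scanner.feed(html)
    path = scanner.ajax_urls.get(tab)
    full_url = urljoin(base_url, path) if path else None
    return full_url, scanner.table_count


if __name__ == "__main__":
    url, tables = find_ajax_url(SAMPLE_HTML, "boxscore", "https://www.fiba.basketball")
    print(url, tables)
```

In this sample the table count is zero, which mirrors the blank result from the question, while the tab element still exposes the path needed for the second request. Using `urljoin` instead of string concatenation is a small design choice that also copes with an endpoint given as an absolute URL.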