How can I scrape the team names and goals from this site into a table? I've been trying a few different methods but can't quite figure it out.
import requests
from bs4 import BeautifulSoup
URL = "https://www.hockey-reference.com/leagues/NHL_2021_games.html"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find(id="all_games")
table = soup.find('div', attrs = {'id':'div_games'})
print(table.prettify())
Select the table instead of the div, then print it:
table = soup.find('table', attrs = {'id':'games'})
print(table.prettify())
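Printing the table only shows the markup. To pull the team names and goals out of it, you can walk the rows with BeautifulSoup. The following is a minimal sketch. It assumes the first five cells of each data row are Date, Visitor, G, Home, G.1, in the order the read_html output reports them. It also assumes header rows contain only th cells. `parse_games` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup


def parse_games(html):
    """Return (date, visitor, visitor_goals, home, home_goals) tuples
    from the <table id="games"> in the given HTML.

    Assumption: the first five cells of each data row are
    Date, Visitor, G, Home, G.1 (the order read_html reports).
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"id": "games"})
    if table is None:
        raise ValueError('no <table id="games"> found')

    def to_int(text):
        # Goal cells are blank for games that have not been played yet.
        return int(text) if text.isdigit() else None

    games = []
    for tr in table.find_all("tr"):
        # Header rows (including repeated ones in the body) have no <td>.
        if tr.find("td") is None:
            continue
        # The date may be a <th>, the other cells <td>, so collect both.
        cells = [c.get_text(strip=True) for c in tr.find_all(["th", "td"])]
        if len(cells) < 5:
            continue  # skip spacer or otherwise short rows
        date, visitor, v_goals, home, h_goals = cells[:5]
        games.append((date, visitor, to_int(v_goals), home, to_int(h_goals)))
    return games
```

With `page` fetched as in the question, `parse_games(page.content)` returns one tuple per game. BeautifulSoup accepts the raw bytes directly.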
Or use pandas.read_html() to grab the table and convert it to a DataFrame:
import pandas as pd
pd.read_html('https://www.hockey-reference.com/leagues/NHL_2021_games.html', attrs={'id':'games'})[0].iloc[:,:5]
Output:
Date | Visitor | G | Home | G.1 |
---|---|---|---|---|
2021-01-13 | St. Louis Blues | 4 | Colorado Avalanche | 1 |
2021-01-13 | Vancouver Canucks | 5 | Edmonton Oilers | 3 |
2021-01-13 | Pittsburgh Penguins | 3 | Philadelphia Flyers | 6 |
2021-01-13 | Chicago Blackhawks | 1 | Tampa Bay Lightning | 5 |
2021-01-13 | Montreal Canadiens | 4 | Toronto Maple Leafs | 5 |
... | ... | ... | ... | ... |
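The DataFrame keeps the page's own headers (G and G.1). Its goal columns may end up as text or floats if any cells are blank. The following is a minimal sketch of tidying it, assuming `df` is the `.iloc[:, :5]` result shown above. The helper name `tidy_games` and the snake_case column names are illustrative choices. Dropping rows whose Date is literally "Date" is only a precaution in case repeated header rows show up in the data.

```python
import pandas as pd


def tidy_games(df):
    """Rename the read_html columns and make the goal columns numeric.

    Assumes df has the columns Date, Visitor, G, Home, G.1,
    i.e. the result of .iloc[:, :5] on the read_html table.
    """
    out = df.rename(columns={
        "Date": "date",
        "Visitor": "visitor",
        "G": "visitor_goals",
        "Home": "home",
        "G.1": "home_goals",
    })
    # Precaution: drop repeated header rows, if any made it into the data.
    out = out[out["date"] != "Date"].copy()
    # Blank goal cells (unplayed games) become NaN instead of raising.
    for col in ("visitor_goals", "home_goals"):
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out
```

Usage: `games = tidy_games(pd.read_html(URL, attrs={'id': 'games'})[0].iloc[:, :5])`.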
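table = soup.find('div', attrs={'id': 'div_games'})
trs = table.find_all('tr')
gamestats = []
for tr in trs:
    # These data-stat values must match the page's markup exactly
    home = tr.find('td', attrs={'data-stat': 'home_team_name'})
    visitor = tr.find('td', attrs={'data-stat': 'visit_team_name'})
    if home is None or visitor is None:
        continue  # header rows have no matching <td> cells
    gamestat = {}
    # .get_text() stores the team name instead of the whole <td> tag
    gamestat['home_team_name'] = home.get_text(strip=True)
    gamestat['visit_team_name'] = visitor.get_text(strip=True)
    gamestats.append(gamestat)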
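Hard-coded data-stat names only work if they match the page exactly. When they don't, `tr.find()` silently returns None. The following is a minimal sketch that takes a different approach: it records every cell under its own data-stat attribute and builds a DataFrame from the resulting list of dicts. The real attribute names, including the ones for the goal columns, can then be read off the DataFrame's columns. `rows_by_data_stat` is a hypothetical helper. It assumes the cells carry data-stat attributes, as the loop above expects, and that header rows contain only th cells.

```python
import pandas as pd
from bs4 import BeautifulSoup


def rows_by_data_stat(html, table_id="games"):
    """Build a DataFrame whose columns are the cells' data-stat values.

    Collecting every data-stat (instead of hard-coding names) shows
    which attribute names the page actually uses.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"id": table_id})
    if table is None:
        raise ValueError(f'no <table id="{table_id}"> found')

    records = []
    for tr in table.find_all("tr"):
        if tr.find("td") is None:
            continue  # header rows contain only <th> cells
        records.append({
            cell["data-stat"]: cell.get_text(strip=True)
            for cell in tr.find_all(["th", "td"])
            if cell.has_attr("data-stat")
        })
    return pd.DataFrame(records)
```

For example, `print(rows_by_data_stat(page.content).columns.tolist())` lists the available column names. You can then select the team and goal columns from them.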