Web Scraping Fbref table
So far my code works for different tables on the FBref website, but I'm struggling to get the player details. The code below:
import requests
from bs4 import BeautifulSoup, Comment

url = 'https://fbref.com/en/squads/18bb7c10/Arsenal-Stats'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
table = BeautifulSoup(soup.select_one('#stats_standard').find_next(text=lambda x: isinstance(x, Comment)), 'html.parser')

# Print some information from the table to the screen:
for tr in table.select('tr:has(td)'):
    tds = [td.get_text(strip=True) for td in tr.select('td')]
    print('{:<30}{:<20}{:<10}'.format(tds[0], tds[3], tds[5]))
gives me this error:
AttributeError: 'NoneType' object has no attribute 'find_next'
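What happens?
There is no table with the ID stats_standard, so select_one('#stats_standard') returns None and calling .find_next() on it raises the AttributeError. The ID on the page is stats_standard_10728.
How to fix it and make it a bit more generic
Change your table selector so that it matches any table whose ID starts with stats_standard: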
table = soup.select_one('table[id^="stats_standard"]')
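To make the difference between the two selectors concrete, here is a minimal sketch that runs on a small hand-written HTML string rather than the live page. The markup in SAMPLE_HTML and the helper name find_stats_table are made up for illustration; only the ID pattern is taken from the answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; only the id pattern matters.
SAMPLE_HTML = """
<table id="stats_standard_10728">
  <tr><td>ENG</td><td>GK</td></tr>
</table>
"""

def find_stats_table(html, prefix="stats_standard"):
    """Return (exact-ID match, prefix match) for comparison."""
    soup = BeautifulSoup(html, "html.parser")
    # Exact ID selector: matches only id="stats_standard".
    exact = soup.select_one(f"#{prefix}")
    # Attribute prefix selector: matches any table whose id starts with the prefix.
    by_prefix = soup.select_one(f'table[id^="{prefix}"]')
    return exact, by_prefix

exact, by_prefix = find_stats_table(SAMPLE_HTML)
print(exact)            # the exact selector finds nothing
print(by_prefix["id"])  # the prefix selector finds the suffixed table
```

Because select_one() returns None when nothing matches, checking the result before calling methods on it avoids the AttributeError from the question.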
Example
import requests
from bs4 import BeautifulSoup

url = 'https://fbref.com/en/squads/18bb7c10/Arsenal-Stats'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
# Match the table by ID prefix, so the numeric suffix does not matter
table = soup.select_one('table[id^="stats_standard"]')

# Print some information from the table to the screen:
for tr in table.select('tr:has(td)'):
    tds = [td.get_text(strip=True) for td in tr.select('td')]
    print('{:<30}{:<20}{:<10}'.format(tds[0], tds[3], tds[5]))
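The loop above indexes tds[0], tds[3] and tds[5] directly, which would raise an IndexError on any row with fewer cells, and it still fails if the table is not found. The following is a rough sketch of a more defensive variant, assuming a hand-written sample table; SAMPLE_HTML, extract_rows and the wanted parameter are hypothetical names, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: a header row, one full data row, and one short row.
SAMPLE_HTML = """
<table id="stats_standard_10728">
  <tr><th>Player</th><th>Nation</th><th>Pos</th></tr>
  <tr><td>A</td><td>b</td><td>c</td><td>d</td><td>e</td><td>f</td></tr>
  <tr><td>Total</td><td>x</td></tr>
</table>
"""

def extract_rows(html, wanted=(0, 3, 5)):
    """Collect the wanted cell positions from each data row, skipping short rows."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.select_one('table[id^="stats_standard"]')
    if table is None:
        # No matching table: return an empty result instead of raising.
        return []
    rows = []
    for tr in table.select("tr:has(td)"):
        tds = [td.get_text(strip=True) for td in tr.select("td")]
        if len(tds) <= max(wanted):
            # Row has too few cells for the requested positions; skip it.
            continue
        rows.append(tuple(tds[i] for i in wanted))
    return rows

rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print("{:<30}{:<20}{:<10}".format(*row))
```

Returning plain tuples keeps the extraction separate from the formatting, so the same rows can be printed, filtered, or written to a file.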
Just in case
You can use pandas read_html() to scrape, display, and modify the table data, which makes life much easier.
Example
import pandas as pd
pd.read_html('https://fbref.com/en/squads/18bb7c10/Arsenal-Stats')[0]