Web Scraping Fbref table

Question

So far my code works for other tables on the FBref website, but I am struggling to get the player details. The code below:

import requests
from bs4 import BeautifulSoup, Comment


url = 'https://fbref.com/en/squads/18bb7c10/Arsenal-Stats'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

table = BeautifulSoup(soup.select_one('#stats_standard').find_next(text=lambda x: isinstance(x, Comment)), 'html.parser')

#print some information from the table to screen:
for tr in table.select('tr:has(td)'):
    tds = [td.get_text(strip=True) for td in tr.select('td')]
    print('{:<30}{:<20}{:<10}'.format(tds[0], tds[3], tds[5]))

gives me this error:

AttributeError: 'NoneType' object has no attribute 'find_next'

Answer 1

What happens?

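As already mentioned, there is no table with the ID stats_standard; the ID should be stats_standard_10728. Because select_one() returns None when nothing matches, the chained .find_next() call raises the AttributeError.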
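The failure can be reproduced offline. The snippet below is only a minimal sketch: the HTML string is a made-up stand-in for the FBref page (an assumption, not the real markup), and it shows the exact-ID selector returning None and the resulting error.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: the table ID carries a numeric suffix.
html = '<div><table id="stats_standard_10728"><tr><td>Player A</td></tr></table></div>'
soup = BeautifulSoup(html, 'html.parser')

# The exact ID does not exist, so select_one() returns None.
exact = soup.select_one('#stats_standard')
print(exact)  # None

# Calling any method on None raises the error from the question.
error_message = None
try:
    exact.find_next('td')
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)  # 'NoneType' object has no attribute 'find_next'
```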

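How to fix it and make it a bit more generic

Change your table selector so that it matches any ID that starts with stats_standard: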

table = soup.select_one('table[id^="stats_standard"]')

Example

import requests
from bs4 import BeautifulSoup


url = 'https://fbref.com/en/squads/18bb7c10/Arsenal-Stats'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

table = soup.select_one('table[id^="stats_standard"]')

#print some information from the table to screen:
for tr in table.select('tr:has(td)'):
    tds = [td.get_text(strip=True) for td in tr.select('td')]
    print('{:<30}{:<20}{:<10}'.format(tds[0], tds[3], tds[5]))
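To try the prefix selector without hitting the site, it can be run against a small inline document. This is a sketch under stated assumptions: SAMPLE_HTML and the helper extract_rows() are hypothetical names invented for illustration, and the sample table does not mirror the real FBref columns. The guard on None is an optional addition that turns a missing table into a clearer error.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page has many more columns.
SAMPLE_HTML = '''
<table id="stats_standard_10728">
  <tr><th>Player</th><th>Nation</th><th>Pos</th></tr>
  <tr><td>Player A</td><td>ENG</td><td>FW</td></tr>
  <tr><td>Player B</td><td>FRA</td><td>MF</td></tr>
</table>
'''


def extract_rows(html, id_prefix):
    """Return the cell texts of every data row in the first matching table."""
    soup = BeautifulSoup(html, 'html.parser')
    # [id^="..."] matches any table whose id starts with the given prefix.
    table = soup.select_one(f'table[id^="{id_prefix}"]')
    if table is None:
        raise ValueError(f'no table whose id starts with {id_prefix!r}')
    # tr:has(td) skips header rows that contain only <th> cells.
    return [
        [td.get_text(strip=True) for td in tr.select('td')]
        for tr in table.select('tr:has(td)')
    ]


rows = extract_rows(SAMPLE_HTML, 'stats_standard')
for row in rows:
    print('{:<30}{:<20}{:<10}'.format(*row))
```

Raising a ValueError here is a design choice for the sketch; returning an empty list would work just as well if a missing table is not an error in your workflow.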

Just in case

You can make your life easier by using pandas read_html() to scrape, display, and modify the table data.

Example

import pandas as pd
pd.read_html('https://fbref.com/en/squads/18bb7c10/Arsenal-Stats')[0]
