Scraping using Beautiful Soup leads to error only in a particular section (NullType object encountered)

Question

I am trying to get the injury list for a specific team (Liverpool, in this case) from the following website:

http://www.physioroom.com/news/english_premier_league/epl_injury_table.php

It works fine for some teams (Swansea), but for others (Liverpool, Everton) it raises the following error:

TypeError: Can't convert 'NoneType' object to str implicitly

Here is the code I am using.

from bs4 import BeautifulSoup
import urllib.request


url = "http://www.physioroom.com/news/english_premier_league/epl_injury_table.php"
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")
#lp = soup.find(alt="Liverpool away shirt").parent.parent.parent.next_sibling.next_sibling
lp = soup.find(alt="Swansea City away shirt").parent.parent.parent.next_sibling.next_sibling
player_info = ""
player_list = []

while True:
    if lp.has_attr('id'):
        break
    else:
        tdlist = lp.find_all('td')
        # player_info = tdlist[0].string+"\t"+tdlist[1].string+"\t"+tdlist[3].string
        #print(tdlist[0].find('a').string.strip() + "\t" + tdlist[1].string.strip() + "\t" + tdlist[3].string.strip())
        print(tdlist[0].string + "\t" + tdlist[1].string + "\t" + tdlist[3].string)
        lp = lp.findNext('tr')

Please tell me how to solve this problem.

Answer 1

from bs4 import BeautifulSoup
import requests


url = "http://www.physioroom.com/news/english_premier_league/epl_injury_table.php"
r = requests.get(url)
soup = BeautifulSoup(r.text, "lxml")
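# Locate the injury table by its id.
table = soup.find('table', id='epl-table')
# Calling a tag is shorthand for find_all(); id=None keeps only rows without an id attribute.
for tr in table('tr', id=None):
    # Join each row's text fragments with tabs, stripping surrounding whitespace.
    print(tr.get_text('\t', strip=True))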

Output:

PLAYER  CONDITION   LATEST NEWS EXPECTED RETURN AVAILABLE?
D Meyler    Knock   No Return Date  Slight Doubt
S Maloney   Ear Infection   No Return Date  Slight Doubt
M Henriksen Shoulder Separation April 1, 2017   Major Doubt
A McGregor  Fitness No Return Date  Major Doubt
W Keane ACL Knee Injury No Return Date
M Odubajo   Patella Fracture    May 1, 2017
G Luer  Knee Injury February 1, 2017
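
A likely cause of the TypeError in the question: Beautiful Soup defines .string as None whenever a tag has more than one child, for example a cell that holds a link surrounded by whitespace. Concatenating that None with "\t" then fails. The sketch below illustrates this on made-up markup; the row contents are an assumption for illustration and are not copied from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical row markup (an assumption, not the real page): the first cell
# wraps the player name in a link with whitespace on either side.
html = """
<table>
  <tr>
    <td> <a href="#">A Player</a> </td>
    <td>Knee Injury</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
first_td, second_td = soup.find_all("td")

# The first cell has three children (whitespace, <a>, whitespace),
# so .string is ambiguous and Beautiful Soup returns None.
multi_child_string = first_td.string

# The second cell has exactly one text child, so .string returns it.
single_child_string = second_td.string

# get_text() collects all nested text and never returns None.
safe_text = first_td.get_text(strip=True)

print(repr(multi_child_string), repr(str(single_child_string)), repr(safe_text))
```

If this is indeed what happens on the Liverpool rows, swapping .string for get_text(strip=True) in the original loop should avoid the error without changing the rest of the code.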

get_text()

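If you only want the text part of a document or tag, you can use the get_text() method. It returns all the text in a document or beneath a tag, as a single Unicode string.

You can specify a string to be used to join the bits of text together, passed as the first argument.

You can tell Beautiful Soup to strip whitespace from the beginning and end of each bit of text by passing strip=True.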
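
As a small illustration of these options, the sketch below calls get_text() three ways on a single row. The row markup is a simplified assumption modelled on the output above, not the page's actual HTML.

```python
from bs4 import BeautifulSoup

# Simplified, assumed markup for one table row.
html = "<tr><td> D Meyler </td><td> Knock </td><td> Slight Doubt </td></tr>"
row = BeautifulSoup(html, "html.parser").tr

# No arguments: all text fragments are concatenated as they appear.
plain = row.get_text()

# A separator string is placed between the text fragments.
joined = row.get_text("\t")

# strip=True trims whitespace from each fragment before joining.
cleaned = row.get_text("\t", strip=True)

print(repr(plain))
print(repr(joined))
print(repr(cleaned))
```

The last form is the one used in the answer's loop, which is why each printed row comes out as clean tab-separated fields.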

