Web scraping and finding elements
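I'm trying to find out when a game has been postponed and get the associated team information or game number, because I append the team abbreviations to a list. What currently happens is that it only picks up the postponed items and skips the games that aren't postponed. I think I need to change the soup.select line, or do something slightly different, but I can't figure it out.
The code doesn't throw any errors, but the list returned is [0,1,2,3]. However, if you open https://www.rotowire.com/baseball/daily-lineups.php, it should return [0,1,14,15], because those are the team elements of the postponed games.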
from bs4 import BeautifulSoup
import requests

url = 'https://www.rotowire.com/baseball/daily-lineups.php'
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")
x = 0
gamesRemoved = []
for tag in soup.select(".lineup__main > div"):
    ppcheck = tag.text
    if "POSTPONED" in ppcheck:
        print(x)
        print('Postponement')
        first_team = x*2
        print(first_team)
        gamesRemoved.append(first_team)
        second_team = x*2+1
        gamesRemoved.append(second_team)
        x+=1
    else:
        x+=1
        continue
print(gamesRemoved)
You can use BeautifulSoup.select and check whether 'is-postponed' is present as a class name on each lineup box:
from bs4 import BeautifulSoup as soup
import requests
d = soup(requests.get('https://www.rotowire.com/baseball/daily-lineups.php').text, 'html.parser')
p = [j for i, a in enumerate(d.select('.lineup.is-mlb')) for j in [i*2, i*2+1] if 'is-postponed' in a['class']]
Output:
[0, 1, 14, 15]
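For readability, the same idea can be written as an explicit loop. The sketch below is only an illustration: the function name postponed_team_indices and the SAMPLE_HTML snippet are hypothetical and stand in for the live page, whose markup may differ or change. It assumes, as in the one-liner above, that every game is a '.lineup.is-mlb' box and that postponed games also carry the 'is-postponed' class.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup (an assumption, not copied from the site):
# three game boxes, the first and third marked as postponed.
SAMPLE_HTML = """
<div class="lineup is-mlb is-postponed">Game A</div>
<div class="lineup is-mlb">Game B</div>
<div class="lineup is-mlb is-postponed">Game C</div>
"""


def postponed_team_indices(html):
    """Return the team indices (two per game) of every postponed game."""
    soup = BeautifulSoup(html, "html.parser")
    indices = []
    # Enumerate every game box, postponed or not, so the game index stays
    # aligned with the order of the team abbreviations.
    for game_index, box in enumerate(soup.select(".lineup.is-mlb")):
        # For a parsed tag, "class" is a list of the individual class names.
        if "is-postponed" in box.get("class", []):
            indices.extend([game_index * 2, game_index * 2 + 1])
    return indices


print(postponed_team_indices(SAMPLE_HTML))  # prints [0, 1, 4, 5]
```

To run it against the real page, pass requests.get(url).text instead of SAMPLE_HTML. The key difference from the code in the question is that the loop visits every game box rather than only the elements that contain the postponement notice, so the counter advances for games that were played as well.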