Web scraping with bs4 python: How to display football matchups
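I'm a beginner in Python, and I'm trying to create a program that scrapes the football/soccer schedule from skysports.com and sends it to my phone by SMS through Twilio. I've left out the SMS code because I already have that figured out, so here is the web scraping code I'm stuck on so far: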
from collections import defaultdict
import requests
from bs4 import BeautifulSoup
URL = "https://www.skysports.com/football-fixtures"
page = requests.get(URL)
results = BeautifulSoup(page.content, "html.parser")
d = defaultdict(list)
comp = results.find('h5', {"class": "fixres__header3"})
team1 = results.find('span', {"class": "matches__item-col matches__participant matches__participant--side1"})
date = results.find('span', {"class": "matches__date"})
team2 = results.find('span', {"class": "matches__item-col matches__participant matches__participant--side2"})
for ind in range(len(d)):
    d['comp'].append(comp[ind].text)
    d['team1'].append(team1[ind].text)
    d['date'].append(date[ind].text)
    d['team2'].append(team2[ind].text)
The following should do the trick for you:
from bs4 import BeautifulSoup
import requests

a = requests.get('https://www.skysports.com/football-fixtures')
soup = BeautifulSoup(a.text, features="html.parser")

teams = []
# A single pass over the page collects every team name
for i in soup.find_all(class_="swap-text--bp30")[1:]:  # skips the first one because that's a heading
    teams.append(i.text)

date = soup.find(class_="fixres__header2").text  # first date header on the page
print(date)

teams = [i.strip('\n') for i in teams]
# Teams come in pairs: home side first, then away side
for x in range(0, len(teams) - 1, 2):
    print(teams[x] + " vs " + teams[x + 1])
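Let me explain a bit further what I did:
All the football teams have this class name: swap-text--bp30
So we can use find_all to extract every element with that class name.
Once we have the results, we can collect them in a list, "teams = []", by appending to it inside a for loop with "teams.append(i.text)". ".text" strips the HTML and leaves only the text inside the tag.
Then we can get rid of the "\n" characters by stripping each string, and print the strings in the list two at a time.
The final output is the first date on the page, followed by one "team vs team" line per fixture.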
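To make the find_all / .text / strip steps easier to see, here is a minimal sketch that runs on a small inline HTML string instead of the live site. The markup and the placeholder names ("Team A", "Team B") are assumptions made up for illustration; only the class name swap-text--bp30 comes from the answer, and the real page may well be laid out differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page is much larger
# and its structure may differ.
SAMPLE_HTML = """
<div>
  <span class="swap-text--bp30">Fixtures</span>
  <span class="swap-text--bp30">
Team A
</span>
  <span class="swap-text--bp30">
Team B
</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns every tag carrying the class; [1:] drops the heading
tags = soup.find_all(class_="swap-text--bp30")[1:]

# .text gives the text inside each tag, surrounding newlines included
raw = [tag.text for tag in tags]

# strip('\n') removes the leading and trailing newlines
teams = [name.strip("\n") for name in raw]

print(raw)
print(teams)
```

Printing both lists side by side shows why the strip step is needed before building the "vs" lines.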
Edit: To get the league names, we do almost the same thing:
league = []
for i in soup.find_all(class_="fixres__header3"):  # every competition header on the page
    league.append(i.text)
Strip the list and create another list:
league = [i.strip('\n') for i in league]
final = []
Then add this last bit of code, which basically just prints the leagues and then prints the teams two at a time, over and over:
# Pair the teams up: home side first, then away side
for x in range(0, len(teams) - 1, 2):
    final.append(teams[x] + " vs " + teams[x + 1])
for i in league:
    print(i)
for i in final:
    print(i)
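The code above prints all the leagues first and all the matchups afterwards, so it does not show which matchup belongs to which league. One possible way to keep them together, sketched here under assumptions, is to walk the page in document order and remember the most recent league header. The class names are the ones from the question; the sample markup, the placeholder names, and the helper group_fixtures are hypothetical, and the sketch assumes each league header comes before its matches and each side1 element before its side2 partner.

```python
from collections import defaultdict
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the live page may be structured differently.
SAMPLE_HTML = """
<h5 class="fixres__header3">League A</h5>
<span class="matches__item-col matches__participant matches__participant--side1">Home One</span>
<span class="matches__date">15:00</span>
<span class="matches__item-col matches__participant matches__participant--side2">Away One</span>
<h5 class="fixres__header3">League B</h5>
<span class="matches__item-col matches__participant matches__participant--side1">Home Two</span>
<span class="matches__date">17:30</span>
<span class="matches__item-col matches__participant matches__participant--side2">Away Two</span>
"""


def group_fixtures(html):
    """Return {league name: ["home vs away", ...]} in page order (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    fixtures = defaultdict(list)
    league = None
    home = None
    # A comma-separated CSS selector returns matches in document order
    selector = (
        ".fixres__header3, "
        ".matches__participant--side1, "
        ".matches__participant--side2"
    )
    for tag in soup.select(selector):
        classes = tag.get("class", [])
        text = tag.get_text(strip=True)
        if "fixres__header3" in classes:
            league = text  # a new competition starts here
        elif "matches__participant--side1" in classes:
            home = text  # remember the home side until its opponent appears
        else:
            fixtures[league].append(home + " vs " + text)
    return dict(fixtures)


fixtures = group_fixtures(SAMPLE_HTML)
for league_name, matches in fixtures.items():
    print(league_name)
    for match in matches:
        print("  " + match)
```

To try this against the live site, pass the text of the requests response to group_fixtures instead of SAMPLE_HTML; whether the selectors still match depends on the page's current markup.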