How can I get the href from each row?
I made a Telegram bot, and I need to get links from the HTML. I want to get the href of each match from this site: https://www.hltv.org/matches
My previous code was:
elif message.text == "Matches":
    url_news = "https://www.hltv.org/matches"
    response = requests.get(url_news)
    soup = BeautifulSoup(response.content, "html.parser")
    match_info = []
    match_items = soup.find("div", class_="upcomingMatchesSection")
    print(match_items)
    for item in match_items:
        match_info.append({
            "link": item.find("div", class_="upcomingMatch").text,
            "title": item["href"]
        })
And I don't know how to get the links from the body. Any help is appreciated.
What happens?
You try to iterate over match_items, but there is nothing useful to iterate over, because you only selected the section that contains the matches, not the matches themselves.
How to fix?
Select the upcomingMatch divs and iterate over them:
match_items = soup.select("div.upcomingMatchesSection div.upcomingMatch")
To get the URL, you have to select an <a> element:
item.a["href"]
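This can be tried end to end on a small HTML snippet (the markup below is a simplified, hypothetical stand-in for HLTV's real page structure):

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the real HLTV markup
html = """
<div class="upcomingMatchesSection">
  <div class="upcomingMatch"><a href="/matches/1/a-vs-b">A vs B</a></div>
  <div class="upcomingMatch"><a href="/matches/2/c-vs-d">C vs D</a></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
# select() returns every upcomingMatch div; item.a is the first <a> inside it
links = [item.a["href"] for item in soup.select("div.upcomingMatchesSection div.upcomingMatch")]
print(links)  # ['/matches/1/a-vs-b', '/matches/2/c-vs-d']
```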
Example
from bs4 import BeautifulSoup
import requests

url_news = "https://www.hltv.org/matches"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'}
response = requests.get(url_news, headers=headers)
soup = BeautifulSoup(response.content, "html.parser")
match_info = []
match_items = soup.select("div.upcomingMatchesSection div.upcomingMatch")
for item in match_items:
    match_info.append({
        "title": item.get_text('|', strip=True),
        "link": item.a["href"]
    })
match_info
Output
[{'title': '09:00|bo3|1WIN|K23|Pinnacle Fall Series 2|Odds',
'link': '/matches/2352066/1win-vs-k23-pinnacle-fall-series-2'},
{'title': '09:00|bo3|INDE IRAE|Nemiga|Pinnacle Fall Series 2|Odds',
'link': '/matches/2352067/inde-irae-vs-nemiga-pinnacle-fall-series-2'},
{'title': '10:00|bo3|OPAA|Nexus|Malta Vibes Knockout Series 3|Odds',
'link': '/matches/2352207/opaa-vs-nexus-malta-vibes-knockout-series-3'},
{'title': '11:00|bo3|Checkmate|TBC|Funspark ULTI 2021 Asia Regional Series 3|Odds',
'link': '/matches/2352092/checkmate-vs-tbc-funspark-ulti-2021-asia-regional-series-3'},
{'title': '11:00|bo3|ORDER|Alke|ESEA Premier Season 38 Australia|Odds',
'link': '/matches/2352122/order-vs-alke-esea-premier-season-38-australia'},...]
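Note that the scraped hrefs are relative paths; if the bot needs clickable URLs, each one can be joined with the site root (a minimal sketch using the standard library's urljoin):

```python
from urllib.parse import urljoin

base = "https://www.hltv.org"
relative = "/matches/2352066/1win-vs-k23-pinnacle-fall-series-2"
full_url = urljoin(base, relative)
print(full_url)  # https://www.hltv.org/matches/2352066/1win-vs-k23-pinnacle-fall-series-2
```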
You can try this.

- All the match info is present in a <div> with the class name upcomingMatch.
- Select all those <div>s and extract the match link from each one; it sits inside an <a> tag whose class is named match.
The code is as follows:
import requests
from bs4 import BeautifulSoup

url_news = "https://www.hltv.org/matches"
headers = {"User-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36"}
response = requests.get(url_news, headers=headers)
soup = BeautifulSoup(response.text, "lxml")
match_items = soup.find_all("div", class_="upcomingMatch")
for match in match_items:
    link = match.find('a', class_='match a-reset')['href']
    print(f'Link: {link}')
Link: /matches/2352235/malta-vibes-knockout-series-3-quarter-final-1-malta-vibes-knockout-series-3
Link: /matches/2352098/pinnacle-fall-series-2-quarter-final-2-pinnacle-fall-series-2
Link: /matches/2352236/malta-vibes-knockout-series-3-quarter-final-2-malta-vibes-knockout-series-3
Link: /matches/2352099/pinnacle-fall-series-2-quarter-final-3-pinnacle-fall-series-2
.
.
.
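One caveat with the loop above: match.find(...) returns None when a row has no link, and indexing None with ['href'] raises a TypeError. A defensive sketch, using a hypothetical snippet in which one row lacks its <a>:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the second row has no link yet
html = """
<div class="upcomingMatch"><a class="match a-reset" href="/matches/1/a-vs-b">A vs B</a></div>
<div class="upcomingMatch">TBD</div>
"""
soup = BeautifulSoup(html, "html.parser")
links = []
for match in soup.find_all("div", class_="upcomingMatch"):
    a = match.find("a", class_="match a-reset")
    if a is not None:  # skip rows without a link instead of crashing
        links.append(a["href"])
print(links)  # ['/matches/1/a-vs-b']
```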