Extract value from class using BeautifulSoup

Question

I need to get the value "Anti-Mage" from this class in Python. How can I do that?

<td class="cell-xlarge"><a href="/players/432283612/matches?hero=anti-mage">Anti-Mage</a><div class="subtext minor"><a href="/matches/6107031786"><time data-time-ago="2021-07-26T23:27:54+00:00" datetime="2021-07-26T23:27:54+00:00" title="Mon, 26 Jul 2021 23:27:54 +0000">2021-07-26</time></a></div></td>

Answer 1

Edit

Based on your comment, here is how to get the text of all matching <a> elements:

import requests
from bs4 import BeautifulSoup as BS

url = 'https://www.dotabuff.com/players/432283612'
# Send browser-like headers with the request
headers = {
    "Accept":"*/*",
    "User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
}
r = requests.get(url, headers=headers)
# Use the imported alias BS and name the parser explicitly
soup = BS(r.text, 'html.parser')

# Collect the text of every hero link inside <article>
print([x.text for x in soup.select('article a[href*="matches?hero"]')])

Output

['Anti-Mage', 'Shadow Fiend', 'Slark', 'Morphling', 'Tinker', 'Bristleback', 'Invoker', 'Broodmother', 'Templar Assassin', 'Monkey King']

Assuming the HTML posted in your question has been parsed into a BeautifulSoup object, read the text attribute of the first <a>:

soup.a.text

Or select more specifically, using the class you mentioned:

soup.select_one('.cell-xlarge a').text
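
Both calls can be tried offline against the markup from the question. The following is a minimal sketch, assuming the snippet is parsed with the built-in html.parser; the variable names and the shortened <time> tag are illustrative only.

```python
from bs4 import BeautifulSoup

# The <td> snippet from the question, with the <time> attributes shortened
html = (
    '<td class="cell-xlarge">'
    '<a href="/players/432283612/matches?hero=anti-mage">Anti-Mage</a>'
    '<div class="subtext minor"><a href="/matches/6107031786">'
    '<time datetime="2021-07-26T23:27:54+00:00">2021-07-26</time></a></div>'
    '</td>'
)

soup = BeautifulSoup(html, "html.parser")

# soup.a is the first <a> in document order
first_link_text = soup.a.text

# select_one returns the first <a> nested anywhere under .cell-xlarge
scoped_link_text = soup.select_one(".cell-xlarge a").text

print(first_link_text, scoped_link_text)
```

Note that soup.a only works here because the hero link happens to be the first <a> in the fragment; on a full page the scoped selector is the safer choice.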

Note: In some cases selecting elements by class is only the third-best option, because classes can be dynamic and are not unique. A better strategy is to select by id or tag.
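
As one possible illustration of the note above, the hero links can be matched by tag and href pattern instead of by class, which is the same idea as the selector used in the edit at the top of this answer. This is only a sketch: the two-row table and the second row's match link are made-up sample data, not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table modelled on the question's <td>;
# the second row's hero and match id are invented for illustration.
html = """
<table>
  <tr><td class="cell-xlarge">
    <a href="/players/432283612/matches?hero=anti-mage">Anti-Mage</a>
    <div class="subtext minor"><a href="/matches/6107031786">2021-07-26</a></div>
  </td></tr>
  <tr><td class="cell-xlarge">
    <a href="/players/432283612/matches?hero=slark">Slark</a>
    <div class="subtext minor"><a href="/matches/0000000000">2021-07-25</a></div>
  </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Match on the tag and a substring of href rather than on a CSS class,
# so the date links under /matches/ are not picked up.
hero_names = [a.get_text(strip=True) for a in soup.select('a[href*="matches?hero="]')]

print(hero_names)
```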

Answer 2

First, you need to select the parent element (td in this case) by its class name. You can do it like this:

td = soup.find('td', {'class': 'cell-xlarge'})

Then find the child a tag like this:

a = td.findChildren('a', recursive=False)[0]

This gives you the a tag. To get its value, you can use .string like this:

a_value = a.string

This gives you the value Anti-Mage.
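
Put together, the three steps above might look like the sketch below. The helper name first_link_text is hypothetical, and it uses find with recursive=False in place of findChildren(...)[0] so that a missing tag yields None instead of an IndexError.

```python
from bs4 import BeautifulSoup

def first_link_text(html, css_class):
    """Return the text of the first direct <a> child of the first <td> with css_class.

    Hypothetical helper for illustration; returns None when nothing matches.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Step 1: locate the parent <td> by its class name
    td = soup.find("td", {"class": css_class})
    if td is None:
        return None
    # Step 2: look only at direct children, not at the nested date link
    a = td.find("a", recursive=False)
    if a is None:
        return None
    # Step 3: .string is the tag's single text child
    return a.string

# Shortened version of the snippet from the question
html = (
    '<td class="cell-xlarge">'
    '<a href="/players/432283612/matches?hero=anti-mage">Anti-Mage</a>'
    '<div class="subtext minor"><a href="/matches/6107031786">2021-07-26</a></div>'
    '</td>'
)

value = first_link_text(html, "cell-xlarge")
missing = first_link_text(html, "no-such-class")
print(value, missing)
```

Keep in mind that .string is None when a tag has more than one child node; get_text() is the more forgiving choice in that case.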

Answer 3

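import requests
from bs4 import BeautifulSoup as BS

# `a` is assumed to hold the player ID and `headers` the request headers, both defined elsewhere
r1 = requests.get(f"https://www.dotabuff.com/players/{a}/heroes", headers=headers)

html1 = BS(r1.content, 'lxml')

# find_all returns every matching <td>; find would return only the first one
for td in html1.find_all('td', {'class': 'cell-xlarge'}):
    # Take the direct <a> child only, skipping the nested date link
    link = td.find('a', recursive=False)
    if link is not None:
        a_value = link.string
        print(a_value)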
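
The loop above depends on a live request, so here is an offline sketch of the same iteration for testing the parsing logic on its own. The sample rows are invented, and html.parser is used instead of lxml only to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Invented sample rows; the last cell has no direct <a> child on purpose
sample = """
<table>
  <tr><td class="cell-xlarge"><a href="/players/1/matches?hero=anti-mage">Anti-Mage</a></td></tr>
  <tr><td class="cell-xlarge"><a href="/players/1/matches?hero=tinker">Tinker</a></td></tr>
  <tr><td class="cell-xlarge"><div>no link here</div></td></tr>
</table>
"""

soup = BeautifulSoup(sample, "html.parser")

values = []
for td in soup.find_all("td", {"class": "cell-xlarge"}):
    # Direct children only; cells without a link are skipped
    link = td.find("a", recursive=False)
    if link is not None:
        values.append(link.string)

print(values)
```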


