How can I parse only the first HTML block when multiple blocks share the same class name?

Question

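I need to parse information from a site that has two blocks, "Today" and "Yesterday", which both have the same class name, standard-box standard-list. How can I parse only the first block (the one under "Today") without extracting anything from "Yesterday", given that both carry the same class name?

Here is my code: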

import requests
from bs4 import BeautifulSoup


url_news = "https://www.123.org/"
response = requests.get(url_news)
soup = BeautifulSoup(response.content, "html.parser")
items = soup.findAll("div", class_="standard-box standard-list")
news_info = []
for item in items:
    news_info.append({
        "title": item.find("div", class_="newstext",).text,
        "link": item.find("a", class_="newsline article").get("href")
    })

Answer 1

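When I ran the code you provided, I got no output for items. However, you say that you do, so:

If you only want the data under "Today", you can use .find() instead of .find_all(), because .find() returns only the first matching tag, which here is the "Today" block and not the other one.

So, instead of: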

items = soup.findAll("div", class_="standard-box standard-list")

use:

items = soup.find("div", class_="standard-box standard-list")
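To see the difference without hitting the live site, here is a minimal sketch on a made-up HTML snippet. The markup in SAMPLE_HTML and the /news/1 and /news/2 links are assumptions for illustration only, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two blocks that share the same class attribute.
SAMPLE_HTML = """
<h2>Today</h2>
<div class="standard-box standard-list"><a href="/news/1">A</a></div>
<h2>Yesterday</h2>
<div class="standard-box standard-list"><a href="/news/2">B</a></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all() collects every matching tag in document order.
all_blocks = soup.find_all("div", class_="standard-box standard-list")
# find() stops at the first match and returns that single tag (or None).
first_block = soup.find("div", class_="standard-box standard-list")

print(len(all_blocks))
print(first_block is all_blocks[0])
print(first_block.a["href"])
```

Note that .find() returns None when nothing matches, so it is worth checking the result before using it on a real page.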

Also, to get the link, I had to access the attribute with the tag[attribute] syntax. Here is the working code:

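news_info = []
# .find() returns only the first matching block (the one under "Today")
items = soup.find("div", class_="standard-box standard-list")
# iterating over a tag yields its direct children
for item in items:
    news_info.append(
        {"title": item.find("div", class_="newstext").text, "link": item["href"]}
    )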

print(news_info)

Output:

[{'title': 'NIP crack top 3 ranking for the first time in 5 years', 'link': 'https://www.hltv.org/news/32545/nip-crack-top-3-ranking-for-the-first-time-in-5-years'}, {'title': 'Fessor joins Astralis Talent', 'link': 'https://www.hltv.org/news/32544/fessor-joins-astralis-talent'}, {'title': 'Grashog joins AGO', 'link': 'https://www.hltv.org/news/32542/grashog-joins-ago'}, {'title': 'ISSAA parts ways with Eternal Fire', 'link': 'https://www.hltv.org/news/32543/issaa-parts-ways-with-eternal-fire'}, {'title': 'BLAST Premier Fall Showdown Fantasy live', 'link': 'https://www.hltv.org/news/32541/blast-premier-fall-showdown-fantasy-live'}, {'title': 'FURIA win IEM Fall NA, EG claim final Major Legends spot', 'link': 'https://www.hltv.org/news/32540/furia-win-iem-fall-na-eg-claim-final-major-legends-spot'}]
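One caveat: iterating directly over a tag yields all of its children, including whitespace text nodes, and those do not support item["href"]. If the real markup has line breaks between the links, a somewhat more defensive variant is to search for the link tags inside the first block. The sketch below runs on made-up HTML; SAMPLE_HTML, the headlines, and the helper name parse_first_block are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with whitespace between the links, as in indented HTML.
SAMPLE_HTML = """
<div class="standard-box standard-list">
  <a class="newsline article" href="/news/1"><div class="newstext">First headline</div></a>
  <a class="newsline article" href="/news/2"><div class="newstext">Second headline</div></a>
</div>
<div class="standard-box standard-list">
  <a class="newsline article" href="/news/3"><div class="newstext">Old headline</div></a>
</div>
"""


def parse_first_block(html):
    """Return title/link dicts from the first matching block only."""
    soup = BeautifulSoup(html, "html.parser")
    first_block = soup.find("div", class_="standard-box standard-list")
    if first_block is None:
        # No such block on the page.
        return []
    news_info = []
    # Search only inside the first block, so later blocks are ignored.
    for link in first_block.find_all("a", class_="newsline"):
        text = link.find("div", class_="newstext")
        if text is None:
            # Skip links that have no headline element.
            continue
        news_info.append(
            {"title": text.get_text(strip=True), "link": link.get("href")}
        )
    return news_info


news_info = parse_first_block(SAMPLE_HTML)
print(news_info)
```

Because the search starts from first_block rather than from soup, the entry from the second block never appears in the result.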

python

parsing

beautifulsoup