Python BeautifulSoup: Excluding other tags in a select statement

Question

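I'm having trouble selecting text with BeautifulSoup. I'm trying to get the text only from <span class="data">, but I keep getting results from other elements as well. For example, in the code below the text I want is 'PlayStation 3' and 'Game Boy Advance', not 'PC'. Could you help?

Soup: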

<span class="data">
                  PlayStation 3
                 </span>,
 <span class="data">
                  Game Boy Advance
                 </span>,
 <span class="data">
                  Dec 8, 2022
                 </span>,
 <span class="data">
 <a href="/game/pc">
                   PC
                  </a>

P.S. I tried the code below:

consoles = soup.select('span.data')
for console in consoles:
    print(console.get_text(strip=True))

Output snippet:

PlayStation 3
Game Boy Advance
Dec 8, 2022
PC

Thanks!

Answer 1

This example selects every <span class="data"> that does not contain any other tags:

from bs4 import BeautifulSoup

html_doc = """\
<span class="data">
                  PlayStation 3
                 </span>,
 <span class="data">
                  Game Boy Advance
                 </span>,
 <span class="data">
                  Dec 8, 2022
                 </span>,
 <span class="data">
 <a href="/game/pc">
                   PC
                  </a>
"""

soup = BeautifulSoup(html_doc, "html.parser")

for span in soup.select("span.data:not(:has(*))"):
    print(span.get_text(strip=True))

Prints:

PlayStation 3
Game Boy Advance
Dec 8, 2022
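
For comparison, roughly the same filter can be written without the `:has()` pseudo-class by checking each span for child tags in Python. This is only a sketch and not part of the original answer; the helper name `has_no_child_tags` is made up for illustration, and the markup is the sample from the question with the indentation trimmed.

```python
from bs4 import BeautifulSoup

html_doc = """\
<span class="data">
  PlayStation 3
</span>,
<span class="data">
  Game Boy Advance
</span>,
<span class="data">
  Dec 8, 2022
</span>,
<span class="data">
<a href="/game/pc">
  PC
</a>
"""

soup = BeautifulSoup(html_doc, "html.parser")


def has_no_child_tags(tag):
    # Hypothetical helper: find(True) returns the first descendant tag,
    # or None when the element holds only text nodes.
    return tag.find(True) is None


# Keep only the span.data elements that contain plain text and no nested tags.
texts = [
    span.get_text(strip=True)
    for span in soup.find_all("span", class_="data")
    if has_no_child_tags(span)
]
print(texts)
```

Like the selector version, this still keeps the date span, because it also contains only text; it drops just the span that wraps the <a> tag.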

beautifulsoup

css-selectors

python-3.x