Finding text from HTML using BeautifulSoup

Question

I have the following HTML:

<li class="print text">
                            <span><em class="time">
                                    <div class="time">1.29 s</div>
                                </em><em class="status">passed</em>This is the text I want to get</span>

I only need the text that sits outside all the other tags (the text is: This is the text I want to get).

I am trying this code:

for el in doc.find_all('li', attrs={'class': 'print text'}):
    print(el.get_text())

Unfortunately, it prints everything, including the contents of the em tags.

Is there a way to do this?

Thanks!

Answer 1

Find the specific li tag by its class, call find_all on it to collect the em tags, take the last one by index, and read its next_sibling, which is the text node that follows it.

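from bs4 import BeautifulSoup

html = """<li class="print text">
        <span><em class="time">
                <div class="time">1.29 s</div>
            </em><em class="status">passed</em>This is the text I want to get</span>"""

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, "html.parser")
# The wanted text is the node immediately after the last <em> inside the <li>
print(soup.find("li", class_="print text").find_all("em")[-1].next_sibling)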
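The one-liner above raises an error if the li or the em tags are missing, and next_sibling can also be None or another tag rather than text. A minimal sketch of a guarded version follows; the helper name text_after_last_em is hypothetical, and the explicit "html.parser" choice is an assumption rather than something the answer specifies.

```python
from bs4 import BeautifulSoup, NavigableString

HTML = """<li class="print text">
        <span><em class="time">
                <div class="time">1.29 s</div>
            </em><em class="status">passed</em>This is the text I want to get</span>"""


def text_after_last_em(html):
    """Hypothetical helper: text right after the last <em> in the li, or None."""
    soup = BeautifulSoup(html, "html.parser")
    li = soup.find("li", class_="print text")
    if li is None:
        return None
    ems = li.find_all("em")
    if not ems:
        return None
    sibling = ems[-1].next_sibling
    # next_sibling may be None or another tag, so accept text nodes only
    if isinstance(sibling, NavigableString):
        return sibling.strip()
    return None


print(text_after_last_em(HTML))
```

Returning None instead of raising keeps the call site simple when some pages lack the expected markup.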

Answer 2

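You can use find(text=True, recursive=False) to get what you want. recursive=False restricts the search to the direct children of the tag, so text nested inside the em tags is skipped. In Beautiful Soup 4.4.0 and later, the text argument is also available under the name string.

Example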

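from bs4 import BeautifulSoup

html = '''<li class="print text">
        <span><em class="time">
                <div class="time">1.29 s</div>
            </em><em class="status">passed</em>This is the text I want to get</span>'''

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')

# Look only at the direct children of the span and return its first text node
print(soup.find('li', class_='print text').span.find(text=True, recursive=False))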

Output

This is the text I want to get

If your li contains more than one span, you can use:

from bs4 import BeautifulSoup

html = '''<li class="print text">
        <span><em class="time">
                <div class="time">1.29 s</div>
            </em><em class="status">passed</em>This is the text I want to get</span>
            <span><em class="time">
                <div class="time">1.50 s</div>
            </em><em class="status">passed</em>This is the text I want to get too</span>'''

# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')

# Select every span under the li and print the first text node that is its direct child
for e in soup.select('li.print.text span'):
    print(e.find(text=True, recursive=False))

Output

This is the text I want to get
This is the text I want to get too
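One caveat with find(text=True, recursive=False): it returns the first direct text node, which would be a whitespace-only string if the markup had a line break between the opening span and the first em. A minimal sketch of a variant that tolerates this is shown below. The helper name own_text and the sample markup are made up for illustration, and the string argument assumes Beautiful Soup 4.4.0 or later.

```python
from bs4 import BeautifulSoup

# Illustrative markup: the first span has whitespace before its <em>
HTML = '''<li class="print text">
        <span>
            <em class="status">passed</em>first text</span>
        <span><em class="status">passed</em>second text</span>'''


def own_text(tag):
    """Hypothetical helper: join the direct-child text nodes of tag, skipping blank ones."""
    parts = (s.strip() for s in tag.find_all(string=True, recursive=False))
    return " ".join(p for p in parts if p)


soup = BeautifulSoup(HTML, "html.parser")
texts = [own_text(span) for span in soup.select("li.print.text span")]
print(texts)
```

Joining all direct text nodes also covers spans where the wanted text is split around a nested tag.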
