Finding the first tag in an HTML file with BeautifulSoup

Question

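I have a set of HTML files, and I'd like to extract the first tag in each of them. Since no specific tag is always the first one in the files, I can't search for it by name, and I'm not sure how to do this.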

For example, in the following snippet the first tag is <html>.

<html>
 <head>
    <title>
     insert title here
    </title>
 </head>
</html>

Is there a way to do this with BeautifulSoup (or possibly another tool)? Thanks in advance :)

Answer 1

You can use BeautifulSoup here: just call find() on the BeautifulSoup object, and it will find the first element in the tree. Its .name attribute then gives you the tag name:

from bs4 import BeautifulSoup

data = """
<html>
 <head>
    <title>
     insert title here
    </title>
 </head>
</html>
"""

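# Parse the markup with Python's built-in HTML parser
soup = BeautifulSoup(data, "html.parser")
# Print the tag name of the element that find() returns
print(soup.find().name)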
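Since the question also asks about other tools, here is a minimal sketch that does the same thing with only the standard library's html.parser module, which may be handy if installing bs4 is not an option. This is a swapped-in technique, not part of the BeautifulSoup approach above, and the names FirstTagFinder and first_tag_name are made up for illustration. It returns None when the input contains no tags at all.

```python
from html.parser import HTMLParser


class FirstTagFinder(HTMLParser):
    """Hypothetical helper: remembers the first start tag the parser sees."""

    def __init__(self):
        super().__init__()
        self.first_tag = None

    def handle_starttag(self, tag, attrs):
        # Only the first start tag is recorded; later ones are ignored.
        # Doctype declarations and comments never reach this method.
        if self.first_tag is None:
            self.first_tag = tag


def first_tag_name(html):
    """Return the lowercased name of the first tag in html, or None."""
    finder = FirstTagFinder()
    finder.feed(html)
    finder.close()
    return finder.first_tag


sample = """
<html>
 <head>
    <title>
     insert title here
    </title>
 </head>
</html>
"""
print(first_tag_name(sample))
```

To run it over a set of files, you could read each one and pass its text in, for example first_tag_name(Path("page.html").read_text(encoding="utf-8")), where page.html is a placeholder file name. Note that HTMLParser reports tag names in lowercase.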

python

beautifulsoup

bs4