Strip the first (top level) tag in Beautifulsoup

Question

I make a soup:

from bs4 import BeautifulSoup
soup = BeautifulSoup("<div><p>My paragraph <a>My link</a></p></div>","html.parser")

I want to strip the first top-level tag to reveal its contents, regardless of what the tag is:

<p>My paragraph <a>My link</a></p>

That is, all of its children. So I don't want to find and replace by tag name, as with soup.find("div"), but to do this by position.

How can this be done?

Answer 1

Maybe you can use its children?

soup.findChildren()[1] -> <p>My paragraph <a>My link</a></p>

soup.findChildren()[0] returns the div element itself, so index 1 is its first child.
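A minimal sketch of why the indexing works this way, and of one way to reach every direct child rather than only the first. It assumes a second paragraph in the markup (borrowed from the other answer's example) so the difference is visible; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<div><p>My paragraph <a>My link</a></p><p>hello again</p></div>",
    "html.parser",
)

# find_all() (findChildren() is its older alias) searches recursively,
# so it lists every descendant tag in document order.
tag_names = [tag.name for tag in soup.find_all()]

# Select the first top-level tag by position, not by name.
top = soup.find(True)

# Only the direct child tags of that element.
direct_children = top.find_all(recursive=False)

# The element's inner markup, without the wrapping tag itself.
inner_html = top.decode_contents()

print(tag_names)
print(inner_html)
```

Because the search is recursive, index 1 happens to be the first child here, but index 2 is the nested a tag rather than the second paragraph. Passing recursive=False avoids that.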

Answer 2

Use the provided .unwrap() function:

from bs4 import BeautifulSoup
soup = BeautifulSoup("<div><p>My paragraph <a>My link</a></p><p>hello again</p></div>","html.parser")

soup.contents[0].unwrap()

print(soup)
print(len(soup.contents))

Result:

<p>My paragraph <a>My link</a></p><p>hello again</p>
2
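One caveat: soup.contents[0] is not always a tag. If the markup starts with whitespace or a comment, the first node is a string. A hedged sketch of a small helper that skips such nodes follows; the name strip_first_top_level_tag is hypothetical and not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup, Tag


def strip_first_top_level_tag(soup):
    """Unwrap the first top-level Tag in place and return the soup.

    Hypothetical helper: leading strings and comments are left untouched.
    """
    for node in soup.contents:
        if isinstance(node, Tag):
            # unwrap() replaces the tag with its own children.
            node.unwrap()
            break  # stop right away; soup.contents has just changed
    return soup


# The leading newline makes soup.contents[0] a string, not the div.
soup = BeautifulSoup(
    "\n<div><p>My paragraph <a>My link</a></p><p>hello again</p></div>",
    "html.parser",
)
strip_first_top_level_tag(soup)
result = str(soup)
print(result)
```

The helper mutates the soup in place, just as the plain unwrap() call does, so parse a fresh copy if the original tree is still needed.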
