Update/Add Nested Data using BeautifulSoup
I am using BeautifulSoup with Python to parse an HTML page and update its content as needed. A dummy version of my HTML page structure looks like this:
<div class="main">
<div class="class_1">
<p><br/></p>
<div class="panel">Some content here </div>
<div class="panel">Another content here </div>
</div>
</div>
I want to update the contents of <div class="class_1">.
I can successfully use the BeautifulSoup parser to get the contents of <div class="class_1">. I also have the new data that I want in my HTML page, stored as follows:
['<div class="panel">Some content here </div>',
'<div class="panel">Updated new content here </div>',
'<div class="panel">Hello new div here! </div>']
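How can I get the following? I tried replace_with, but it replaced < with &lt;, which is not what I want. I am not very familiar with Beautiful Soup, so I am not sure what other options could help me achieve the following: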
<div class="main">
<div class="class_1">
<p><br/></p>
<div class="panel">Some content here </div>
<div class="panel">Updated new content here </div>
<div class="panel">Hello new div here! </div>
</div>
</div>
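The escaping described above can be reproduced in isolation. The sketch below assumes that the new markup was passed to replace_with as a plain Python string; in that case Beautiful Soup treats it as text and escapes the angle brackets when the tree is serialized. Parsing the string into a tag first avoids this. The variable names (escaped_soup, parsed_soup, fragment) and the shortened one-line document are illustrative only.

```python
from bs4 import BeautifulSoup

# Shortened version of the page, for illustration only.
html_doc = '<div class="class_1"><div class="panel">Another content here </div></div>'
new_markup = '<div class="panel">Updated new content here </div>'

# Case 1: pass the markup as a plain str.
# Beautiful Soup wraps it in a text node, so "<" and ">" are escaped on output.
escaped_soup = BeautifulSoup(html_doc, "html.parser")
escaped_soup.select_one(".panel").replace_with(new_markup)
escaped_html = str(escaped_soup)

# Case 2: parse the markup first and pass the resulting Tag.
# The tag is inserted as a real element, so nothing is escaped.
parsed_soup = BeautifulSoup(html_doc, "html.parser")
fragment = BeautifulSoup(new_markup, "html.parser")
parsed_soup.select_one(".panel").replace_with(fragment.div)
parsed_html = str(parsed_soup)

print(escaped_html)
print(parsed_html)
```

The only difference between the two cases is whether replace_with receives a str or a parsed Tag.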
Try:
from bs4 import BeautifulSoup
html_doc = """
<div class="main">
<div class="class_1">
<p><br/></p>
<div class="panel">Some content here </div>
<div class="panel">Another content here </div>
</div>
</div>
"""
new_content = [
'<div class="panel">Some content here </div>',
'<div class="panel">Updated new content here </div>',
'<div class="panel">Hello new div here! </div>',
]
soup = BeautifulSoup(html_doc, "html.parser")
# locate the correct <p> element:
p = soup.select_one(".class_1 p")
# delete old content:
# tags:
for t in p.find_next_siblings():
    t.extract()
# text (if any):
# string=True matches text nodes; it is the current name for the older text=True argument
for t in p.find_next_siblings(string=True):
    t.extract()
# place new content:
p.insert_after(BeautifulSoup("\n" + "\n".join(new_content) + "\n", "html.parser"))
print(soup)
Prints:
<div class="main">
<div class="class_1">
<p><br/></p>
<div class="panel">Some content here </div>
<div class="panel">Updated new content here </div>
<div class="panel">Hello new div here! </div>
</div>
</div>
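A possible variation, offered only as a sketch: instead of removing everything that follows the <p> element, remove just the existing panel elements and append the newly parsed ones to the container. This assumes the panels are the only elements that need replacing and that leftover whitespace between tags does not matter. The helper name replace_panels is hypothetical and not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

html_doc = """
<div class="main">
<div class="class_1">
<p><br/></p>
<div class="panel">Some content here </div>
<div class="panel">Another content here </div>
</div>
</div>
"""

new_content = [
    '<div class="panel">Some content here </div>',
    '<div class="panel">Updated new content here </div>',
    '<div class="panel">Hello new div here! </div>',
]


def replace_panels(html, panels):
    """Hypothetical helper: swap all .panel divs inside .class_1 for new ones."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.select_one(".class_1")
    # Remove the old panels; the <p> element and other siblings are left alone.
    for old_panel in container.select("div.panel"):
        old_panel.decompose()
    # Parse each markup string so it is inserted as real tags, not escaped text.
    for markup in panels:
        fragment = BeautifulSoup(markup, "html.parser")
        for node in list(fragment.contents):
            container.append(node)
    return soup


updated = replace_panels(html_doc, new_content)
panel_texts = [div.get_text() for div in updated.select(".class_1 > .panel")]
print(panel_texts)
```

Compared with the approach above, this leaves any non-panel siblings of <p> in place, at the cost of less tidy whitespace in the serialized output.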