Replace contents of p tag with BeautifulSoup
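I'm running into a problem replacing content when the HTML contains something like the following:
<p>Next, go to your <strong>/home/pi</strong> directory and check if you can see the picture</p>
What I want to do is replace the content of the p tag while discarding any other styling or inner tags. In this example, that means the strong tag would no longer be part of the new string.
However, I'm finding it impossible to completely replace the content of the p tag. I've googled my problem and the errors, but I haven't managed to come up with a working example.
Here is my code and the test runs I've tried. Some throw an error and others do nothing. You can uncomment any of them to test it yourself, but the results are attached as comments.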
from bs4 import BeautifulSoup

src = "<p>Next, go to your <strong>/home/pi</strong> directory and check if you can see the picture</p>"
soup = BeautifulSoup(src, "lxml")

for element in soup.findAll():
    if element.name == 'p':
        print(element)
        #= <p>Next, go to your <strong>/home/pi</strong> directory and check if you can see the picture</p>
        print(element.text)
        #= Next, go to your /home/pi directory and check if you can see the picture
        print(element.contents)
        #= ['Next, go to your ', <strong>/home/pi</strong>, ' directory and check if you can see the picture']

        # -- test 1:
        # element.string.replace_with("First, go to your /home/pi directory")
        # AttributeError: 'NoneType' object has no attribute 'replace_with'

        # -- test 2:
        # element.replace("First, go to your /home/pi directory")
        # TypeError: 'NoneType' object is not callable

        # -- test 3:
        # new_tag = soup.new_tag('li')
        # new_tag.string = "First, go to your /home/pi directory"
        # element.replace_with(new_tag)
        # print(element)
        # not replaced

        # -- test 4:
        # element.text.replace(str(element), "First, go to your /home/pi directory")
        # print(element)
        # not replaced

        # -- test 5:
        # element.text.replace(element.text, "First go to your /home/pi/ directory")
        # print(element)
        # not replaced

        # -- test 6:
        new_tag = soup.new_tag('li')
        new_tag.string = "First, go to your /home/pi directory"
        element.replaceWith(new_tag)
        print(element)
        # not replaced

        # -- test 7:
        # element.replace_with("First, go to your /home/pi directory")
        # print(element)
        # not replaced
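I suspect the problem is that element.contents contains multiple items. However, element.text gives me what I need to process the string and replace it, and I don't care about any styling inside.
As a last resort I would accept a str.replace on the element in its HTML form, but if possible I'd rather handle it within BeautifulSoup.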
Sources used:
https://www.tutorialfor.com/questions-59179.htm
https://beautiful-soup-4.readthedocs.io/en/latest/#modifying-the-tree
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#making-the-soup
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#replace-with
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#method-names
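The suspicion about element.contents can be checked directly. The sketch below is not part of the original question. It assumes the same sample HTML, uses the built-in "html.parser" instead of "lxml" so that no extra dependency is needed, and introduces the names p, child_count and flat_text purely for illustration. It shows that .string is None when a tag has more than one child (which explains the AttributeError in test 1), and that .text returns an ordinary Python string, so calling str.replace on it cannot change the tree (tests 4 and 5).

```python
from bs4 import BeautifulSoup

src = "<p>Next, go to your <strong>/home/pi</strong> directory and check if you can see the picture</p>"
soup = BeautifulSoup(src, "html.parser")
p = soup.find("p")

# The <p> tag has three children: a text node, the <strong> tag,
# and another text node.
child_count = len(p.contents)
print(child_count)

# With more than one child, .string is None, so calling
# .replace_with() on it raises AttributeError (test 1).
print(p.string)

# A tag with a single text child does expose .string.
print(p.strong.string)

# get_text() (like .text) returns a new str. str.replace on that value
# builds yet another str and never modifies the parse tree.
flat_text = p.get_text()
print(flat_text)
```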
I think you can simply assign to element.string with =. There is no need to use .replace().
from bs4 import BeautifulSoup

src = "<p>Next, go to your <strong>/home/pi</strong> directory and check if you can see the picture</p>"
soup = BeautifulSoup(src, "html.parser")
print('Original: %s' % soup)

for element in soup.findAll():
    if element.name == 'p':
        # Assigning to .string replaces all children of the tag with the new text
        element.string = "First, go to your /home/pi directory"

print('Altered: %s' % soup)
Output:
Original: <p>Next, go to your <strong>/home/pi</strong> directory and check if you can see the picture</p>
Altered: <p>First, go to your /home/pi directory</p>
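As a side note on the "not replaced" results in the question, here is a minimal sketch, assuming the same sample HTML and the "html.parser" backend. The names p, new_tag and soup2 are illustrative only. It suggests that replace_with() does change the tree, but the variable that held the old tag still points at the detached original, so printing that variable shows the old markup. Printing soup shows the replacement. The second half shows clear() followed by append() as another way to swap the children while keeping the p tag, in case assigning to .string is not wanted.

```python
from bs4 import BeautifulSoup

src = "<p>Next, go to your <strong>/home/pi</strong> directory and check if you can see the picture</p>"
soup = BeautifulSoup(src, "html.parser")
p = soup.find("p")

# Same steps as tests 3 and 6 in the question.
new_tag = soup.new_tag("li")
new_tag.string = "First, go to your /home/pi directory"
p.replace_with(new_tag)

# `p` still refers to the old tag, which is now detached from the tree,
# so printing it shows the original markup.
print(p)
print(p.parent)

# The tree itself holds the replacement.
print(soup)

# Alternative that keeps the <p> tag: remove its children, then append text.
soup2 = BeautifulSoup(src, "html.parser")
soup2.p.clear()
soup2.p.append("First, go to your /home/pi directory")
print(soup2)
```

With the "lxml" parser used in the question, the printed document would additionally be wrapped in html and body tags.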