Python bs4 removes br tag
I am using bs4 to process some rich text, but it removes the br tags wherever I do a character conversion. Below is a simplified form of the code.
import re
from bs4 import BeautifulSoup
#source_code = self.textInput.toHtml()
source_code = """.......<p style=" margin-top:12px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;"><span style=" font-family:'Ubuntu';">ABC ABC<br />ABC</span></p>......."""
soup = BeautifulSoup(source_code, "lxml")
for elm in soup.find_all('span', style=re.compile(r"font-family:'Ubuntu'")):
    #actually there was a for loop
    elm.string = elm.text.replace("A", "X")
    elm.string = elm.text.replace("B", "Y")
    elm.string = elm.text.replace("C", "Z")
print(soup.prettify())
This should give the output:
...<span style=" font-family:'Ubuntu';">XYZ XYZ<br />XYZ</span>...
#XYZ XYZ
#XYZ
But the output it actually gives has no br tag:
...<span style=" font-family:'Ubuntu';">XYZ XYZXYZ</span>...
#XYZ XYZXYZ
How can I fix this?
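The problem is that you are reassigning the element's .string, which replaces all of the element's children (including the br tag) with a single text node. Instead, I would find the text nodes and make the replacements there: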
for text in elm.find_all(text=True):
    text.replace_with(text.replace("A", "X").replace("B", "Y").replace("C", "Z"))
Works for me, and produces:
</p>
<p style=" margin-top:12px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px;">
<span style=" font-family:'Ubuntu';">
XYZ XYZ
<br/>
XYZ
</span>
</p>
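To make the difference concrete, here is a minimal sketch comparing the two approaches on a cut-down span. The variable names and the shortened HTML are made up for illustration, and it uses the built-in "html.parser" instead of "lxml" only so that it runs without extra dependencies. It also uses string=True, which is the newer name for the text=True argument in recent Beautiful Soup versions.

```python
from bs4 import BeautifulSoup

html = "<span>ABC ABC<br />ABC</span>"  # shortened, illustrative input

# Approach 1: assigning to .string replaces ALL children with one text node.
soup = BeautifulSoup(html, "html.parser")
span = soup.span
span.string = span.get_text().replace("A", "X")
lost_br = span.find("br") is None  # the <br/> was discarded with the old children

# Approach 2: replace each text node in place; sibling tags are untouched.
soup2 = BeautifulSoup(html, "html.parser")
span2 = soup2.span
for node in span2.find_all(string=True):
    node.replace_with(node.replace("A", "X"))
kept_br = span2.find("br") is not None

print(span)   # span without the br tag
print(span2)  # span with the br tag still between the text nodes
```

The first approach flattens the span because get_text() joins the text of every descendant and the assignment then throws away the old children, br included.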
How can I include this part in a loop?
Here is an example:
replacements = {
    "A": "X",
    "B": "Y",
    "C": "Z"
}
for text in elm.find_all(text=True):
    text_to_replace = text
    for k, v in replacements.items():
        text_to_replace = text_to_replace.replace(k, v)
    text.replace_with(text_to_replace)
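One caveat with chained .replace() calls: they run one after another, so if a replacement value is itself a key (say "A" to "B" and "B" to "C"), the results cascade. When every key is a single character, as in this question, a possible alternative is str.translate, which applies the whole mapping in one pass. This is a swapped-in technique rather than what the answer above uses, and the shortened HTML and the "html.parser" choice are assumptions made so the sketch is self-contained:

```python
import re
from bs4 import BeautifulSoup

# Shortened, illustrative version of the markup from the question.
source_code = """<p><span style=" font-family:'Ubuntu';">ABC ABC<br />ABC</span></p>"""

# Translation table for single-character replacements, applied in one pass.
table = str.maketrans({"A": "X", "B": "Y", "C": "Z"})

soup = BeautifulSoup(source_code, "html.parser")
for elm in soup.find_all("span", style=re.compile(r"font-family:'Ubuntu'")):
    # Rewrite each text node individually so the <br/> between them survives.
    for text in elm.find_all(string=True):
        text.replace_with(text.translate(table))

result = str(soup)
print(result)
```

For multi-character keys, str.translate does not apply, and the dictionary loop above remains the simpler choice.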