XML: remove tag but keep text
I have a large XML file that looks like this:
<corpus>
<dialogue speaker="A">
<sentence tag1="a" tag2="b"> Hello </sentence>
</dialogue>
<dialogue speaker="B">
<sentence tag1="cc" tag2= "dd"> How are you </sentence>
<sentence tag1="ff" tag2= "e"> today </sentence>
</dialogue>
<dialogue speaker="A">
<sentence tag1="d" tag2= "bbb"> Great </sentence>
<sentence tag1="f" tag2= "dd"> How about you </sentence>
</dialogue>
<dialogue speaker="B">
<sentence tag1="a" tag2= "dd"> me too </sentence>
</dialogue>
</corpus>
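I need to remove the child element tags so that the fragmented text is merged back into a single string under the parent element. The output should look like this: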
<corpus>
<dialogue speaker="A">
Hello
</dialogue>
<dialogue speaker="B">
How are you today
</dialogue>
<dialogue speaker="A">
Great How about you
</dialogue>
<dialogue speaker="B">
me too
</dialogue>
</corpus>
I tried element.strip() and element.tag.strip(), but got no output. Here is my code:
import xml.etree.ElementTree as ET

f = ET.parse("file.xml")
root = f.getroot()
for s in root.findall("sentence"):
    text = s.tag.strip("sentence")
    print(text)
What am I doing wrong?
Thanks for your help!
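You're almost there. root.findall("sentence") only searches the direct children of root, and those are dialogue elements, so the loop body never runs and nothing is printed. Also, s.tag.strip("sentence") only trims those characters from the ends of the tag name string; it does not remove the tag from the tree. To get the output, try: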
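import xml.etree.ElementTree as ET

root = ET.parse("file.xml").getroot()
for d in root.findall(".//dialogue"):
    sentences = d.findall("sentence")
    # Gather every sentence's text first, so none is overwritten.
    parts = [s.text.strip() for s in sentences if s.text and s.text.strip()]
    for s in sentences:
        d.remove(s)
    # Join the pieces into the dialogue's own text.
    d.text = " ".join(parts)
print(ET.tostring(root).decode())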
This should output what you need.
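For anyone who wants to try this without a file on disk, here is a minimal, self-contained sketch of the same idea. It assumes a shortened copy of the sample XML held in a string; the names SAMPLE and flatten_dialogues are made up for this illustration and are not part of ElementTree. It also shows why the original root.findall("sentence") found nothing.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample from the question (illustrative only).
SAMPLE = """<corpus>
<dialogue speaker="A">
<sentence tag1="a" tag2="b"> Hello </sentence>
</dialogue>
<dialogue speaker="B">
<sentence tag1="cc" tag2="dd"> How are you </sentence>
<sentence tag1="ff" tag2="e"> today </sentence>
</dialogue>
</corpus>"""


def flatten_dialogues(xml_text):
    """Hypothetical helper: replace each dialogue's sentence children
    with their stripped text, joined by single spaces."""
    root = ET.fromstring(xml_text)
    # findall returns a list, so it is safe to modify the tree in the loop.
    for dialogue in root.findall(".//dialogue"):
        sentences = dialogue.findall("sentence")
        parts = [s.text.strip() for s in sentences if s.text and s.text.strip()]
        for s in sentences:
            dialogue.remove(s)
        dialogue.text = " ".join(parts)
    return root


root = ET.fromstring(SAMPLE)
# A bare tag name matches only direct children of the element it is called
# on, and the direct children of <corpus> are <dialogue> elements.
direct_matches = root.findall("sentence")      # empty list
nested_matches = root.findall(".//sentence")   # all three sentences

flat = flatten_dialogues(SAMPLE)
texts = [d.text for d in flat.iter("dialogue")]
print(texts)  # ['Hello', 'How are you today']
```

Note that the joined text ends up directly inside each dialogue element, without the surrounding line breaks shown in the desired output. If that whitespace matters, it would need to be added to d.text explicitly.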