XML : remove tag but keep text

I have a large XML file that looks like this:

<corpus>
  <dialogue speaker="A">
    <sentence tag1="a" tag2="b"> Hello </sentence>
  </dialogue>
  <dialogue speaker="B">
    <sentence tag1="cc" tag2= "dd"> How are you </sentence>
    <sentence tag1="ff" tag2= "e"> today </sentence>
  </dialogue>
  <dialogue speaker="A">
    <sentence tag1="d" tag2= "bbb"> Great </sentence>
    <sentence tag1="f" tag2= "dd"> How about you </sentence>
  </dialogue>
  <dialogue speaker="B">
    <sentence tag1="a" tag2= "dd"> me too </sentence>
  </dialogue>
</corpus>

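I need to remove the child element tags so that the scattered pieces of text are joined back together under the parent element. The output should look like this: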

<corpus>
  <dialogue speaker="A">
    Hello
  </dialogue>
  <dialogue speaker="B">
    How are you today
  </dialogue>
  <dialogue speaker="A">
    Great How about you
  </dialogue>
  <dialogue speaker="B">
     me too
  </dialogue>
</corpus>

I tried element.strip() and element.tag.strip(), but I get no output... Here is my code:

f = ET.parse("file.xml")
root = f.getroot()

for s in root.findall("sentence"):
    text = s.tag.strip("sentence")
    print(text)

What am I doing wrong? Thanks for your help!

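You're almost there. Two things go wrong in your code: root.findall("sentence") only looks at the direct children of the root, which are all dialogue elements, so it matches nothing; and s.tag.strip("sentence") works on the tag name, not on the element's text. To get the output, try: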

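import xml.etree.ElementTree as ET

root = ET.parse("file.xml").getroot()

for d in root.findall(".//dialogue"):
    sentences = d.findall("sentence")  # direct <sentence> children of this dialogue
    # Collect the stripped text of each sentence, skipping empty ones.
    parts = [s.text.strip() for s in sentences if s.text and s.text.strip()]
    for s in sentences:
        d.remove(s)
    # Join all pieces so earlier sentences are not overwritten by later ones.
    d.text = " ".join(parts)
print(ET.tostring(root).decode())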

This should print the XML with the sentence text placed directly under each dialogue element.
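
For anyone who wants to try this without a file on disk, here is a minimal self-contained sketch of the same idea. The names SAMPLE and merge_sentences are made up for illustration, and it assumes every sentence is a direct child of its dialogue and that any whitespace around the sentence elements can be discarded.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, shortened from the XML in the question.
SAMPLE = """<corpus>
  <dialogue speaker="A">
    <sentence tag1="a" tag2="b"> Hello </sentence>
  </dialogue>
  <dialogue speaker="B">
    <sentence tag1="cc" tag2="dd"> How are you </sentence>
    <sentence tag1="ff" tag2="e"> today </sentence>
  </dialogue>
</corpus>"""


def merge_sentences(root):
    """Replace the <sentence> children of each <dialogue> with their joined text."""
    for dialogue in root.findall(".//dialogue"):
        sentences = dialogue.findall("sentence")  # direct children only
        parts = [s.text.strip() for s in sentences if s.text and s.text.strip()]
        for s in sentences:
            dialogue.remove(s)
        dialogue.text = " ".join(parts)
    return root


root = ET.fromstring(SAMPLE)

# A bare tag name only matches direct children of the element it is called on,
# so searching the root for "sentence" finds nothing.
direct_matches = root.findall("sentence")

merge_sentences(root)
merged = [d.text for d in root.findall("dialogue")]
print(ET.tostring(root, encoding="unicode"))
```

Note that the text ends up directly inside each dialogue element, without the line breaks and indentation shown in the desired output; if that layout matters, the whitespace would have to be added back to d.text by hand.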