Reading an XML with nested tags

I have an XML file that looks like this (the root element name shown here is a placeholder):

<?xml version="1.0"?>
<root>
   <p>This is an example of text <bold>just as everything else I write</bold>,
   this is some follow-up text that is hidden for eternity.</p>
   <p>This is more text with an <italic>strange</italic> example.</p>
</root>


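When I parse it with xml.etree.ElementTree and call getroot(), I get two p children. When I ask for the text of the first p child, I get "This is an example of text ".

If I look at the children of the first p, I get the bold element with "just as everything else I write".

But I cannot find ",\n this is some follow-up text that is hidden for eternity." anywhere.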

The same happens with the other p child.




I am confused, because the only children seem to be bold and italic. I attached an image with the code.
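The behaviour described above can be reproduced with a short sketch. This is not the code from the attached image, only a minimal reconstruction under assumptions: the `<root>` element name and the `SAMPLE` variable are placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed sample: the <root> element name is a placeholder.
SAMPLE = '''<?xml version="1.0"?>
<root>
   <p>This is an example of text <bold>just as everything else I write</bold>,
   this is some follow-up text that is hidden for eternity.</p>
   <p>This is more text with an <italic>strange</italic> example.</p>
</root>'''

root = ET.fromstring(SAMPLE)
paragraphs = list(root)   # direct children of the root: the two p elements
first = paragraphs[0]

# Only the nested element shows up as a child of the first p.
child_tags = [child.tag for child in first]

print([p.tag for p in paragraphs])
print(repr(first.text))                  # text before the first child only
print(child_tags, repr(first[0].text))   # the bold child and its own text
```

Neither `first.text` nor the child list contains the follow-up text, which is exactly the confusing part.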


Use itertext() to get all the text inside each p, including the text that follows a nested tag.

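import xml.etree.ElementTree as ET

# <root> is an assumed wrapper name; any root element works with findall('.//p')
xml = '''<?xml version="1.0"?>
<root>
   <p>This is an example of text <bold>just as everything else I write</bold>,
   this is some follow-up text that is hidden for eternity.</p>
   <p>This is more text with an <italic>strange</italic> example.</p>
</root>'''

root = ET.fromstring(xml)
# itertext() yields every text and tail string inside p, in document order
for p in root.findall('.//p'):
  print(' '.join(p.itertext()))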


   This is an example of text  just as everything else I write ,
   this is some follow-up text that is hidden for eternity.

   This is more text with an  strange  example.
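
For anyone wondering where the follow-up text was hiding: ElementTree stores the text that comes after a child's closing tag in that child's `tail` attribute, not in the parent's `text`. That is why itertext() finds it. Here is a small sketch showing this for one p element from the question. The `flat_text` helper is a hypothetical name, added only to show one way to tidy the whitespace.

```python
import xml.etree.ElementTree as ET

# Assumed fragment: a single p element in the shape of the question's sample.
p = ET.fromstring(
    '<p>This is an example of text <bold>just as everything else I write</bold>,\n'
    '   this is some follow-up text that is hidden for eternity.</p>'
)
bold = p[0]

print(repr(p.text))     # text before <bold>
print(repr(bold.text))  # text inside <bold>
print(repr(bold.tail))  # text after </bold>, up to </p>


def flat_text(elem):
    """Join all text and tail pieces, then collapse runs of whitespace."""
    return ' '.join(''.join(elem.itertext()).split())


print(flat_text(p))
```

Joining with an empty string and then collapsing whitespace avoids the doubled spaces and the stray space before the comma that `' '.join(...)` produces.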