Select XML tags that have certain tags inside them

I'm parsing an XML file in Python with ElementTree, and I need to select tags like <PostalAddressText> below that contain tags like <insert>. How do I do that? I need a list of the names of all tags in the XML file that meet this condition.

Here is a snippet of the XML I'm parsing (the actual text, apart from the inner tags, has been replaced with Lorem Ipsum):

<?xml version="1.0"?>
<data>
    <PostalAddressText>123456,
    <insert>
</insert>Lorem ipsum dolor sit amet, <insert>
</insert>consectetur adipiscing elit.<insert>
</insert>Etiam cursus ligula non malesuada fringilla.<delete> </delete><insert>
</insert>Quisque porta quam eu finibus pulvinar.<delete>or</delete><insert>er</insert> Mauris at semper urna.<delete>a</delete><insert>o</insert> Donec feugiat<delete>arcu purus</delete><insert>et lacinia</insert></PostalAddressText>
    <PersonNameText>789012,
    <insert>
</insert>Lorem ipsum dolor sit amet, <insert>
</insert>consectetur adipiscing elit.<insert>
</insert>Etiam cursus ligula non malesuada fringilla.<delete> </delete><insert>
</insert>Quisque porta quam eu finibus pulvinar.<delete>or</delete><insert>er</insert> Mauris at semper urna.<delete>a</delete><insert>o</insert> Donec feugiat<delete>arcu purus</delete><insert>et lacinia</insert>
    </PersonNameText>
</data>

I tried this, but nothing is printed to the console:

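test_data = root.findall(".//")
for el in test_data:
    # Never true: el.text is only the text before the first child tag,
    # and == compares literally, so '*' is not a wildcard here
    if el.text == '*<insert>*':
        print(el.tag, el.text)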
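Why that attempt prints nothing: in ElementTree, an element's .text holds only the character data before its first child, and child tags such as <insert> become separate Element objects rather than part of that string. On top of that, == compares strings literally, so '*<insert>*' is not treated as a pattern. A minimal sketch on a trimmed copy of the <PostalAddressText> element (the shortened text is an assumption for brevity):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's <PostalAddressText> element
snippet = ('<PostalAddressText>123456, <insert>\n</insert>Lorem ipsum '
           '<delete>or</delete><insert>er</insert></PostalAddressText>')
el = ET.fromstring(snippet)

# .text holds only the character data before the first child element
print(repr(el.text))

# Child tags are separate Element objects, reached by iterating the parent
child_tags = [child.tag for child in el]
print(child_tags)

# So the check should look at the children, not at the text
has_insert = el.find('insert') is not None
print(has_insert)
```

Here el.text is just '123456, ', while the three child tags (insert, delete, insert) are available by iterating el.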

Not sure what your expected output is, but see the code and output below:

import xml.etree.ElementTree as ET

xml = '''<?xml version="1.0"?>
<data>
    <PostalAddressText>123456,
    <insert>
</insert>Lorem ipsum dolor sit amet, <insert>
</insert>consectetur adipiscing elit.<insert>
</insert>Etiam cursus ligula non malesuada fringilla.<delete> </delete><insert>
</insert>Quisque porta quam eu finibus pulvinar.<delete>or</delete><insert>er</insert> Mauris at semper urna.<delete>a</delete><insert>o</insert> Donec feugiat<delete>arcu purus</delete><insert>et lacinia</insert></PostalAddressText>
    <PersonNameText>789012,
    <insert>
</insert>Lorem ipsum dolor sit amet, <insert>
</insert>consectetur adipiscing elit.<insert>
</insert>Etiam cursus ligula non malesuada fringilla.<delete> </delete><insert>
</insert>Quisque porta quam eu finibus pulvinar.<delete>or</delete><insert>er</insert> Mauris at semper urna.<delete>a</delete><insert>o</insert> Donec feugiat<delete>arcu purus</delete><insert>et lacinia</insert>
    </PersonNameText>
</data>'''

root = ET.fromstring(xml)
for idx, insert in enumerate(root.findall('.//insert'), 1):
    print(f'{idx}) {insert.text}  {insert.tail}')

Output

1) 
  Lorem ipsum dolor sit amet, 
2) 
  consectetur adipiscing elit.
3) 
  Etiam cursus ligula non malesuada fringilla.
4) 
  Quisque porta quam eu finibus pulvinar.
5) er   Mauris at semper urna.
6) o   Donec feugiat
7) et lacinia  None
8) 
  Lorem ipsum dolor sit amet, 
9) 
  consectetur adipiscing elit.
10) 
  Etiam cursus ligula non malesuada fringilla.
11) 
  Quisque porta quam eu finibus pulvinar.
12) er   Mauris at semper urna.
13) o   Donec feugiat
14) et lacinia  
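The code above lists the <insert> elements themselves, while the question asks for the tags that contain them. ElementTree elements keep no reference to their parent, so one common workaround is to build a child-to-parent dictionary once and look each <insert> up in it. A sketch on a trimmed version of the sample XML (parent_of and parent_tags are illustrative names, not part of the original code):

```python
import xml.etree.ElementTree as ET

# Trimmed version of the question's XML
xml_text = '''<data>
    <PostalAddressText>123456, <insert>er</insert> a <insert>o</insert></PostalAddressText>
    <PersonNameText>789012, <insert>et lacinia</insert></PersonNameText>
</data>'''
root = ET.fromstring(xml_text)

# Elements have no parent pointer, so map every child to its parent once
parent_of = {child: parent for parent in root.iter() for child in parent}

# For each <insert>, record the tag of the element that contains it
parent_tags = [parent_of[ins].tag for ins in root.iter('insert')]
print(parent_tags)

# Keep each tag name once, preserving first-seen order
unique_tags = list(dict.fromkeys(parent_tags))
print(unique_tags)
```

parent_tags repeats PostalAddressText because it contains two <insert> elements; dict.fromkeys removes the duplicate while keeping document order.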

I got the result I needed with the code below (the idea came from @balderman):

for el in root.findall('.//*[insert]'): 
    print(el.tag)

This gives me the tag names.
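One caveat with the [insert] predicate: ElementTree's limited XPath support only tests immediate children, so an element whose <insert> sits one level deeper (inside another tag) is not matched, while that intermediate tag is. If that matters, checking each element's subtree with find('.//insert') is one option. A sketch with a hypothetical <Note> and <span> added to the sample to show the difference:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: <Note> has its <insert> nested inside <span>
sample = '''<data>
    <PostalAddressText>123456, <insert>er</insert> text</PostalAddressText>
    <PersonNameText>789012, <insert>o</insert></PersonNameText>
    <Plain>no markup here</Plain>
    <Note>x <span><insert>y</insert></span></Note>
</data>'''
root = ET.fromstring(sample)

# '[insert]' only tests immediate children, as in the loop above
direct = [el.tag for el in root.findall('.//*[insert]')]
print(direct)

# Searching each element's subtree also catches deeper <insert> tags
anywhere = [el.tag for el in root.iter()
            if el is not root and el.find('.//insert') is not None]
print(anywhere)
```

For the XML in the question both versions return PostalAddressText and PersonNameText, because every <insert> there is a direct child of one of those two tags.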