How to use ElementTree in Python to extract XML elements that meet a specified condition

I am using xml.etree.ElementTree to extract part of an XML file into a new XML file. I parse the XML file as follows:

import xml.etree.ElementTree as ET
tree = ET.parse("./VD.xml")
root = tree.getroot()

Here is my XML file:

<?xml version="1.0" encoding="utf-8"?>
<XML_Head version="1.1" listname="VD" updatetime="2021/11/19 17:08:00" interval="300">
    <Infos>
        <Info vdid="N1" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
        </Info>
        <Info vdid="N3" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
        </Info>
        <Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
        </Info>
        <Info vdid="T78" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
        </Info>
        <Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
        </Info>
    </Infos>
</XML_Head>

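How can I extract the Info elements with the vdid values I want, or delete the ones I don't want? For example, I want to keep the groups with vdid="T74". The expected XML output is: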

<?xml version="1.0" encoding="utf-8"?>
<XML_Head version="1.1" listname="VD" updatetime="2021/11/19 17:08:00" interval="300">
    <Infos>
        <Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
        </Info>
        <Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
        </Info>
    </Infos>
</XML_Head>

Thanks!

You can store the vdid value(s) you want to keep in a set, then walk through the Info elements and remove the ones you don't need:

import xml.etree.ElementTree as ET
tree = ET.parse("./VD.xml")
root = tree.getroot()

# vdid values of the Info elements to keep
vdid_to_keep = {"T74"}

infos = root.find("Infos")
# findall() returns a list, so removing children inside the loop is safe
for info_tag in infos.findall('Info'):
    if info_tag.get("vdid") not in vdid_to_keep:
        infos.remove(info_tag)

tree.write("./output.xml")

output.xml:

<XML_Head version="1.1" listname="VD" updatetime="2021/11/19 17:08:00" interval="300">
    <Infos>
        <Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21" />
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23" />
        </Info>
        <Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21" />
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23" />
        </Info>
    </Infos>
</XML_Head>
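
The expected output in the question starts with an XML declaration, which tree.write() leaves out by default, as the output above shows. A minimal sketch of one way to include it, assuming a trimmed stand-in document and an in-memory buffer instead of ./output.xml (both are illustration-only choices, not part of the original answer):

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for VD.xml
xml = '<XML_Head><Infos><Info vdid="N1"/><Info vdid="T74"/></Infos></XML_Head>'

tree = ET.ElementTree(ET.fromstring(xml))
infos = tree.getroot().find("Infos")
for info_tag in infos.findall("Info"):
    if info_tag.get("vdid") != "T74":
        infos.remove(info_tag)

# xml_declaration=True asks write() to emit the <?xml ...?> line;
# with a byte encoding such as utf-8 the target must accept bytes.
buffer = io.BytesIO()
tree.write(buffer, encoding="utf-8", xml_declaration=True)
result = buffer.getvalue().decode("utf-8")
print(result)
```

Passing a file path instead of the buffer should behave the same way. Note that ElementTree writes the declaration with single quotes around the attribute values, which is equivalent XML.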

Using XPath:

import xml.etree.ElementTree as ET


xml = '''<?xml version="1.0" encoding="utf-8"?>
<XML_Head version="1.1" listname="VD" updatetime="2021/11/19 17:08:00" interval="300">
    <Infos>
        <Info vdid="N1" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
        </Info>
        <Info vdid="N3" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
        </Info>
        <Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
        </Info>
        <Info vdid="T78" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
        </Info>
        <Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
        </Info>
    </Infos>
</XML_Head>'''

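root = ET.fromstring(xml)
# Select only the Info elements whose vdid attribute is "T74"
info_sub_list = root.findall('.//Info[@vdid="T74"]')

infos = root.find('.//Infos')
# clear() drops all children as well as the element's attributes, text and tail
infos.clear()
# Re-attach only the elements selected above
infos.extend(info_sub_list)
ET.dump(root)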

Output:

<XML_Head version="1.1" listname="VD" updatetime="2021/11/19 17:08:00" interval="300">
    <Infos><Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21" />
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23" />
        </Info>
        <Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
            <lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21" />
            <lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23" />
        </Info>
    </Infos></XML_Head>
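
The uneven indentation in this output comes from clear(), which also discards the whitespace stored in the text and tail of Infos. If consistent indentation matters, one option is ET.indent(), which is only available in Python 3.9 and later. A minimal sketch, assuming a trimmed stand-in document and a four-space indent (both chosen here for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for the document above
xml = """<XML_Head>
    <Infos>
        <Info vdid="N1"><lane vsrid="1" /></Info>
        <Info vdid="T74"><lane vsrid="1" /></Info>
    </Infos>
</XML_Head>"""

root = ET.fromstring(xml)
infos = root.find("Infos")
kept = infos.findall('Info[@vdid="T74"]')
infos.clear()
infos.extend(kept)

# Rewrites the whitespace between elements in place (Python 3.9+)
ET.indent(root, space="    ")
result = ET.tostring(root, encoding="unicode")
print(result)
```

ET.indent() modifies the tree in place, so it should be called after the filtering and before writing or dumping the result.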