How to use ElementTree in Python to extract XML elements that meet a specified condition
I am using xml.etree.ElementTree to extract part of an XML file into a new XML file. I parse the XML file as follows:
import xml.etree.ElementTree as ET
tree = ET.parse("./VD.xml")
root = tree.getroot()
Here is my XML file:
<?xml version="1.0" encoding="utf-8"?>
<XML_Head version="1.1" listname="VD" updatetime="2021/11/19 17:08:00" interval="300">
<Infos>
<Info vdid="N1" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
</Info>
<Info vdid="N3" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
</Info>
<Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
</Info>
<Info vdid="T78" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
</Info>
<Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
</Info>
</Infos>
</XML_Head>
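How can I extract the vdid entries I want, or remove the ones I don't want? For example, I want to keep only the groups with vdid=T74. The expected XML output is: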
<?xml version="1.0" encoding="utf-8"?>
<XML_Head version="1.1" listname="VD" updatetime="2021/11/19 17:08:00" interval="300">
<Infos>
<Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
</Info>
<Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
</Info>
</Infos>
</XML_Head>
Thanks!
You can store the vdid(s) you want to keep in a set, then walk through the XML tree and remove the Info elements you don't need:
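import xml.etree.ElementTree as ET
tree = ET.parse("./VD.xml")
root = tree.getroot()
# vdid values to keep; every other <Info> element is removed
vdid_to_keep = {"T74"}
infos = root.find("Infos")
# findall() returns a list, so removing children inside the loop is safe
for info_tag in infos.findall('Info'):
    if info_tag.get("vdid") not in vdid_to_keep:
        infos.remove(info_tag)
tree.write("./output.xml")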
output.xml:
<XML_Head version="1.1" listname="VD" updatetime="2021/11/19 17:08:00" interval="300">
<Infos>
<Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21" />
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23" />
</Info>
<Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21" />
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23" />
</Info>
</Infos>
</XML_Head>
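Note that the output.xml above has no `<?xml ...?>` declaration, although the expected output in the question does. The following is a minimal sketch of one way to get it back, assuming the same `Infos`/`Info` layout. The helper name `keep_vdids` and the shortened inline `SAMPLE` are illustrative and not part of the original answer. It writes to an in-memory buffer instead of `./output.xml` so it can run on its own.

```python
import io
import xml.etree.ElementTree as ET

# Shortened stand-in for VD.xml (hypothetical sample, same element names).
SAMPLE = """<XML_Head version="1.1" listname="VD">
<Infos>
<Info vdid="N1" status="0"><lane vsrid="1" speed="54" /></Info>
<Info vdid="T74" status="0"><lane vsrid="1" speed="54" /></Info>
<Info vdid="T78" status="0"><lane vsrid="1" speed="49" /></Info>
<Info vdid="T74" status="0"><lane vsrid="2" speed="49" /></Info>
</Infos>
</XML_Head>"""


def keep_vdids(tree, vdid_to_keep):
    """Remove, in place, every <Info> whose vdid is not in vdid_to_keep."""
    infos = tree.getroot().find("Infos")
    # findall() returns a list, so removing while looping is safe here.
    for info_tag in infos.findall("Info"):
        if info_tag.get("vdid") not in vdid_to_keep:
            infos.remove(info_tag)
    return tree


tree = keep_vdids(ET.ElementTree(ET.fromstring(SAMPLE)), {"T74"})

# xml_declaration=True asks ElementTree to emit the <?xml ...?> header;
# an explicit encoding is passed so the header names it.
buffer = io.BytesIO()
tree.write(buffer, encoding="utf-8", xml_declaration=True)
result = buffer.getvalue().decode("utf-8")
print(result)
```

To write a real file, pass a path such as `"./output.xml"` to `tree.write` in place of the buffer, with the same keyword arguments.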
Using XPath:
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="utf-8"?>
<XML_Head version="1.1" listname="VD" updatetime="2021/11/19 17:08:00" interval="300">
<Infos>
<Info vdid="N1" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
</Info>
<Info vdid="N3" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
</Info>
<Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
</Info>
<Info vdid="T78" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
</Info>
<Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21"></lane>
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23"></lane>
</Info>
</Infos>
</XML_Head>'''
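root = ET.fromstring(xml)
# Select every <Info> whose vdid attribute equals "T74"
info_sub_list = root.findall('.//Info[@vdid="T74"]')
infos = root.find('.//Infos')
# Drop all children of <Infos>, then re-attach only the matching ones
infos.clear()
infos.extend(info_sub_list)
ET.dump(root)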
Output:
<XML_Head version="1.1" listname="VD" updatetime="2021/11/19 17:08:00" interval="300">
<Infos><Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21" />
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23" />
</Info>
<Info vdid="T74" status="0" datacollecttime="2021/11/19 17:00:00">
<lane vsrdir="0" vsrid="1" speed="54" laneoccupy="21" />
<lane vsrdir="0" vsrid="2" speed="49" laneoccupy="23" />
</Info>
</Infos></XML_Head>
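The run-together `<Infos><Info ...` and `</Infos></XML_Head>` in this output come from `clear()`, which resets the element's text and tail (and any attributes) along with its children. The following is a sketch of one possible variant, not the original answer's method: it replaces only the children through slice assignment and then re-indents with `ET.indent`, which assumes Python 3.9 or newer. The helper name `keep_by_xpath`, the shortened `SAMPLE`, and the `source="demo"` attribute are hypothetical. The attribute is there only to show that it survives.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the document above; source="demo" is a made-up
# attribute used only to show that attributes on <Infos> are preserved.
SAMPLE = """<XML_Head version="1.1" listname="VD">
<Infos source="demo">
<Info vdid="N1" status="0"><lane vsrid="1" speed="54" /></Info>
<Info vdid="T74" status="0"><lane vsrid="1" speed="54" /></Info>
<Info vdid="T74" status="0"><lane vsrid="2" speed="49" /></Info>
</Infos>
</XML_Head>"""


def keep_by_xpath(root, vdid):
    """Keep only the <Info> children of <Infos> with the given vdid."""
    infos = root.find("Infos")
    # Attribute predicate from ElementTree's limited XPath subset.
    # Assumes vdid contains no quote characters.
    matches = infos.findall(f"Info[@vdid='{vdid}']")
    # Slice assignment swaps the children but leaves attributes untouched.
    infos[:] = matches
    # Normalise whitespace so the serialised tree is evenly indented (3.9+).
    ET.indent(root, space="  ")
    return root


root = keep_by_xpath(ET.fromstring(SAMPLE), "T74")
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

ElementTree's XPath subset has no `or` operator, so keeping several vdid values at once is easier with the set-based loop from the first answer.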