How do I check a condition during XML parsing using Python ElementTree?
I am trying to parse XML with ElementTree and extract all the required fields.
Problem: my list comes back empty. The condition I am trying to apply is: if a reference element has type == 'cve', I want to get the 'id' value of that reference tag.
Can someone suggest a correction so that I get the required fields?
My actual code is as follows:
import xml.etree.ElementTree as ET
file_name = "updateinfo.xml"
parser = ET.XMLParser(encoding="utf-8")
tree = ET.parse(file_name, parser=parser)
tree_toString = (ET.tostring(tree.getroot()))
for ele in tree.findall('update'):
    cveList = [
        ele.find('references/reference').get('id') if ele.find('references/reference').get('type') == 'cve' else None
        for cve in ele.find('references/reference')]
    print cveList
My XML structure is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<updates>
<update status="final" from="release-engineering@redhat.com" version="4" type="enhancement" >
<id>RHEA-2017:2259</id>
<issued date="2017-08-01 05:59:34 UTC" />
<title>new packages: usbguard</title>
<release>0</release>
<rights>Copyright 2017 Red Hat Inc</rights>
<pushcount>4</pushcount>
<updated date="2017-08-01 05:59:34 UTC" />
<references>
<reference href="https://access.redhat.com/errata/RHEA-2017:2259" type="self" id="RHEA-2017:2259" title="RHEA-2017:2259" />
<reference href="https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/7.4_Release_Notes/index.html" type="other" id="ref_0" title="other_reference_0" />
</references>
<pkglist>
<collection short="" >
<name>rhel-7-server-rpms__7_DOT_4__x86_64</name>
<package src="usbguard-0.7.0-3.el7.src.rpm" name="usbguard" epoch="0" version="0.7.0" release="3.el7" arch="i686" >
<filename>usbguard-0.7.0-3.el7.i686.rpm</filename>
<sum type="sha256" >efd5ca6dd3df02e8537cf45cef48508bf023f568a98ce9f28e9baf77c5caac6c</sum>
</package>
<package src="usbguard-0.7.0-3.el7.src.rpm" name="usbguard" epoch="0" version="0.7.0" release="3.el7" arch="x86_64" >
<filename>usbguard-0.7.0-3.el7.x86_64.rpm</filename>
<sum type="sha256" >3f72768880085d6bfff37636d3a8eb54184e5619353b5efbefd5738e74bdfa08</sum>
</package>
</collection>
</pkglist>
</update>
<update status="final" from="security@redhat.com" version="1" type="bugfix" >
<id>RHBA-2014:0722</id>
<issued date="2014-06-10 00:00:00" />
<title>kexec-tools bug fix update</title>
<rights>Copyright 2014 Red Hat Inc</rights>
<pushcount>1</pushcount>
<updated date="2014-06-10 00:00:00" />
<references>
<reference href="https://rhn.redhat.com/errata/RHBA-2014-0722.html" type="self" title="RHBA-2014:0722" />
</references>
<pkglist>
<collection short="" >
<name>rhel-7-server-rpms__7_DOT_4__x86_64</name>
<package src="kexec-tools-2.0.4-32.el7_0.1.src.rpm" name="kexec-tools" epoch="0" version="2.0.4" release="32.el7_0.1" arch="x86_64" >
<filename>kexec-tools-2.0.4-32.el7_0.1.x86_64.rpm</filename>
<sum type="sha256" >8e214681104e4ba73726e0ce11d21b963ec0390fd70458d439ddc72372082034</sum>
</package>
</collection>
</pkglist>
</update>
<update status="final" from="release-engineering@redhat.com" version="4" type="security" >
<id>RHSA-2017:2831</id>
<issued date="2017-09-28 18:56:55 UTC" />
<title>Critical: firefox security update</title>
<release>0</release>
<rights>Copyright 2017 Red Hat Inc</rights>
<severity>Critical</severity>
<pushcount>4</pushcount>
<updated date="2017-09-28 18:56:56 UTC" />
<references>
<reference href="https://access.redhat.com/errata/RHSA-2017:2831" type="self" id="RHSA-2017:2831" title="RHSA-2017:2831" />
<reference href="https://bugzilla.redhat.com/show_bug.cgi?id=1496649" type="bugzilla" id="1496649" title="CVE-2017-7793 Mozilla: Use-after-free with Fetch API (MFSA 2017-22)" />
<reference href="https://bugzilla.redhat.com/show_bug.cgi?id=1496651" type="bugzilla" id="1496651" title="CVE-2017-7810 Mozilla: Memory safety bugs fixed in Firefox 56 and Firefox ESR 52.4 (MFSA 2017-22)" />
<reference href="https://bugzilla.redhat.com/show_bug.cgi?id=1496652" type="bugzilla" id="1496652" title="CVE-2017-7814 Mozilla: Blob and data URLs bypass phishing and malware protection warnings (MFSA 2017-22)" />
<reference href="https://bugzilla.redhat.com/show_bug.cgi?id=1496653" type="bugzilla" id="1496653" title="CVE-2017-7818 Mozilla: Use-after-free during ARIA array manipulation (MFSA 2017-22)" />
<reference href="https://bugzilla.redhat.com/show_bug.cgi?id=1496654" type="bugzilla" id="1496654" title="CVE-2017-7819 Mozilla: Use-after-free while resizing images in design mode (MFSA 2017-22)" />
<reference href="https://bugzilla.redhat.com/show_bug.cgi?id=1496655" type="bugzilla" id="1496655" title="CVE-2017-7823 Mozilla: CSP sandbox directive did not create a unique origin (MFSA 2017-22)" />
<reference href="https://bugzilla.redhat.com/show_bug.cgi?id=1496656" type="bugzilla" id="1496656" title="CVE-2017-7824 Mozilla: Buffer overflow when drawing and validating elements with ANGLE (MFSA 2017-22)" />
<reference href="https://www.redhat.com/security/data/cve/CVE-2017-7793.html" type="cve" id="CVE-2017-7793" title="CVE-2017-7793" />
<reference href="https://www.redhat.com/security/data/cve/CVE-2017-7810.html" type="cve" id="CVE-2017-7810" title="CVE-2017-7810" />
<reference href="https://www.redhat.com/security/data/cve/CVE-2017-7814.html" type="cve" id="CVE-2017-7814" title="CVE-2017-7814" />
<reference href="https://www.redhat.com/security/data/cve/CVE-2017-7818.html" type="cve" id="CVE-2017-7818" title="CVE-2017-7818" />
<reference href="https://www.redhat.com/security/data/cve/CVE-2017-7819.html" type="cve" id="CVE-2017-7819" title="CVE-2017-7819" />
<reference href="https://www.redhat.com/security/data/cve/CVE-2017-7823.html" type="cve" id="CVE-2017-7823" title="CVE-2017-7823" />
<reference href="https://www.redhat.com/security/data/cve/CVE-2017-7824.html" type="cve" id="CVE-2017-7824" title="CVE-2017-7824" />
<reference href="https://access.redhat.com/security/updates/classification/#critical" type="other" id="classification" title="critical" />
<reference href="https://www.mozilla.org/en-US/security/advisories/mfsa2017-22/" type="other" id="ref_0" title="other_reference_0" />
</references>
<pkglist>
<collection short="" >
<name>rhel-7-server-rpms__7_DOT_4__x86_64</name>
<package src="firefox-52.4.0-1.el7_4.src.rpm" name="firefox" epoch="0" version="52.4.0" release="1.el7_4" arch="x86_64" >
<filename>firefox-52.4.0-1.el7_4.x86_64.rpm</filename>
<sum type="sha256" >7b81b37bf969534bee0152bc13db56ae410eee06120a78d8da261c10c73c0514</sum>
</package>
</collection>
</pkglist>
</update>
<update status="final" from="release-engineering@redhat.com" version="2" type="bugfix" >
<id>RHBA-2016:2423</id>
<issued date="2016-11-03 06:09:21 UTC" />
<title>oscap-anaconda-addon bug fix update</title>
<release>0</release>
<rights>Copyright 2016 Red Hat Inc</rights>
<severity>None</severity>
<pushcount>2</pushcount>
<updated date="2016-11-03 06:10:44 UTC" />
<references>
<reference href="https://access.redhat.com/errata/RHBA-2016:2423" type="self" id="RHBA-2016:2423" title="RHBA-2016:2423" />
<reference href="https://bugzilla.redhat.com/show_bug.cgi?id=1269211" type="bugzilla" id="1269211" title="could move security section down to bottom since it's not as important as network spoke" />
</references>
<pkglist>
<collection short="" >
<name>rhel-7-server-rpms__7_DOT_4__x86_64</name>
<package src="oscap-anaconda-addon-0.7-12.el7.src.rpm" name="oscap-anaconda-addon" epoch="0" version="0.7" release="12.el7" arch="noarch" >
<filename>oscap-anaconda-addon-0.7-12.el7.noarch.rpm</filename>
<sum type="sha256" >507fbf46ddaed0bb4087d3ef2b31db235473f3be36aaa9ed7df43279ed7e2f07</sum>
</package>
</collection>
</pkglist>
</update>
</updates>
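Using ele.find(...).get('id') is not right; use cve.get('id') instead. Note that id is an attribute of reference, so cve.find('id') would not work.
Likewise, instead of ele.find(...).get('type'), use cve.get('type').
Also iterate over ele.findall('references/reference'): ele.find(...) returns only the first reference, and that element has no children, which is why the list comes back empty.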
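Putting those corrections together, a minimal sketch might look like the following. The inline SAMPLE string is a trimmed, hypothetical stand-in for updateinfo.xml, and the names cve_list and cve_lists are just for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample in the same shape as updateinfo.xml.
SAMPLE = """<updates>
  <update type="security">
    <id>RHSA-2017:2831</id>
    <references>
      <reference type="self" id="RHSA-2017:2831" />
      <reference type="cve" id="CVE-2017-7793" />
      <reference type="cve" id="CVE-2017-7810" />
    </references>
  </update>
  <update type="bugfix">
    <id>RHBA-2014:0722</id>
    <references>
      <reference type="self" />
    </references>
  </update>
</updates>"""

root = ET.fromstring(SAMPLE)

cve_lists = []
for ele in root.findall('update'):
    # findall() yields every <reference>; find() would return only the first.
    cve_list = [cve.get('id')
                for cve in ele.findall('references/reference')
                if cve.get('type') == 'cve']
    cve_lists.append(cve_list)

print(cve_lists)
```

Moving the condition to the end of the comprehension filters out non-CVE references instead of leaving None placeholders in the list.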
Question: How do I check a condition during XML parsing?
What you are doing is not parsing, because this line has already done the parsing:
tree = ET.parse(file_name, parser=parser)
You don't need to pass parser=XMLParser, as this is the standard parser.
For reference, read: xml.etree.ElementTree.parse
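If you really do want to test the condition while the document is still being parsed, ElementTree's iterparse is one option. This is only a sketch under assumptions: the inline SAMPLE bytes stand in for the real file, and cve_ids is an illustrative name.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample in the same shape as updateinfo.xml.
SAMPLE = b"""<updates>
  <update type="security">
    <references>
      <reference type="self" id="RHSA-2017:2831" />
      <reference type="cve" id="CVE-2017-7793" />
      <reference type="cve" id="CVE-2017-7810" />
    </references>
  </update>
</updates>"""

cve_ids = []
# iterparse() yields each element as soon as its end tag has been read.
for event, elem in ET.iterparse(io.BytesIO(SAMPLE), events=("end",)):
    if elem.tag == 'reference' and elem.get('type') == 'cve':
        cve_ids.append(elem.get('id'))
    elif elem.tag == 'update':
        # Drop the finished <update> subtree to keep memory use low.
        elem.clear()

print(cve_ids)
```

For a file on disk you would pass the file name instead of the BytesIO object. For a small file, ET.parse followed by findall is simpler.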
Your example code loops over the ElementTree FOUR TIMES:
for ele in tree.findall('update'):
    cveList = [
        ele.find('references/reference').get('id') if ele.find('references/reference').get('type') == 'cve' else None
        for cve in ele.find('references/reference')]
Every .find... call loops until it finds the requested Element or reaches the end.
You should avoid such nested coding!
You can get all reference elements with a single loop, for example:
import xml.etree.ElementTree as ET
file_name = "test/updateinfo.xml"
tree = ET.parse(file_name)
cveList = []
for reference in tree.findall('update/references/reference'):
    if reference.attrib.get('type') == 'cve':
        cveList.append(reference.attrib.get('id'))
print(cveList)
Output:
['CVE-2017-7793', 'CVE-2017-7810', 'CVE-2017-7814', 'CVE-2017-7818', 'CVE-2017-7819', 'CVE-2017-7823', 'CVE-2017-7824']
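As a variation on the loop above, the condition can also go into the path itself, since ElementTree's limited XPath support includes [@attrib='value'] predicates. A sketch, assuming a trimmed, hypothetical SAMPLE in place of the real file:

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample in the same shape as updateinfo.xml.
SAMPLE = """<updates>
  <update type="security">
    <references>
      <reference type="self" id="RHSA-2017:2831" />
      <reference type="bugzilla" id="1496649" />
      <reference type="cve" id="CVE-2017-7793" />
      <reference type="cve" id="CVE-2017-7810" />
    </references>
  </update>
</updates>"""

root = ET.fromstring(SAMPLE)

# The predicate keeps only <reference> elements whose type attribute is 'cve'.
cve_ids = [ref.get('id')
           for ref in root.findall("update/references/reference[@type='cve']")]

print(cve_ids)
```

Whether this reads better than an explicit if is a matter of taste; both walk the tree once.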
Comment: I want a cveList for each update item instead of all items in one list. I would like to iterate over each update and get other attributes as well.
cveList = []  # start with a fresh list for this example
# Find all 'update' elements in the tree
for update in tree.findall('update'):
    # Find all 'references/reference' elements in this update
    for reference in update.findall('references/reference'):
        if reference.attrib.get('type') == 'cve':
            # Find the element with tag <title> in this update
            title = update.find('title').text
            # Append a dict with keys 'title' and 'id'
            cveList.append({'title': title, 'id': reference.get('id')})
print(cveList)
Output:
[{'id': 'CVE-2017-7793', 'title': 'Critical: firefox security update'}, {'id': 'CVE-2017-7810', 'title': 'Critical: firefox security update'}, {'id': 'CVE-2017-7814', 'title': 'Critical: firefox security update'}, {'id': 'CVE-2017-7818', 'title': 'Critical: firefox security update'}, {'id': 'CVE-2017-7819', 'title': 'Critical: firefox security update'}, {'id': 'CVE-2017-7823', 'title': 'Critical: firefox security update'}, {'id': 'CVE-2017-7824', 'title': 'Critical: firefox security update'}]
Tested with Python: 2.7.9
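If what you want is one CVE list per update together with a few other fields, one possible shape is a dict keyed by the update id. This is a sketch only: the SAMPLE string is a trimmed, hypothetical stand-in for the real file, and cves_by_update and its keys are names chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample in the same shape as updateinfo.xml.
SAMPLE = """<updates>
  <update type="security">
    <id>RHSA-2017:2831</id>
    <title>Critical: firefox security update</title>
    <references>
      <reference type="self" id="RHSA-2017:2831" />
      <reference type="cve" id="CVE-2017-7793" />
      <reference type="cve" id="CVE-2017-7810" />
    </references>
  </update>
  <update type="bugfix">
    <id>RHBA-2014:0722</id>
    <title>kexec-tools bug fix update</title>
    <references>
      <reference type="self" />
    </references>
  </update>
</updates>"""

root = ET.fromstring(SAMPLE)

cves_by_update = {}
for update in root.findall('update'):
    # findtext() returns the text of the first matching child, or None.
    update_id = update.findtext('id')
    cves_by_update[update_id] = {
        'title': update.findtext('title'),
        'type': update.get('type'),  # attribute on <update> itself
        'cves': [ref.get('id')
                 for ref in update.findall("references/reference[@type='cve']")],
    }

print(cves_by_update)
```

Updates without any CVE reference end up with an empty 'cves' list, which makes them easy to filter out afterwards.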
<?xml version="1.0" encoding="UTF-8"?>
<computer>
<extension_attributes>
<extension_attribute>
<id>8</id>
<name>user1</name>
<type>String</type>
<multi_value>false</multi_value>
<value>Installed</value>
</extension_attribute>
<extension_attribute>
<id>33</id>
<name>user2</name>
<type>String</type>
<multi_value>false</multi_value>
<value>Not Installed</value>
</extension_attribute>
</extension_attributes>
</computer>
import requests
import xml.etree.ElementTree

# Placeholder URL and credentials
get_url = "https://some.url.com/extension_attributes"
headers = {'Accept': 'application/xml', 'Content-Type': 'application/xml',
           'authorization': 'Basic xxxxx'}
r = requests.get(get_url, headers=headers)
root = xml.etree.ElementTree.fromstring(r.text)
# Paths are relative to the root element <computer>
values = root.findall('extension_attributes/extension_attribute')
for val in values:
    # Here the condition is on the text of the <id> child element
    if val.find('id').text == '33':
        print('Value', val.find('value').text)
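The same child-text condition can be tried without a network call. The sketch below parses an inline copy of the XML above (SAMPLE is an assumption standing in for the HTTP response) and uses ElementTree's [tag='text'] predicate instead of the explicit if.

```python
import xml.etree.ElementTree as ET

# Inline copy of the example response, standing in for r.text.
SAMPLE = """<computer>
  <extension_attributes>
    <extension_attribute>
      <id>8</id>
      <name>user1</name>
      <value>Installed</value>
    </extension_attribute>
    <extension_attribute>
      <id>33</id>
      <name>user2</name>
      <value>Not Installed</value>
    </extension_attribute>
  </extension_attributes>
</computer>"""

root = ET.fromstring(SAMPLE)

# [id='33'] selects elements that have an <id> child whose text is '33'.
match = root.find("extension_attributes/extension_attribute[id='33']")
value = match.findtext('value') if match is not None else None

print('Value', value)
```

The None check matters because find() returns None when nothing matches.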