How to parse XML data from elements with the same name using Python

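Hello, I have this XML, and I want to get the name and type of each Branch element, along with the text of the FullProductName element ("Cisco Unified Computing System (Management Software) 3.0(1)c"). I am trying to do this in Python with bs4, but I don't know how to go about it.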

XML:

<?xml version="1.0" encoding="UTF-8"?>
<cvrfdoc xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.icasi.org/CVRF/schema/cvrf/1.1">
  <DocumentTitle>Cisco Integrated Management Controller Remote Code Execution Vulnerability</DocumentTitle>
  <DocumentType>Cisco Security Advisory</DocumentType>
  <DocumentPublisher Type="Vendor">
    <ContactDetails>Emergency Support:
+1 877 228 7302 (toll-free within North America)
+1 408 525 6532 (International direct-dial)
Non-emergency Support:
Email: psirt@cisco.com
Support requests that are received via e-mail are typically acknowledged within 48 hours.</ContactDetails>
    <IssuingAuthority>Cisco product security incident response is the responsibility of the Cisco Product Security Incident Response Team (PSIRT). The Cisco PSIRT is a dedicated, global team that manages the receipt, investigation, and public reporting of security vulnerability information that is related to Cisco products and networks. The on-call Cisco PSIRT works 24x7 with Cisco customers, independent security researchers, consultants, industry organizations, and other vendors to identify possible security issues with Cisco products and networks.
More information can be found in Cisco Security Vulnerability Policy available at http://www.cisco.com/web/about/security/psirt/security_vulnerability_policy.html</IssuingAuthority>
  </DocumentPublisher>
  <DocumentTracking>
    <Identification>
      <ID>cisco-sa-20170419-cimc3</ID>
    </Identification>
    <Status>Final</Status>
    <Version>1.2</Version>
    <RevisionHistory>
      <Revision>
        <Number>1.0</Number>
        <Date>2017-04-18T16:50:37</Date>
        <Description>Initial public release.</Description>
      </Revision>
      <Revision>
        <Number>1.1</Number>
        <Date>2017-05-22T17:55:14</Date>
        <Description>Updated affected products.</Description>
      </Revision>
      <Revision>
        <Number>1.2</Number>
        <Date>2017-05-31T20:33:19</Date>
        <Description>Added vulnerable releases.</Description>
      </Revision>
    </RevisionHistory>
    <InitialReleaseDate>2017-04-19T16:00:00</InitialReleaseDate>
    <CurrentReleaseDate>2017-05-31T20:33:19</CurrentReleaseDate>
    <Generator>
      <Engine>TVCE</Engine>
    </Generator>
  </DocumentTracking>
  <DocumentNotes>
    <Note Title="Summary" Type="General" Ordinal="1">A vulnerability in the web-based GUI of Cisco Integrated Management Controller (IMC) could allow an unauthenticated, remote attacker to perform unauthorized remote command execution on the affected device.

The vulnerability exists because the affected software does not sufficiently sanitize specific values that are received as part of a user-supplied HTTP request. An attacker could exploit this vulnerability by sending a crafted HTTP request to the affected software. Successful exploitation could allow an unauthenticated attacker to execute system commands with root-level privileges.

There are no workarounds that address this vulnerability.

This advisory is available at the following link:
https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20170419-cimc3 ["https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20170419-cimc3"]</Note>
    <Note Title="CVSS 3.0 Notice" Type="Other" Ordinal="2">Although CVRF version 1.1 does not support CVSS version 3, the CVSS score in this CVRF file is a CVSSv3 base and temporal score, as Cisco is now scoring vulnerabilities in CVSSv3.</Note>
  </DocumentNotes>
  <DocumentReferences>
    <Reference Type="Self">
      <URL>https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20170419-cimc3</URL>
      <Description>Cisco Integrated Management Controller Remote Code Execution Vulnerability</Description>
    </Reference>
  </DocumentReferences>
  <ProductTree xmlns="http://www.icasi.org/CVRF/schema/prod/1.1">
    <Branch Name="Cisco" Type="Vendor">
      <Branch Name="Cisco Unified Computing System (Management Software)" Type="Product Name">
        <Branch Name="3.0" Type="Product Version">
          <Branch Name="(1)c" Type="Service Pack">
            <FullProductName ProductID="CVRFPID-203522">Cisco Unified Computing System (Management Software) 3.0(1)c</FullProductName>
          </Branch>
        </Branch>
      </Branch>
    </Branch>
  </ProductTree>
  <Vulnerability Ordinal="1" xmlns="http://www.icasi.org/CVRF/schema/vuln/1.1">
    <Title>Cisco Integrated Management Controller Remote Code Execution Vulnerability</Title>
    <ID SystemName="Cisco Bug ID">CSCvd14578</ID>
    <Notes>
      <Note Title="Summary" Type="Summary" Ordinal="1">A vulnerability in the web-based GUI of Cisco Integrated Management Controller (IMC) could allow an unauthenticated, remote attacker to perform unauthorized remote command execution on the affected device.



The vulnerability exists because the affected software does not sufficiently sanitize specific values that are received as part of a user-supplied HTTP request. An attacker could exploit this vulnerability by sending a crafted HTTP request to the affected software. Successful exploitation could allow an unauthenticated attacker to execute system commands with root-level privileges.</Note>
      <Note Title="Cisco Bug IDs" Type="Other" Ordinal="3">CSCvd14578</Note>
    </Notes>
    <CVE>CVE-2017-6616</CVE>
    <ProductStatuses>
      <Status Type="Known Affected">
        <ProductID>CVRFPID-203522</ProductID>
      </Status>
    </ProductStatuses>
    <CVSSScoreSets>
      <ScoreSet>
        <BaseScore>9.8</BaseScore>
        <Vector>CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H</Vector>
      </ScoreSet>
    </CVSSScoreSets>
    <Remediations>
      <Remediation Type="Workaround">
        <Description>There are no workarounds that address this vulnerability.</Description>
      </Remediation>
    </Remediations>
    <References>
      <Reference Type="Self">
        <URL>https://tools.cisco.com/security/center/content/CiscoSecurityAdvisory/cisco-sa-20170419-cimc3</URL>
        <Description>Cisco Integrated Management Controller Remote Code Execution Vulnerability</Description>
      </Reference>
    </References>
  </Vulnerability>
</cvrfdoc>

Python:

from bs4 import BeautifulSoup

# Note: the "lxml" (HTML) parser lowercases tag and attribute names.
with open("test.xml") as xml_data:
    soup = BeautifulSoup(xml_data, "lxml")
product_tree = soup.producttree

vendor = product_tree.find_all("branch", attrs={"type": "Vendor"})

Any ideas?

Thanks in advance.

First you have to find out what the tags look like so that we can walk the file. I will use Python's built-in xml package.

import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()

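Then let's look at the tags so that we can iterate through the file. Iterating over root directly yields only its direct children, so root.iter() is used here to visit every nested element.

for child in root.iter():
    print(child.tag, child.attrib)

One of the lines printed is:

>>{http://www.icasi.org/CVRF/schema/prod/1.1}Branch {'Name': 'Cisco', 'Type': 'Vendor'}

As you can see, the tag is not just the Branch you specified; it is prefixed with the namespace URI in braces.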
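To avoid typing the full URI in every lookup, ElementTree's find and findall accept a prefix-to-URI mapping. The following is a minimal sketch, assuming a trimmed excerpt of the question's document inlined as a string; the names SAMPLE and NS and the prefix "prod" are chosen here for illustration only.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the question's document, inlined so the sketch runs on its own.
SAMPLE = """<cvrfdoc xmlns="http://www.icasi.org/CVRF/schema/cvrf/1.1">
  <ProductTree xmlns="http://www.icasi.org/CVRF/schema/prod/1.1">
    <Branch Name="Cisco" Type="Vendor">
      <Branch Name="3.0" Type="Product Version"/>
    </Branch>
  </ProductTree>
</cvrfdoc>"""

# "prod" is an arbitrary local alias; only the URI has to match the document.
NS = {"prod": "http://www.icasi.org/CVRF/schema/prod/1.1"}

root = ET.fromstring(SAMPLE)

# ".//" searches all descendants, not just the direct children of root.
branches = root.findall(".//prod:Branch", NS)
branch_names = [branch.get("Name") for branch in branches]

# Without the namespace, the same search finds nothing.
unqualified = root.findall(".//Branch")

print(branch_names)
print(len(unqualified))
```

The unqualified search coming back empty is the same effect that makes a plain "Branch" lookup fail on the real file.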

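Now we can walk the whole file recursively and collect all of these elements. Since you did not specify the output structure you want, I will put them in a dictionary for you.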

val_dict = dict()
# root.iter() visits every matching descendant, however deeply nested.
for schild in root.iter('{http://www.icasi.org/CVRF/schema/prod/1.1}Branch'):
    val = schild.attrib
    val_dict[val.get('Type')] = val.get('Name')

>>{'Product Name': 'Cisco Unified Computing System (Management Software)',
 'Product Version': '3.0',
 'Service Pack': '(1)c',
 'Vendor': 'Cisco'}
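The dictionary above holds one entry per Type, so a file with several products would overwrite earlier values, and it does not include the FullProductName text the question asks for. The following is a minimal sketch of one way to keep both, assuming the ProductTree excerpt from the question inlined as a string; collect_products, PROD and the record keys are hypothetical names introduced here.

```python
import xml.etree.ElementTree as ET

# ProductTree excerpt from the question, inlined so the sketch runs on its own.
SAMPLE = """<cvrfdoc xmlns="http://www.icasi.org/CVRF/schema/cvrf/1.1">
  <ProductTree xmlns="http://www.icasi.org/CVRF/schema/prod/1.1">
    <Branch Name="Cisco" Type="Vendor">
      <Branch Name="Cisco Unified Computing System (Management Software)" Type="Product Name">
        <Branch Name="3.0" Type="Product Version">
          <Branch Name="(1)c" Type="Service Pack">
            <FullProductName ProductID="CVRFPID-203522">Cisco Unified Computing System (Management Software) 3.0(1)c</FullProductName>
          </Branch>
        </Branch>
      </Branch>
    </Branch>
  </ProductTree>
</cvrfdoc>"""

# Namespace prefix in ElementTree's "{uri}tag" notation.
PROD = "{http://www.icasi.org/CVRF/schema/prod/1.1}"


def collect_products(root):
    """Return one record per FullProductName, with the Branch path leading to it."""
    records = []

    def walk(element, path):
        for child in element:
            if child.tag == PROD + "Branch":
                # Extend the path with this branch's (Type, Name) and go deeper.
                walk(child, path + [(child.get("Type"), child.get("Name"))])
            elif child.tag == PROD + "FullProductName":
                records.append({
                    "product_id": child.get("ProductID"),
                    "full_name": child.text,
                    "branches": path,
                })

    for product_tree in root.iter(PROD + "ProductTree"):
        walk(product_tree, [])
    return records


products = collect_products(ET.fromstring(SAMPLE))
for product in products:
    print(product["full_name"], product["branches"])
```

Keeping the branches as a list of (Type, Name) pairs preserves the nesting order and allows the same Type to appear more than once.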

As I pointed out, here is just a snippet that uses XSL on the Cisco CVRF:

<xsl:choose>
    <xsl:when test="vuln:Remediations">
        <xsl:for-each select="vuln:Remediations/vuln:Remediation">
            <xsl:if test="vuln:ProductID">
                <xsl:for-each select="vuln:ProductID">
                    <xsl:variable name="currPID" select="."/>
                    <xsl:value-of select="//prod:FullProductName[@ProductID=$currPID]/."/>
                </xsl:for-each>
            </xsl:if>
            <xsl:if test="vuln:GroupID">
                <xsl:for-each select="vuln:GroupID">
                    <xsl:variable name="currGID" select="."/>
                    <xsl:for-each select="//prod:Groups[@GroupID=$currGID]/ProductID">
                        <xsl:value-of select="."/>
                    </xsl:for-each>
                </xsl:for-each>
            </xsl:if>
        </xsl:for-each>
    </xsl:when>
</xsl:choose>
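The core of the XSL above is a lookup from a vuln:ProductID value to the prod:FullProductName with the matching ProductID attribute. A rough Python counterpart of that lookup might look like the sketch below. It is not a translation of the stylesheet: it assumes a simplified excerpt inlined as a string, it searches every vuln:ProductID (in the question's file these sit under ProductStatuses rather than Remediations), and it leaves out the GroupID branch. The names SAMPLE, NS, names_by_id and affected are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified excerpt (assumption: nesting reduced to keep the sketch short).
SAMPLE = """<cvrfdoc xmlns="http://www.icasi.org/CVRF/schema/cvrf/1.1">
  <ProductTree xmlns="http://www.icasi.org/CVRF/schema/prod/1.1">
    <Branch Name="Cisco" Type="Vendor">
      <FullProductName ProductID="CVRFPID-203522">Cisco Unified Computing System (Management Software) 3.0(1)c</FullProductName>
    </Branch>
  </ProductTree>
  <Vulnerability Ordinal="1" xmlns="http://www.icasi.org/CVRF/schema/vuln/1.1">
    <ProductStatuses>
      <Status Type="Known Affected">
        <ProductID>CVRFPID-203522</ProductID>
      </Status>
    </ProductStatuses>
  </Vulnerability>
</cvrfdoc>"""

# Prefixes are local aliases; the URIs come from the document in the question.
NS = {
    "prod": "http://www.icasi.org/CVRF/schema/prod/1.1",
    "vuln": "http://www.icasi.org/CVRF/schema/vuln/1.1",
}

root = ET.fromstring(SAMPLE)

# Index every FullProductName by its ProductID attribute.
names_by_id = {
    element.get("ProductID"): element.text
    for element in root.findall(".//prod:FullProductName", NS)
}

# Resolve each vuln:ProductID reference to the matching product name
# (None if the ID is not present in the product tree).
affected = [
    names_by_id.get(pid.text.strip())
    for pid in root.findall(".//vuln:ProductID", NS)
]

print(affected)
```

Building the index once avoids rescanning the product tree for every ProductID, which is what the `//prod:FullProductName[...]` expression does on each iteration.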