Validating XML node structure with Python

I have this file:

<?xml version='1.0' encoding='UTF-8'?>
<AUTOSAR xmlns="http://autosar.org/schema/r4.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://autosar.org/schema/r4.0 AUTOSAR_4-2-2_STRICT_COMPACT.xsd">
    <AR-PACKAGES>
        <AR-PACKAGE>
            <SHORT-NAME>RootP_Composition</SHORT-NAME>
            <COMPOSITION-SW-COMPONENT-TYPE>
                <SHORT-NAME>Compo_VSM</SHORT-NAME>
                <CONNECTORS>
                    <ASSEMBLY-SW-CONNECTOR>
                        <SHORT-NAME>PP_CS_VehicleSPeed_ASWC_M6_to_ASWC_M740</SHORT-NAME>
                        <PROVIDER-IREF>
                            <CONTEXT-COMPONENT-REF DEST="SW-COMPONENT-PROTOTYPE">/RootP_Composition/Compo_VSM/Instance_ASWC_M6</CONTEXT-COMPONENT-REF>
                            <TARGET-P-PORT-REF DEST="P-PORT-PROTOTYOPE">/RootP_ASWC_M6/ASWC_M6/PP_CS_VehicleSPeed</TARGET-P-PORT-REF>
                        </PROVIDER-IREF>
                        <REQUESTER-IREF>
                            <CONTEXT-COMPONENT-REF DEST="SW-COMPONENT-PROTOTYPE">/RootP_Composition/Compo_VSM/Instance_ASWC_M740</CONTEXT-COMPONENT-REF>
                            <TARGET-R-PORT-REF DEST="R-PORT-PROTOTYOPE">/RootP_ASWC_M740/ASWC_M740/RP_CS_VehicleSPeed</TARGET-R-PORT-REF>
                        </REQUESTER-IREF>
                    </ASSEMBLY-SW-CONNECTOR>
                </CONNECTORS>
            </COMPOSITION-SW-COMPONENT-TYPE>
        </AR-PACKAGE>
    </AR-PACKAGES>
</AUTOSAR>

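I want to check whether each ASSEMBLY-SW-CONNECTOR node has the children SHORT-NAME, PROVIDER-IREF and REQUESTER-IREF, and whether PROVIDER-IREF and REQUESTER-IREF in turn have the children (that is, the grandchildren of ASSEMBLY-SW-CONNECTOR) CONTEXT-COMPONENT-REF and TARGET-P-PORT-REF, and CONTEXT-COMPONENT-REF and TARGET-R-PORT-REF, respectively.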

So far I have this code:

tree = ET.parse('C:\test\Abu\TRS.ABU.GEN.002\output\Connectors.arxml')
root = tree.getroot()
child = ["SHORT-NAME", "PROVIDER-IREF", "REQUESTER-IREF"]
grandchild = ["CONTEXT-COMPONENT-REF", "TARGET-P-PORT-REF", "CONTEXT-COMPONENT-REF", "TARGET-R-PORT-REF"]
connector = '{http://autosar.org/schema/r4.0}ASSEMBLY-SW-CONNECTOR'
for element in root.iter(tag = connector):
    for child in element:
        for grandchild in child:
            if child.tag.split('}', 1)[1] in child:
                if grandchild.tag.split('}', 1)[1] in grandchild:
                    print("yes")
                else:
                    print("No")

Where am I going wrong? Thanks in advance!

Update 1

tree = etree.parse('C:\test\Abu\TRS.ABU.GEN.002\output\Connectors.arxml')
root = tree.getroot()
found_name = found_provider = found_requester = found_contextP = found_targetP = found_contextR =found_targetR = False
connectors =  root.findall(".//{http://autosar.org/schema/r4.0}ASSEMBLY-SW-CONNECTOR>")
for elem in connectors:
    if elem.find(".//{http://autosar.org/schema/r4.0}SHORT-NAME>"):
        found_name = True
    if elem.find(".//{http://autosar.org/schema/r4.0}PROVIDER-IREF>"):
        found_provider = True
        for child in elem.find(".//{http://autosar.org/schema/r4.0}PROVIDER-IREF>"):
            if child.find(".//{http://autosar.org/schema/r4.0}CONTEXT-COMPONENT-REF>"):
                found_contextR = True
            if child.find(".//{http://autosar.org/schema/r4.0}TARGET-P-PORT-REF>"):
                found_targetP = True
    if elem.find(".//{http://autosar.org/schema/r4.0}REQUESTER-IREF>"):
        found_requester = True
        for child in elem.find(".//{http://autosar.org/schema/r4.0}REQUESTER-IREF>"):
            if child.find(".//{http://autosar.org/schema/r4.0}CONTEXT-COMPONENT-REF>"):
                found_contextR = True
            if child.find(".//{http://autosar.org/schema/r4.0}TARGET-R-PORT-REF>"):
                found_targetR = True

if found_name and found_provider and found_requester and found_contextP and found_targetP and found_contextR and found_targetR:
    print("True")
else:
    print("False")

Any idea why I'm getting the wrong result?

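Two issues:

First, a few typos/small mistakes:

  • All of your find paths end with an unnecessary closing angle bracket (>), so these all need removing
  • In your found_provider section you set found_contextR when I think you meant found_contextP (P, not R)
  • Using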

    if elem.find("<path>"):
    

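    raises a warning (and can evaluate as false for a found element that has no children of its own), so you should use this instead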

    if elem.find("<path>") is not None:
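    To see why the "is not None" form matters, here is a minimal sketch using the standard library's xml.etree.ElementTree. The cut-down document and the variable names are made up for illustration. SHORT-NAME has text but no child elements, and truth-testing an element has historically depended on how many children it has, so a plain "if" can skip an element that was actually found.

```python
import xml.etree.ElementTree as ET

NS = "{http://autosar.org/schema/r4.0}"

# Hypothetical, cut-down document used only for this illustration.
xml_text = (
    '<AUTOSAR xmlns="http://autosar.org/schema/r4.0">'
    "<ASSEMBLY-SW-CONNECTOR><SHORT-NAME>Demo</SHORT-NAME></ASSEMBLY-SW-CONNECTOR>"
    "</AUTOSAR>"
)
root = ET.fromstring(xml_text)
connector = root.find(".//" + NS + "ASSEMBLY-SW-CONNECTOR")
short_name = connector.find(NS + "SHORT-NAME")

# The element was found ...
print(short_name is not None)
# ... but it has no child elements, which is what truth-testing has looked at.
print(len(short_name))
# A path that matches nothing returns None, so "is not None" separates the two cases.
print(connector.find(NS + "PROVIDER-IREF") is None)
```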
    

Second, your child-element sections are wrong. Take the found_provider section, for example:

if elem.find(".//{http://autosar.org/schema/r4.0}PROVIDER-IREF>"):
    found_provider = True
    for child in elem.find(".//{http://autosar.org/schema/r4.0}PROVIDER-IREF>"):
        if child.find(".//{http://autosar.org/schema/r4.0}CONTEXT-COMPONENT-REF>"):
            found_contextR = True
        if child.find(".//{http://autosar.org/schema/r4.0}TARGET-P-PORT-REF>"):
            found_targetP = True

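You correctly find the PROVIDER-IREF element, and then you iterate over its children trying to match the CONTEXT-COMPONENT-REF and TARGET-P-PORT-REF elements. But you do this by searching for them as children of those child elements (i.e. as grandchildren of PROVIDER-IREF), when they are the children themselves.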

So either you check the tags of the child elements, rather than searching for elements beneath them:

if elem.find(".//{http://autosar.org/schema/r4.0}PROVIDER-IREF") is not None:
    found_provider = True
    for child in elem.find(".//{http://autosar.org/schema/r4.0}PROVIDER-IREF"):
        if child.tag == "{http://autosar.org/schema/r4.0}CONTEXT-COMPONENT-REF":
            found_contextP = True
        if child.tag == "{http://autosar.org/schema/r4.0}TARGET-P-PORT-REF":
            found_targetP = True

Or you could extract the PROVIDER-IREF element first and then look for the elements directly under it:

provider = elem.find(".//{http://autosar.org/schema/r4.0}PROVIDER-IREF")
if provider is not None:
    found_provider = True
    if provider.find("{http://autosar.org/schema/r4.0}CONTEXT-COMPONENT-REF") is not None:
        found_contextP = True
    if provider.find("{http://autosar.org/schema/r4.0}TARGET-P-PORT-REF") is not None:
        found_targetP = True

Obviously, you then do something similar for the found_requester section.
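For completeness, the requester half might look like the sketch below. It mirrors the second provider variant and uses the standard library's xml.etree.ElementTree; the inline XML is a cut-down copy of the file from the question (only the REQUESTER-IREF branch), and the NS constant is a convenience introduced here, not something from the original code.

```python
import xml.etree.ElementTree as ET

NS = "{http://autosar.org/schema/r4.0}"  # shorthand added for this sketch

# Cut-down sample containing only the REQUESTER-IREF branch of the question's file.
xml_text = """<AUTOSAR xmlns="http://autosar.org/schema/r4.0">
  <ASSEMBLY-SW-CONNECTOR>
    <REQUESTER-IREF>
      <CONTEXT-COMPONENT-REF DEST="SW-COMPONENT-PROTOTYPE">/RootP_Composition/Compo_VSM/Instance_ASWC_M740</CONTEXT-COMPONENT-REF>
      <TARGET-R-PORT-REF DEST="R-PORT-PROTOTYOPE">/RootP_ASWC_M740/ASWC_M740/RP_CS_VehicleSPeed</TARGET-R-PORT-REF>
    </REQUESTER-IREF>
  </ASSEMBLY-SW-CONNECTOR>
</AUTOSAR>"""
root = ET.fromstring(xml_text)

found_requester = found_contextR = found_targetR = False
for elem in root.findall(".//" + NS + "ASSEMBLY-SW-CONNECTOR"):
    # No ".//" here: REQUESTER-IREF must be a direct child of the connector.
    requester = elem.find(NS + "REQUESTER-IREF")
    if requester is not None:
        found_requester = True
        if requester.find(NS + "CONTEXT-COMPONENT-REF") is not None:
            found_contextR = True
        if requester.find(NS + "TARGET-R-PORT-REF") is not None:
            found_targetR = True

print(found_requester, found_contextR, found_targetR)
```

Dropping the leading ".//" from the inner paths restricts each lookup to direct children, which is what the structure check is really asking for.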


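I think your original approach was actually a good one: specify a child/grandchild structure and then check whether the XML fits it. But you need to say which grandchildren belong to which children, so you could use a nested dictionary like this: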

structure = {
    "ASSEMBLY-SW-CONNECTOR": {
        "SHORT-NAME": None,
        "PROVIDER-IREF": {
            "CONTEXT-COMPONENT-REF": None,
            "TARGET-P-PORT-REF": None,
        },
        "REQUESTER-IREF": {
            "CONTEXT-COMPONENT-REF": None,
            "TARGET-R-PORT-REF": None,
        },
    }
}

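Then write a recursive function (i.e. a function that calls itself) that searches for matching children until it reaches None, at which point it stops descending that branch.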
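One way that recursive check could look is sketched below. This is only an illustration of the idea, using the standard library's xml.etree.ElementTree: the function names matches and check_connectors, the NS constant, and the two small sample documents are all invented here. It only verifies that the required elements are present as direct children; extra elements are allowed, and order is not checked.

```python
import xml.etree.ElementTree as ET

NS = "{http://autosar.org/schema/r4.0}"  # shorthand added for this sketch

structure = {
    "ASSEMBLY-SW-CONNECTOR": {
        "SHORT-NAME": None,
        "PROVIDER-IREF": {"CONTEXT-COMPONENT-REF": None, "TARGET-P-PORT-REF": None},
        "REQUESTER-IREF": {"CONTEXT-COMPONENT-REF": None, "TARGET-R-PORT-REF": None},
    }
}

def matches(element, expected):
    """Return True if element has every child named in expected, recursively."""
    if expected is None:  # leaf of the specification: nothing more to require
        return True
    for name, sub_spec in expected.items():
        child = element.find(NS + name)  # direct children only
        if child is None or not matches(child, sub_spec):
            return False
    return True

def check_connectors(root):
    """True if at least one connector exists and all connectors fit the structure."""
    spec = structure["ASSEMBLY-SW-CONNECTOR"]
    connectors = root.findall(".//" + NS + "ASSEMBLY-SW-CONNECTOR")
    return bool(connectors) and all(matches(c, spec) for c in connectors)

# Hypothetical sample documents: one complete, one missing TARGET-R-PORT-REF.
TEMPLATE = """<AUTOSAR xmlns="http://autosar.org/schema/r4.0">
  <ASSEMBLY-SW-CONNECTOR>
    <SHORT-NAME>Demo</SHORT-NAME>
    <PROVIDER-IREF><CONTEXT-COMPONENT-REF/><TARGET-P-PORT-REF/></PROVIDER-IREF>
    <REQUESTER-IREF><CONTEXT-COMPONENT-REF/>{target_r}</REQUESTER-IREF>
  </ASSEMBLY-SW-CONNECTOR>
</AUTOSAR>"""
good_root = ET.fromstring(TEMPLATE.format(target_r="<TARGET-R-PORT-REF/>"))
bad_root = ET.fromstring(TEMPLATE.format(target_r=""))

print(check_connectors(good_root))
print(check_connectors(bad_root))
```

With a real file you would pass ET.parse(path).getroot() to check_connectors instead of the sample roots.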