How to retrieve xsi:noNamespaceSchemaLocation from XML with lxml?

Question

I am trying to validate an XML file against the schema named in its xsi:noNamespaceSchemaLocation attribute.

I have researched this problem, but there does not seem to be a ready-made solution.

My XML file looks like this:

<shiporder orderid="889923"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="shiporder.xsd">
  <orderperson>John Smith</orderperson>
  <shipto>
    <name>Ola Nordmann</name>
    <address>Langgt 23</address>
    <city>4000 Stavanger</city>
    <country>Norway</country>
  </shipto>
  <item>
    <title>Empire Burlesque</title>
    <note>Special Edition</note>
    <quantity>1</quantity>
    <price>10.90</price>
  </item>
  <item>
    <title>Hide your heart</title>
    <quantity>1</quantity>
    <price>9.90</price>
  </item>
</shiporder>

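I took it from w3schools.

This is what I get when I parse the file and read attrib from the root element: {'{http://www.w3.org/2001/XMLSchema-instance}noNamespaceSchemaLocation': 'shiporder.xsd'}

How can I retrieve the schema location in Python with lxml? I have looked at other parsers, but so far I have not figured out how to do it.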

Answer 1

Thanks to @mzjn for pointing out Clark notation.

The solution I came up with is:

from lxml import etree

...

it = etree.fromstring(xml)
xsd = None  # stays None if the attribute is not present
# Attribute keys are in Clark notation, i.e. prefixed with the
# namespace URI in curly braces, so match on the local name
for attr in it.attrib:
    if attr.endswith('noNamespaceSchemaLocation'):
        xsd = it.attrib[attr]
        break

...

# Do validations based on XSD URL value
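
Since the attribute key is just the namespace URI in Clark notation followed by the local name, the loop can probably be replaced by a direct lookup. The following is a minimal sketch, not a definitive implementation: the constant names and the helper functions `find_schema_location` and `validate` are hypothetical, and the inline schema is a cut-down stand-in for shiporder.xsd so that the example runs without any files on disk.

```python
import io

from lxml import etree

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
# Clark notation: "{namespace-uri}local-name"
NO_NS_SCHEMA_LOCATION = "{%s}noNamespaceSchemaLocation" % XSI_NS


def find_schema_location(root):
    """Return the xsi:noNamespaceSchemaLocation value, or None if absent."""
    return root.get(NO_NS_SCHEMA_LOCATION)


def validate(root, schema_source):
    """Validate root against an XSD given as a file name or file-like object."""
    schema = etree.XMLSchema(etree.parse(schema_source))
    return schema.validate(root)


# Shortened version of the document from the question (assumption: only
# one child element is kept to keep the example small).
xml = b"""<shiporder orderid="889923"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="shiporder.xsd">
  <orderperson>John Smith</orderperson>
</shiporder>"""

# Hypothetical stand-in for shiporder.xsd, held in memory.
xsd_doc = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="shiporder">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="orderperson" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="orderid" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

root = etree.fromstring(xml)
xsd = find_schema_location(root)
print(xsd)

# In real use, xsd would be resolved to a path or URL and passed to
# validate(); here the in-memory schema is used instead.
is_valid = validate(root, io.BytesIO(xsd_doc))
print(is_valid)
```

Note that the value of the attribute is often a relative path such as shiporder.xsd, so in practice it likely has to be resolved against the location of the XML file before it can be opened.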

python

lxml

xml-parsing