LXML XPath Query

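I'm writing a quick-and-dirty module to convert an XML document to JSON so that various JavaScript libraries can display it. This involves me learning to use lxml and its various XPath functions.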

I have the following block of code:

    def parse(self):

        parser = etree.XMLParser(remove_comments=True, encoding="UTF-8", no_network=True, recover=True)
        root = etree.XML(self.text, parser=parser)
        self.tree = etree.XPathElementEvaluator(root)

        print(f"test: { self.tree('/*') }")

In my unit test, the output is as follows:

test_parse (test_converter.TestConverter) ... test: [<Element {http://www.ivoa.net/xml/VOTable/v1.3}VOTABLE at 0x7fa2b99a8dc0>]

However, when I try a query like the following, the result is an empty list:

print(f"test: { self.tree('/VOTABLE*') }")

I tried prefixing VOTABLE with the namespace, as shown below, but that returned nothing either:

print(f"test: { self.tree('/{http://www.ivoa.net/xml/VOTable/v1.3}VOTABLE*') }")

Can anyone tell me what rookie mistake I'm making?

Sample data:

<VOTABLE version="1.4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="http://www.ivoa.net/xml/VOTable/v1.3"
  xsi:schemaLocation="http://www.ivoa.net/xml/VOTable/v1.3 http://www.ivoa.net/xml/VOTable/v1.3">
 <DESCRIPTION>
   VizieR Astronomical Server vizier.u-strasbg.fr
    Date: 2020-11-07T11:43:26 [V1.99+ (14-Oct-2013)]
   Explanations and Statistics of UCDs:         See LINK below
   In case of problem, please report to:    cds-question@unistra.fr
   In this version, NULL integer columns are written as an empty string
   &lt;TD&gt;&lt;/TD&gt;, explicitely possible from VOTable-1.3
 </DESCRIPTION>
 <RESOURCE ID="yCat_3135" name="III/135A">
  ...
 </RESOURCE>
 ...
</VOTABLE>

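Update: Solution

Once drec4s pointed out that I had not registered the namespace for the query, I managed to work out what I was doing wrong. Here is the working block of code: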

        parser = etree.XMLParser(remove_comments=True, encoding="UTF-8", no_network=True, recover=True)
        root = etree.XML(self.text, parser=parser)
        self.tree = etree.XPathElementEvaluator(root)
        self.tree.register_namespace("n", "http://www.ivoa.net/xml/VOTable/v1.3")
        test = self.tree("/n:VOTABLE/n:DESCRIPTION/text()")
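For reference, here is a minimal, self-contained sketch of the same idea. It rests on two facts: in XPath 1.0 an unprefixed name such as VOTABLE only matches elements in no namespace, and the {uri}tag notation shown in the element repr is not XPath syntax. The sample document is trimmed down, and the prefix "n" and the variable names are arbitrary choices for illustration. It passes the prefix mapping through the evaluator's namespaces argument instead of calling register_namespace afterwards; both should behave the same.

```python
from lxml import etree

# Namespace URI taken from the xmlns attribute of the sample document.
NS = "http://www.ivoa.net/xml/VOTable/v1.3"

# Trimmed-down stand-in for the real VOTable file (illustration only).
SAMPLE = b"""<VOTABLE version="1.4" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
 <DESCRIPTION>VizieR Astronomical Server</DESCRIPTION>
 <RESOURCE ID="yCat_3135" name="III/135A"/>
</VOTABLE>"""

parser = etree.XMLParser(remove_comments=True, no_network=True, recover=True)
root = etree.XML(SAMPLE, parser=parser)

# Bind the (arbitrary) prefix "n" to the document's default namespace.
evaluate = etree.XPathElementEvaluator(root, namespaces={"n": NS})

# An unprefixed name only matches elements that are in no namespace.
unprefixed = evaluate("/VOTABLE")

# With the prefix bound, the same steps match.
prefixed = evaluate("/n:VOTABLE")
description = evaluate("/n:VOTABLE/n:DESCRIPTION/text()")

# Attributes without a prefix are in no namespace, so @ID needs no prefix.
resource_ids = evaluate("/n:VOTABLE/n:RESOURCE/@ID")

print(unprefixed, prefixed, description, resource_ids)
```

The prefix used in the query does not have to match anything in the document; only the URI it is bound to matters.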

You can use the xpath method, but you also need to pass a namespace mapping to it:

from lxml import etree
from io import StringIO

xmldoc =  StringIO("""
<VOTABLE version="1.4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns="http://www.ivoa.net/xml/VOTable/v1.3"
  xsi:schemaLocation="http://www.ivoa.net/xml/VOTable/v1.3 http://www.ivoa.net/xml/VOTable/v1.3">
 <DESCRIPTION>
   VizieR Astronomical Server vizier.u-strasbg.fr
    Date: 2020-11-07T11:43:26 [V1.99+ (14-Oct-2013)]
   Explanations and Statistics of UCDs:         See LINK below
   In case of problem, please report to:    cds-question@unistra.fr
   In this version, NULL integer columns are written as an empty string
   &lt;TD&gt;&lt;/TD&gt;, explicitely possible from VOTable-1.3
 </DESCRIPTION>
 <RESOURCE ID="yCat_3135" name="III/135A">
 </RESOURCE>
</VOTABLE>
""")

tree = etree.parse(xmldoc)
root = tree.getroot()
print(root.xpath('//n:DESCRIPTION', namespaces={'n': 'http://www.ivoa.net/xml/VOTable/v1.3'})[0].text)

Output:

   VizieR Astronomical Server vizier.u-strasbg.fr
    Date: 2020-11-07T11:43:26 [V1.99+ (14-Oct-2013)]
   Explanations and Statistics of UCDs:         See LINK below
   In case of problem, please report to:    cds-question@unistra.fr
   In this version, NULL integer columns are written as an empty string
   <TD></TD>, explicitely possible from VOTable-1.3
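If you would rather not hard-code the namespace URI, a few variations may help. The sketch below uses a shortened sample document and shows three options: building the prefix map from root.nsmap (the default namespace is stored under the key None, which xpath does not accept as a prefix, so it is renamed to "n" here), using Clark notation with find, and matching on local-name() to ignore namespaces entirely. The prefix "n" and the variable names are assumptions made for this example; if the document already declares a prefix "n", pick a different one.

```python
from io import BytesIO
from lxml import etree

# Shortened stand-in for the VOTable document (illustration only).
xml = b"""<VOTABLE version="1.4" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
 <DESCRIPTION>VizieR Astronomical Server</DESCRIPTION>
 <RESOURCE ID="yCat_3135" name="III/135A"/>
</VOTABLE>"""

root = etree.parse(BytesIO(xml)).getroot()

# Option 1: derive the mapping from the document itself.
# root.nsmap stores the default namespace under None; rename it to "n".
ns = {prefix or "n": uri for prefix, uri in root.nsmap.items()}
via_nsmap = root.xpath("//n:DESCRIPTION", namespaces=ns)

# Option 2: find/findall/iter accept Clark notation ({uri}tag),
# which xpath() does not.
ns_uri = root.nsmap[None]
via_find = root.find(f"{{{ns_uri}}}DESCRIPTION")

# Option 3: ignore namespaces and match on the local name only.
via_local_name = root.xpath("//*[local-name() = 'DESCRIPTION']")

print(via_nsmap[0].text, via_find.text, via_local_name[0].text)
```

The local-name() form is convenient for quick exploration, but it will also match same-named elements from other namespaces, so the explicit mapping is usually the safer choice for documents that mix vocabularies.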