LXML XPath Query
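I'm writing a quick and dirty module to convert an XML document to JSON so that various JavaScript libraries can display it. This involves me learning to use LXML and its various XPath functions.
I have the following code block: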
def parse(self):
    parser = etree.XMLParser(remove_comments=True, encoding="UTF-8", no_network=True, recover=True)
    root = etree.XML(self.text, parser=parser)
    self.tree = etree.XPathElementEvaluator(root)
    print(f"test: { self.tree('/*') }")
In my unit test, the output is as follows:
test_parse (test_converter.TestConverter) ... test: [<Element {http://www.ivoa.net/xml/VOTable/v1.3}VOTABLE at 0x7fa2b99a8dc0>]
However, when I try a query like the following, the result is an empty list:
print(f"test: { self.tree('/VOTABLE*') }")
I tried prefixing VOTABLE with the namespace, as shown below, but that returned nothing either:
print(f"test: { self.tree('/{http://www.ivoa.net/xml/VOTable/v1.3}VOTABLE*') }")
Can anyone tell me what rookie mistake I'm making?
Sample data:
<VOTABLE version="1.4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.ivoa.net/xml/VOTable/v1.3"
xsi:schemaLocation="http://www.ivoa.net/xml/VOTable/v1.3 http://www.ivoa.net/xml/VOTable/v1.3">
<DESCRIPTION>
VizieR Astronomical Server vizier.u-strasbg.fr
Date: 2020-11-07T11:43:26 [V1.99+ (14-Oct-2013)]
Explanations and Statistics of UCDs: See LINK below
In case of problem, please report to: cds-question@unistra.fr
In this version, NULL integer columns are written as an empty string
<TD></TD>, explicitely possible from VOTable-1.3
</DESCRIPTION>
<RESOURCE ID="yCat_3135" name="III/135A">
...
</RESOURCE>
...
</VOTABLE>
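Update: Solution
Once drec4s pointed out that I hadn't registered a namespace for the query, I managed to work out what I was doing wrong. Here is the working block of code: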
parser = etree.XMLParser(remove_comments=True, encoding="UTF-8", no_network=True, recover=True)
root = etree.XML(self.text, parser=parser)
self.tree = etree.XPathElementEvaluator(root)
self.tree.register_namespace("n", "http://www.ivoa.net/xml/VOTable/v1.3")
test = self.tree("/n:VOTABLE/n:DESCRIPTION/text()")
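For anyone who wants to try this outside the class, here is a minimal standalone sketch of the same idea. It assumes the prefix can be supplied through the `namespaces` argument of `XPathElementEvaluator` instead of calling `register_namespace` afterwards; the trimmed XML and the `DESCRIPTION` text are hypothetical stand-ins for the sample data, and `n` is an arbitrary prefix.

```python
from lxml import etree

VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"

# Trimmed, hypothetical stand-in for the sample VOTable document.
xml_bytes = (
    b'<VOTABLE version="1.4" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">'
    b"<DESCRIPTION>VizieR Astronomical Server</DESCRIPTION>"
    b'<RESOURCE ID="yCat_3135" name="III/135A"/>'
    b"</VOTABLE>"
)

parser = etree.XMLParser(remove_comments=True, no_network=True, recover=True)
root = etree.XML(xml_bytes, parser=parser)

# Bind the prefix "n" to the document's default namespace up front.
evaluator = etree.XPathElementEvaluator(root, namespaces={"n": VOTABLE_NS})

# An unprefixed name test only matches elements in no namespace.
unprefixed = evaluator("/VOTABLE")

# With the prefix, the same path matches the namespaced root element.
prefixed = evaluator("/n:VOTABLE")
description = evaluator("/n:VOTABLE/n:DESCRIPTION/text()")

print(unprefixed)
print(prefixed)
print(description)
```

The prefix used in the query does not have to match anything in the document; only the namespace URI it is bound to matters.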
You can use the xpath method, but you also need to pass a namespace mapping to it:
from lxml import etree
from io import StringIO
xmldoc = StringIO("""
<VOTABLE version="1.4" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.ivoa.net/xml/VOTable/v1.3"
xsi:schemaLocation="http://www.ivoa.net/xml/VOTable/v1.3 http://www.ivoa.net/xml/VOTable/v1.3">
<DESCRIPTION>
VizieR Astronomical Server vizier.u-strasbg.fr
Date: 2020-11-07T11:43:26 [V1.99+ (14-Oct-2013)]
Explanations and Statistics of UCDs: See LINK below
In case of problem, please report to: cds-question@unistra.fr
In this version, NULL integer columns are written as an empty string
<TD></TD>, explicitely possible from VOTable-1.3
</DESCRIPTION>
<RESOURCE ID="yCat_3135" name="III/135A">
</RESOURCE>
</VOTABLE>
""")
tree = etree.parse(xmldoc)
root = tree.getroot()
print(root.xpath('//n:DESCRIPTION', namespaces={'n': 'http://www.ivoa.net/xml/VOTable/v1.3'})[0].text)
Output:
VizieR Astronomical Server vizier.u-strasbg.fr
Date: 2020-11-07T11:43:26 [V1.99+ (14-Oct-2013)]
Explanations and Statistics of UCDs: See LINK below
In case of problem, please report to: cds-question@unistra.fr
In this version, NULL integer columns are written as an empty string
<TD></TD>, explicitely possible from VOTable-1.3
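If registering a prefix is inconvenient, two other options may be worth knowing. The sketch below is illustrative rather than a recommendation: it uses `find()` with the `{namespace}tag` form that the question tried inside an XPath expression (lxml's `find`/`findall` accept that form, XPath does not), and an XPath predicate on `local-name()` that ignores the namespace entirely. The trimmed XML is again a hypothetical stand-in for the sample data.

```python
from lxml import etree

VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"

# Trimmed, hypothetical stand-in for the sample VOTable document.
root = etree.fromstring(
    b'<VOTABLE version="1.4" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">'
    b"<DESCRIPTION>VizieR Astronomical Server</DESCRIPTION>"
    b"</VOTABLE>"
)

# Option 1: find() accepts the {namespace}tag form directly.
clark_hit = root.find(f"{{{VOTABLE_NS}}}DESCRIPTION")

# Option 2: match on the local name only, whatever the namespace is.
local_hits = root.xpath("//*[local-name() = 'DESCRIPTION']")

# QName splits a namespaced tag into its two parts.
root_name = etree.QName(root)

print(clark_hit.text)
print(local_hits[0].text)
print(root_name.namespace, root_name.localname)
```

Matching on `local-name()` also picks up same-named elements from any other namespace, so the prefixed query is the stricter choice when a document mixes vocabularies.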