Find element with function find in lxml
I have the following XML:
<?xml version='1.0' encoding='utf-8'?>
<SOAP:Envelope xmlns:SOAP="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://www.w3.org/2005/08/addressing">
<SOAP:Header>
</SOAP:Header>
<SOAP:Body>
<Server_Reply xmlns="some_url">
<conversionRate>
<conversionRateDetail>
<currency>dollar</currency>
</conversionRateDetail>
</conversionRate>
</Server_Reply>
</SOAP:Body>
</SOAP:Envelope>
It is in reply.txt. Then I do:
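from lxml.etree import fromstring

# 'rb': lxml refuses str input that carries an encoding declaration
with open('reply.txt', 'rb') as f:
    reply = f.read()
reply_element = fromstring(reply)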
I need to find the Server_Reply element.
When I do:
response = reply_element.find('Body/Server_Reply')
it returns None.
What is the correct way to do this? In the end, I need to get the Server_Reply element.
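To see why that path matches nothing, it helps to print the tag names the parser actually stores: every name is expanded to {namespace-URI}localname, so a bare 'Body' can never match. A minimal sketch, assuming the standard library's xml.etree.ElementTree as a stand-in (lxml.etree reports the same expanded tag names) and with the reply inlined as a bytes string instead of read from reply.txt:

```python
from xml.etree import ElementTree as ET

# The reply from the question; bytes, so the encoding declaration is accepted.
REPLY = b"""<?xml version='1.0' encoding='utf-8'?>
<SOAP:Envelope xmlns:SOAP="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://www.w3.org/2005/08/addressing">
<SOAP:Header>
</SOAP:Header>
<SOAP:Body>
<Server_Reply xmlns="some_url">
<conversionRate>
<conversionRateDetail>
<currency>dollar</currency>
</conversionRateDetail>
</conversionRate>
</Server_Reply>
</SOAP:Body>
</SOAP:Envelope>"""

root = ET.fromstring(REPLY)

# Every tag is stored in expanded "{namespace-URI}localname" form.
for child in root:
    print(child.tag)

# The un-namespaced path from the question therefore finds nothing.
print(root.find('Body/Server_Reply'))
```

This prints the two expanded names of Header and Body, then None.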
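You can use .// to search all descendants of the current element (here the root, SOAP:Envelope) instead of only its direct children. In this document Body is actually a direct child of Envelope, so .// is optional, but it keeps the path working if Body ends up nested deeper.
The real reason find() returns None is that your XML uses namespaces, so you must include them in the path: either write each tag as {namespace-URI}localname, or use prefixes and pass the prefix-to-URI mapping to .find() through its namespaces argument. Examples: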
response = reply_xml.find('.//{http://www.w3.org/2003/05/soap-envelope}Body/{some_url}Server_Reply')
or
response = reply_xml.find('.//SOAP:Body/dummy:Server_Reply', namespaces={'SOAP': 'http://www.w3.org/2003/05/soap-envelope', 'dummy': 'some_url'})
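Both forms can be checked side by side in one script. The sketch below uses the standard library's xml.etree.ElementTree, whose find() accepts the same {uri}tag syntax and namespaces= mapping as lxml's; the prefixes SOAP and dummy are arbitrary labels, only the URIs have to match the document. It also confirms that, because Body is a direct child of Envelope, the path works with or without the leading .//.

```python
from xml.etree import ElementTree as ET

SOAP_NS = 'http://www.w3.org/2003/05/soap-envelope'
REPLY_NS = 'some_url'  # default namespace declared on <Server_Reply>

# The reply from the question, trimmed to the Body/Server_Reply path.
xml = b"""<SOAP:Envelope xmlns:SOAP="http://www.w3.org/2003/05/soap-envelope">
<SOAP:Body><Server_Reply xmlns="some_url"/></SOAP:Body>
</SOAP:Envelope>"""

root = ET.fromstring(xml)

# 1. Expanded names written inline as {namespace-URI}localname.
by_uri = root.find('.//{%s}Body/{%s}Server_Reply' % (SOAP_NS, REPLY_NS))

# 2. Prefixes resolved through a mapping; the prefix names themselves are arbitrary.
ns = {'SOAP': SOAP_NS, 'dummy': REPLY_NS}
by_prefix = root.find('.//SOAP:Body/dummy:Server_Reply', namespaces=ns)

# 3. Body is a direct child of Envelope, so the path also works without './/'.
direct = root.find('SOAP:Body/dummy:Server_Reply', namespaces=ns)

print(by_uri is by_prefix is direct)  # True: all three find the same element
print(by_uri.tag)                     # {some_url}Server_Reply
```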
Demo:
In [55]: s = """<SOAP:Envelope xmlns:SOAP="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://www.w3.org/2005/08/addressing">
....: <SOAP:Header>
....: </SOAP:Header>
....: <SOAP:Body>
....: <Server_Reply xmlns="some_url">
....: <conversionRate>
....: <conversionRateDetail>
....: <currency>dollar</currency>
....: </conversionRateDetail>
....: </conversionRate>
....: </Server_Reply>
....: </SOAP:Body>
....: </SOAP:Envelope>"""
In [56]: reply_xml = etree.fromstring(s)
In [57]: reply_xml.find('.//SOAP:Body/dummy:Server_Reply',namespaces = {'SOAP':'http://www.w3.org/2003/05/soap-envelope', 'dummy':'some_url'})
Out[57]: <Element {some_url}Server_Reply at 0x481d708>
In [58]: reply_xml.find('.//{http://www.w3.org/2003/05/soap-envelope}Body/{some_url}Server_Reply')
Out[58]: <Element {some_url}Server_Reply at 0x481d708>
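Once Server_Reply is found, keep in mind that its descendants inherit the default namespace some_url, so every step below it needs a prefix as well. A minimal sketch, again assuming the standard library's ElementTree (lxml's findtext() takes the same path and namespaces= keyword); 'r' is an arbitrary prefix chosen here:

```python
from xml.etree import ElementTree as ET

NS = {
    'SOAP': 'http://www.w3.org/2003/05/soap-envelope',
    'r': 'some_url',  # arbitrary prefix for the default namespace on Server_Reply
}

# The reply from the question, with the empty Header and unused wsa declaration dropped.
xml = b"""<SOAP:Envelope xmlns:SOAP="http://www.w3.org/2003/05/soap-envelope">
<SOAP:Body>
<Server_Reply xmlns="some_url">
<conversionRate>
<conversionRateDetail>
<currency>dollar</currency>
</conversionRateDetail>
</conversionRate>
</Server_Reply>
</SOAP:Body>
</SOAP:Envelope>"""

root = ET.fromstring(xml)
server_reply = root.find('SOAP:Body/r:Server_Reply', namespaces=NS)

# Descendants inherit xmlns="some_url", so every step needs the prefix.
currency = server_reply.findtext(
    'r:conversionRate/r:conversionRateDetail/r:currency', namespaces=NS)
print(currency)  # dollar

# Without prefixes nothing matches and findtext() falls back to its default, None.
print(server_reply.findtext('conversionRate/conversionRateDetail/currency'))  # None
```

Note that namespaces is passed by keyword to findtext(), because its second positional parameter is the default value.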
I find xpath more intuitive and simpler:
from lxml import etree
# bytes literal: lxml rejects str input that carries an encoding declaration
xml = b"""<?xml version='1.0' encoding='utf-8'?>
<SOAP:Envelope xmlns:SOAP="http://www.w3.org/2003/05/soap-envelope" xmlns:wsa="http://www.w3.org/2005/08/addressing">
<SOAP:Header>
</SOAP:Header>
<SOAP:Body>
<Server_Reply xmlns="some_url">
<conversionRate>
<conversionRateDetail>
<currency>dollar</currency>
</conversionRateDetail>
</conversionRate>
</Server_Reply>
</SOAP:Body>
</SOAP:Envelope>"""
et = etree.fromstring(xml)
# local-name() ignores namespaces; xpath() returns a list of all matches
server_reply = et.xpath('//*[local-name()="Server_Reply"]')
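local-name() ignores namespaces entirely, which is convenient but would also match a Server_Reply from any other namespace. If you want the namespace checked, lxml's xpath() also accepts a namespaces= mapping. XPath 1.0 has no notion of a default namespace, so even the un-prefixed Server_Reply needs a prefix in the expression. A sketch, assuming lxml is installed; the prefix 'r' is an arbitrary choice:

```python
from lxml import etree

# The reply from the question, trimmed to the Body/Server_Reply path.
xml = b"""<SOAP:Envelope xmlns:SOAP="http://www.w3.org/2003/05/soap-envelope">
<SOAP:Body><Server_Reply xmlns="some_url"/></SOAP:Body>
</SOAP:Envelope>"""

root = etree.fromstring(xml)

# Namespace-blind: matches any element whose local name is Server_Reply.
loose = root.xpath('//*[local-name()="Server_Reply"]')

# Namespace-aware: bind a prefix for some_url ('r' is arbitrary).
ns = {'SOAP': 'http://www.w3.org/2003/05/soap-envelope', 'r': 'some_url'}
strict = root.xpath('/SOAP:Envelope/SOAP:Body/r:Server_Reply', namespaces=ns)

# xpath() returns a list; take the first hit, or None if there is none.
server_reply = strict[0] if strict else None
print(len(loose), len(strict))  # 1 1
print(server_reply.tag)         # {some_url}Server_Reply
```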
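Doing it with xml.etree (here ElementTree.parse() is handed an lxml parser, so the root it returns is an lxml element with an .nsmap attribute):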
#!/usr/bin/env python
import sys
from xml.etree import ElementTree
from lxml import etree

def run(fileName):
    # lxml parser; ns_clean=True drops redundant namespace declarations
    parser = etree.XMLParser(ns_clean=True)
    # ElementTree.parse() feeds the file to the lxml parser, so the root is an lxml element
    data = ElementTree.parse(fileName, parser).getroot()
    # Copy the root's prefix-to-URI map ('SOAP', 'wsa') and add a prefix for Server_Reply's default namespace
    namespaces = dict(data.nsmap)
    namespaces['some_url'] = 'some_url'
    # Print every Server_Reply element found under SOAP:Body
    for row in data.findall('.//SOAP:Body/some_url:Server_Reply', namespaces=namespaces):
        print(row)

if __name__ == "__main__":
    run(sys.argv[1])
Then run the script with the XML file as an argument:
python findElement.py sampleFile.xml
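The script above relies on lxml for the parser and for .nsmap. If you would rather stay entirely within the standard library, xml.etree.ElementTree can do the same lookup with an explicit mapping, and on Python 3.8+ its find()/findall() also accept the {*} wildcard, which matches a tag in any namespace. A sketch under those assumptions; find_server_replies and find_server_replies_any_ns are illustrative names, not part of any library:

```python
import sys
from xml.etree import ElementTree as ET

NAMESPACES = {
    'SOAP': 'http://www.w3.org/2003/05/soap-envelope',
    'some_url': 'some_url',
}

def find_server_replies(root):
    """Every Server_Reply under SOAP:Body, matched on the full namespace."""
    return root.findall('.//SOAP:Body/some_url:Server_Reply', namespaces=NAMESPACES)

def find_server_replies_any_ns(root):
    """Python 3.8+: '{*}' matches the tag in any namespace (or none)."""
    return root.findall('.//{*}Body/{*}Server_Reply')

def run(file_name):
    root = ET.parse(file_name).getroot()
    for row in find_server_replies(root):
        print(row)

# Run only when a file name is given on the command line.
if __name__ == '__main__' and len(sys.argv) > 1:
    run(sys.argv[1])
```

The wildcard version is shorter but, like local-name() in XPath, it no longer checks which namespace the elements belong to.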