Get the URL from an XML tag
My XML file:
<xml
xmlns="http://www.myweb.org/2003/instance"
xmlns:link="http://www.myweb.org/2003/linkbase"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:iso4217="http://www.myweb.org/2003/iso4217"
xmlns:utr="http://www.myweb.org/2009/utr">
<link:schemaRef xlink:type="simple" xlink:href="http://www.myweb.com/form/2020-01-01/test.xsd"></link:schemaRef>
I want to get the URL http://www.myweb.com/folder/form/1/2020-01-01/test.xsd from the <link:schemaRef> tag.
My Python code below finds the <link:schemaRef> tag, but I cannot retrieve the URL.
from lxml import etree

with open(filepath, 'rb') as f:
    file = f.read()
root = etree.XML(file)
print(root.nsmap["link"])  # http://www.myweb.org/2003/linkbase
print(root.find(".//{" + root.nsmap["link"] + "}" + "schemaRef"))
Use:
>>> child = root.getchildren()[0]
>>> child.attrib
{'{http://www.w3.org/1999/xlink}type': 'simple', '{http://www.w3.org/1999/xlink}href': 'http://www.myweb.com/form/2020-01-01/test.xsd'}
>>> url = child.attrib['{http://www.w3.org/1999/xlink}href']
However, I think the real challenge is knowing the correct key to use (i.e. {http://www.w3.org/1999/xlink}href). If that is the problem, then all we need is:
>>> key_prefix = root.nsmap['xlink']  # the requested URL is the href attribute in the xlink namespace
>>> print(key_prefix)
http://www.w3.org/1999/xlink
>>> key_url = "{" + key_prefix + "}href"
>>> print(child.attrib[key_url])
http://www.myweb.com/form/2020-01-01/test.xsd
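For reference, the same lookup can be written without relying on the child's position, by searching for the element with a prefix map and reading the attribute by its qualified name. This is only a sketch: the `NS` dictionary, the `schema_ref_url` helper, and the closing `</xml>` tag in the sample are assumptions added for illustration, and the stdlib fallback is there only so the snippet runs where lxml is not installed.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback; same calls are used below

# Trimmed copy of the question's XML; the closing </xml> tag is assumed.
XML = b"""<xml
  xmlns="http://www.myweb.org/2003/instance"
  xmlns:link="http://www.myweb.org/2003/linkbase"
  xmlns:xlink="http://www.w3.org/1999/xlink">
  <link:schemaRef xlink:type="simple" xlink:href="http://www.myweb.com/form/2020-01-01/test.xsd"></link:schemaRef>
</xml>"""

# Hypothetical prefix map; the URIs are copied from the xmlns declarations above.
NS = {
    "link": "http://www.myweb.org/2003/linkbase",
    "xlink": "http://www.w3.org/1999/xlink",
}


def schema_ref_url(root):
    """Return the xlink:href of the first link:schemaRef, or None if absent."""
    node = root.find(".//link:schemaRef", namespaces=NS)
    if node is None:
        return None
    # Attributes in a namespace are keyed as "{namespace-uri}localname".
    return node.get("{%s}href" % NS["xlink"])


root = etree.fromstring(XML)
print(schema_ref_url(root))  # http://www.myweb.com/form/2020-01-01/test.xsd
```

Using `.get()` instead of indexing `attrib` returns None rather than raising KeyError when the attribute is missing.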
Try it this way and see if it works:
# etree is the module imported in the question (from lxml import etree)
for i in root.xpath('//*/node()'):
    if isinstance(i, etree._Element):
        print(i.values()[1])
Output:
http://www.myweb.com/form/2020-01-01/test.xsd
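Note that `values()[1]` depends on the attribute order inside the tag. A variant that does not depend on order might look like the sketch below, which walks every element and keeps those carrying an xlink:href attribute. The `collect_hrefs` helper, the `XLINK_HREF` constant, and the closed sample document are assumptions for illustration, not part of the original answer.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback; same calls are used below

# Qualified attribute name: "{namespace-uri}localname".
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Trimmed copy of the question's XML; the closing </xml> tag is assumed.
SAMPLE = b"""<xml
  xmlns="http://www.myweb.org/2003/instance"
  xmlns:link="http://www.myweb.org/2003/linkbase"
  xmlns:xlink="http://www.w3.org/1999/xlink">
  <link:schemaRef xlink:type="simple" xlink:href="http://www.myweb.com/form/2020-01-01/test.xsd"></link:schemaRef>
</xml>"""


def collect_hrefs(root):
    """Return all xlink:href values in document order."""
    hrefs = []
    for element in root.iter("*"):  # "*" visits elements only, root included
        href = element.get(XLINK_HREF)
        if href is not None:
            hrefs.append(href)
    return hrefs


for url in collect_hrefs(etree.fromstring(SAMPLE)):
    print(url)
```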