lxml - using find method to find specific tag? (does not find)
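I have an XML file in which I need to update the values of some specific tags. The Header tag contains some tags with a namespace prefix. Using find on those tags works, but if I search for other tags that have no namespace prefix, they are not found.
I tried both relative and absolute paths, and neither finds the tag. The code looks like this: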
from lxml import etree
tree = etree.parse('test.xml')
root = tree.getroot()
# get its namespace map, excluding default namespace
nsmap = {k:v for k,v in root.nsmap.iteritems() if k}
# Replace values in tags
identity = tree.find('.//env:identity', nsmap)
identity.text = 'Placeholder' # works fine
e01_0017 = tree.find('.//e01_0017') # does not find
e01_0017.text = 'Placeholder' # and then, of course, it throws: AttributeError: 'NoneType' object has no attribute 'text'
# Also tried like this, but still not working
e01_0017 = tree.find('Envelope/Body/IVOIC/UNB/cmp04/e01_0017')
I even tried to find the Body tag, but it is not found either.
This is what the XML structure looks like:
<?xml version="1.0" encoding="ISO-8859-1"?><Envelope xmlns="http://www.someurl.com/TTT" xmlns:env="http://www.someurl.com/TTT_Envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:schemaLocation="http://www.someurl.com/TTT TTT_INVOIC.xsd"><Header>
<env:delivery>
<env:to>
<env:address>Test</env:address>
</env:to>
<env:from>
<env:address>Test2</env:address>
</env:from>
<env:reliability>
<env:sendReceiptTo/>
<env:receiptRequiredBy/>
</env:reliability>
</env:delivery>
<env:properties>
<env:identity>some code</env:identity>
<env:sentAt>2006-03-17T00:38:04+01:00</env:sentAt>
<env:expiresAt/>
<env:topic>http://www.someurl.com/TTT/</env:topic>
</env:properties>
<env:manifest>
<env:reference uri="#INVOIC@D00A">
<env:description>Doc Name Descr</env:description>
</env:reference>
</env:manifest>
<env:process>
<env:type></env:type>
<env:instance/>
<env:handle></env:handle>
</env:process>
</Header>
<Body>
<INVOIC>
<UNB>
<cmp01>
<e01_0001>1</e01_0001>
<e02_0002>1</e02_0002>
</cmp01>
<cmp02>
<e01_0004>from</e01_0004>
</cmp02>
<cmp03>
<e01_0010>to</e01_0010>
</cmp03>
<cmp04>
<e01_0017>060334</e01_0017>
<e02_0019>1652</e02_0019>
</cmp04>
<e01_0020>1</e01_0020>
<cmp05>
<e01_0022>1</e01_0022>
</cmp05>
</UNB>
</INVOIC>
</Body>
</Envelope>
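Update: there seems to be a problem with the Header or Envelope tags. For example, if I use the XML without the Header and Envelope information, the tags are found just fine. If I include the Envelope attributes and the Header, it stops finding the tags. I have updated the XML sample with the Header information.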
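The problem is that elements like e01_0017 also have a namespace. An element without a prefix inherits the default namespace from its parent element, and in this case that goes all the way back to <Envelope>. The namespace of your element is "http://www.someurl.com/TTT".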
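To see the inheritance described above, it can help to print the tags that the parser actually stores. The following is a minimal sketch, assuming a trimmed-down copy of the XML from the question (the XML declaration is left out so the document can be parsed from an inline bytes literal). The fallback import is only an assumption for environments without lxml; the standard library module behaves the same way for these calls.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib API, which treats tags the same way here.
    import xml.etree.ElementTree as etree

# Trimmed-down version of the document from the question (illustrative only).
XML = b"""<Envelope xmlns="http://www.someurl.com/TTT"
          xmlns:env="http://www.someurl.com/TTT_Envelope">
  <Header><env:properties><env:identity>some code</env:identity></env:properties></Header>
  <Body><INVOIC><UNB><cmp04><e01_0017>060334</e01_0017></cmp04></UNB></INVOIC></Body>
</Envelope>"""

root = etree.fromstring(XML)

# Every element is stored with its namespace in {uri}localname form,
# including the ones written without a prefix in the file.
tags = [el.tag for el in root.iter()]
for tag in tags:
    print(tag)

# A bare name therefore matches nothing, because no element is in "no namespace".
unqualified_hit = root.find('.//e01_0017')
print(unqualified_hit)
```

The unprefixed elements show up under the http://www.someurl.com/TTT namespace, which is why a search for the bare name returns None.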
You have two options.
Either specify the namespace directly in the path, for example:
e01_0017 = tree.find('.//{http://www.someurl.com/TTT}e01_0017')
Demo (for your XML):
In [39]: e01_0017 = tree.find('.//{http://www.someurl.com/TTT}e01_0017')
In [40]: e01_0017
Out[40]: <Element {http://www.someurl.com/TTT}e01_0017 at 0x2fe78c8>
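The other option is to keep the default namespace in nsmap under some placeholder key (its key is None in root.nsmap), and then use that prefix in the path. Example: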
nsmap = {(k or 'def'):v for k,v in root.nsmap.items()}
e01_0017 = tree.find('.//def:e01_0017',nsmap)
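Both options can be tried side by side on a small document. This is only a sketch under a few assumptions: the XML is a trimmed-down copy of the one in the question, the prefix map is written out by hand rather than built from root.nsmap so that the fallback import also works, and set_text is a hypothetical helper added here to avoid the AttributeError when nothing matches.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib API for environments without lxml.
    import xml.etree.ElementTree as etree

# Trimmed-down version of the document from the question (illustrative only).
XML = b"""<Envelope xmlns="http://www.someurl.com/TTT"
          xmlns:env="http://www.someurl.com/TTT_Envelope">
  <Header><env:properties><env:identity>some code</env:identity></env:properties></Header>
  <Body><INVOIC><UNB><cmp04><e01_0017>060334</e01_0017></cmp04></UNB></INVOIC></Body>
</Envelope>"""

TTT = "http://www.someurl.com/TTT"
root = etree.fromstring(XML)


def set_text(element, path, value, namespaces=None):
    """Hypothetical helper: set the text of the first match, report success."""
    target = element.find(path, namespaces)
    if target is None:
        return False
    target.text = value
    return True


# Option 1: spell out the namespace in {uri}name form.
ok_clark = set_text(root, './/{%s}e01_0017' % TTT, 'Placeholder')

# Option 2: map the default namespace to a made-up prefix ('def' is arbitrary)
# and qualify every step of the path with it.
ns = {'def': TTT, 'env': 'http://www.someurl.com/TTT_Envelope'}
ok_prefix = set_text(root, './/def:cmp04/def:e01_0017', 'Placeholder2', ns)
ok_env = set_text(root, './/env:identity', 'Placeholder', ns)

# The unqualified name still matches nothing, but no exception is raised.
missing = set_text(root, './/e01_0017', 'ignored')

print(ok_clark, ok_prefix, ok_env, missing)
```

Note that with option 2 every step in a multi-step path needs the prefix, not only the last one, since each of those elements is in the default namespace too.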