Getting the complete list of attributes while looping through particular lxml elements when using xpath
Consider the following XML:
from lxml import etree
xmldump = '''<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<title lang="en">ggggggg</title>
<author>g</author>
<year>2006</year>
<price>129.99</price>
</book>
<book category="CHILDREN">
<title lang="es">hhhhhhh</title>
<author>h</author>
<year>2007</year>
<price>229.99</price>
</book>
<book category="CHILDREN">
<title lang="cn">kkkkkkkk</title>
<author>k</author>
<year>2008</year>
<price>329.99</price>
</book>
<book category="CHILDREN">
<title lang="ru">llllllllll</title>
<author>l</author>
<year>2009</year>
<price>429.99</price>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>'''
Now I want to get the value of the lang attribute from the book nodes that have the category="CHILDREN" attribute, so I did this:
xmlproc = etree.fromstring(xmldump.encode('utf-8'))
books = xmlproc.xpath("//*[@category='CHILDREN']")
I get a books list containing 4 elements, so I go on to loop over them to get the lang attribute value from each one:
for b in books:
    language = b.xpath("//title/@lang")
    language2 = b.xpath("//*/@lang")
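The result is language = ['en', 'en', 'es', 'cn', 'ru', 'en'], when in fact I was expecting language = ['en'] on the first iteration, then ['es'], then ['cn'], and finally ['ru'] on the last iteration over the books list.
Now both language and language2 contain a list of all the lang attributes in my initial xmldump. I am only asking for attributes from the b element, so why do I get the whole list of attributes? (b is each element of the books list.)
Also, what is the correct way to do this so that I can get any particular attribute? Note that I also need to find descendant elements of each particular b element, so I need to be able to isolate and loop over those particular lxml elements rather than working on the initial xmlproc.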
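In b.xpath("//title/@lang") and b.xpath("//*/@lang"), the leading double slash (//) makes the expression absolute: it searches the whole document from the root, not just your filtered element b. Just remove it: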
from lxml import etree
xmldump = '''<bookstore>
<book category="COOKING">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="CHILDREN">
<title lang="en">ggggggg</title>
<author>g</author>
<year>2006</year>
<price>129.99</price>
</book>
<book category="CHILDREN">
<title lang="es">hhhhhhh</title>
<author>h</author>
<year>2007</year>
<price>229.99</price>
</book>
<book category="CHILDREN">
<title lang="cn">kkkkkkkk</title>
<author>k</author>
<year>2008</year>
<price>329.99</price>
</book>
<book category="CHILDREN">
<title lang="ru">llllllllll</title>
<author>l</author>
<year>2009</year>
<price>429.99</price>
</book>
<book category="WEB">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>'''
xmlproc = etree.fromstring(xmldump.encode('utf-8'))
books = xmlproc.xpath("// *[@category='CHILDREN']")
for b in books:
    # Relative paths: evaluated against b, matching only its direct children
    language = b.xpath("title/@lang")
    language2 = b.xpath("*/@lang")
    print(language)
    print(language2)
Output:
['en']
['en']
['es']
['es']
['cn']
['cn']
['ru']
['ru']
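Since the question also asks about descendants of each b element, note that "title/@lang" only matches direct children of b. Here is a minimal sketch of the relative descendant form, assuming a trimmed-down copy of the XML (the shorter xmldump and the names from_root and per_book are my own illustration, not part of the original post). Prefixing the expression with a dot keeps // anchored to the context node:

```python
from lxml import etree

# Trimmed-down sample (an assumption for brevity, not the original data)
xmldump = '''<bookstore>
<book category="COOKING"><title lang="en">Everyday Italian</title></book>
<book category="CHILDREN"><title lang="es">hhhhhhh</title></book>
<book category="CHILDREN"><title lang="ru">llllllllll</title></book>
</bookstore>'''

xmlproc = etree.fromstring(xmldump.encode('utf-8'))
books = xmlproc.xpath("//*[@category='CHILDREN']")

# "//" without a leading dot restarts from the document root,
# even though it is called on a single book element
from_root = books[0].xpath("//title/@lang")

per_book = []
for b in books:
    # ".//" searches only the descendants of b, at any depth
    per_book.append(b.xpath(".//title/@lang"))

print(from_root)  # ['en', 'es', 'ru']
print(per_book)   # [['es'], ['ru']]
```

With the flat structure in the question, "title/@lang" and ".//title/@lang" give the same result; they only differ once title elements are nested deeper inside a book.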
You can move the logic from the for loop into the xpath:
languageArr = xmlproc.xpath("// *[@category='CHILDREN'] //title/@lang")
print(languageArr)
language2Arr = xmlproc.xpath("// *[@category='CHILDREN'] //*/@lang")
print(language2Arr)
Output:
['en', 'es', 'cn', 'ru']
['en', 'es', 'cn', 'ru']
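One trade-off of the single-expression form is that the result is a flat list, so you lose track of which book each value came from. A rough sketch of one way to recover that, again on a trimmed-down copy of the XML (the shortened xmldump and the pairs variable are assumptions of mine): the strings lxml returns from an XPath attribute query have a getparent() method that gives back the element carrying the attribute.

```python
from lxml import etree

# Trimmed-down sample (an assumption for brevity, not the original data)
xmldump = '''<bookstore>
<book category="COOKING"><title lang="en">Everyday Italian</title></book>
<book category="CHILDREN"><title lang="es">hhhhhhh</title></book>
<book category="CHILDREN"><title lang="ru">llllllllll</title></book>
</bookstore>'''

xmlproc = etree.fromstring(xmldump.encode('utf-8'))

# One query for every lang attribute under the CHILDREN books
langs = xmlproc.xpath("//*[@category='CHILDREN']//title/@lang")

# Each result is a "smart" string: getparent() returns the <title>
# element that owns the attribute, so the value can be paired with it
pairs = [(lang.getparent().text, str(lang)) for lang in langs]

print(pairs)  # [('hhhhhhh', 'es'), ('llllllllll', 'ru')]
```

If you need more per-book processing than this, looping over the book elements with relative paths is probably the clearer option.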