Getting the complete list of attributes while looping through particular lxml elements using XPath

Let's consider the following XML:

from lxml import etree

xmldump = '''<bookstore>  
  <book category="COOKING">  
    <title lang="en">Everyday Italian</title>  
    <author>Giada De Laurentiis</author>  
    <year>2005</year>  
    <price>30.00</price>  
  </book>  
  <book category="CHILDREN">  
    <title lang="en">ggggggg</title>  
    <author>g</author>  
    <year>2006</year>  
    <price>129.99</price>  
  </book>
    <book category="CHILDREN">  
    <title lang="es">hhhhhhh</title>  
    <author>h</author>  
    <year>2007</year>  
    <price>229.99</price>  
  </book>  
    <book category="CHILDREN">  
    <title lang="cn">kkkkkkkk</title>  
    <author>k</author>  
    <year>2008</year>  
    <price>329.99</price>  
  </book>  
    <book category="CHILDREN">  
    <title lang="ru">llllllllll</title>  
    <author>l</author>  
    <year>2009</year>  
    <price>429.99</price>  
  </book>  
  <book category="WEB">  
    <title lang="en">Learning XML</title>  
    <author>Erik T. Ray</author>  
    <year>2003</year>  
    <price>39.95</price>  
  </book>  
</bookstore>'''

Now I want to get the value of the lang attribute from the book nodes that have the attribute category="CHILDREN", so I did this:

xmlproc = etree.fromstring(xmldump.encode('utf-8'))
books = xmlproc.xpath("//*[@category='CHILDREN']")

I got a books list containing 4 elements, so I went on to loop over them to get the lang attribute value from each element:

for b in books:
    language = b.xpath("//title/@lang")
    language2 = b.xpath("//*/@lang")

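The result is language = ['en', 'en', 'es', 'cn', 'ru', 'en'] on every iteration, when in fact I expected language = ['en'] on the first iteration, then ['es'], then ['cn'], and finally ['ru'] on the last iteration over the books list.

Both language and language2 now contain a list of all the lang attribute values from my original xmldump. I am only asking for attributes from the b element, so why do I get the entire list of attributes? Here b is each element of the books list.

Also, what is the correct way to do this so that I can get any specific attribute? Note that I also need to find the descendant elements of each specific b element, so I need to be able to isolate and loop over those particular lxml elements rather than querying the initial xmlproc.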

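In b.xpath("//title/@lang") and b.xpath("//*/@lang"), the leading double slash (//) makes the expression absolute: it searches the whole XML document from the root, not just your filtered element b. Just remove it: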

from lxml import etree

xmldump = '''<bookstore>  
  <book category="COOKING">  
    <title lang="en">Everyday Italian</title>  
    <author>Giada De Laurentiis</author>  
    <year>2005</year>  
    <price>30.00</price>  
  </book>  
  <book category="CHILDREN">  
    <title lang="en">ggggggg</title>  
    <author>g</author>  
    <year>2006</year>  
    <price>129.99</price>  
  </book>
    <book category="CHILDREN">  
    <title lang="es">hhhhhhh</title>  
    <author>h</author>  
    <year>2007</year>  
    <price>229.99</price>  
  </book>  
    <book category="CHILDREN">  
    <title lang="cn">kkkkkkkk</title>  
    <author>k</author>  
    <year>2008</year>  
    <price>329.99</price>  
  </book>  
    <book category="CHILDREN">  
    <title lang="ru">llllllllll</title>  
    <author>l</author>  
    <year>2009</year>  
    <price>429.99</price>  
  </book>  
  <book category="WEB">  
    <title lang="en">Learning XML</title>  
    <author>Erik T. Ray</author>  
    <year>2003</year>  
    <price>39.95</price>  
  </book>  
</bookstore>'''

xmlproc = etree.fromstring(xmldump.encode('utf-8'))
books = xmlproc.xpath("//*[@category='CHILDREN']")
for b in books:
    # Relative paths (no leading //) are evaluated against b only
    language = b.xpath("title/@lang")
    language2 = b.xpath("*/@lang")
    print(language)
    print(language2)

Output:

['en']
['en']
['es']
['es']
['cn']
['cn']
['ru']
['ru']
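
Note that title/@lang and */@lang only look at direct children of b. Since the question also mentions descendants, here is a minimal sketch comparing the three forms on a small made-up document (the info wrapper element and the variable names are assumptions for illustration and are not part of the original XML). A leading .// keeps the search inside the context element while still reaching nested elements:

```python
from lxml import etree

# Small hypothetical document, used only for this illustration.
doc = etree.fromstring(
    "<bookstore>"
    "<book category='A'><title lang='en'>One</title></book>"
    "<book category='B'><info><title lang='es'>Two</title></info></book>"
    "</bookstore>"
)

second = doc.xpath("//book[@category='B']")[0]

# Leading "//" is absolute: it searches from the document root.
absolute = second.xpath("//title/@lang")
# No leading slash: only direct children of the context node named "title".
children = second.xpath("title/@lang")
# ".//" searches all descendants of the context node.
descendants = second.xpath(".//title/@lang")

print(absolute)     # ['en', 'es']
print(children)     # []
print(descendants)  # ['es']
```
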

You can also move the logic from the for loop into the XPath expression itself:

languageArr = xmlproc.xpath("//*[@category='CHILDREN']//title/@lang")
print(languageArr)

language2Arr = xmlproc.xpath("//*[@category='CHILDREN']//*/@lang")
print(language2Arr)

Output:

['en', 'es', 'cn', 'ru']
['en', 'es', 'cn', 'ru']
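
If what you need is every attribute on each matching book and its descendants, rather than one named attribute, something along these lines might work. This is only a sketch on a shortened, hypothetical copy of the document; collect_attributes is a helper name invented here, not an lxml function. It relies on the attrib mapping that each lxml element exposes and on the XPath descendant-or-self axis:

```python
from lxml import etree

# Hypothetical, shortened version of the bookstore document.
doc = etree.fromstring(
    "<bookstore>"
    "<book category='CHILDREN'><title lang='es'>hhhhhhh</title><year>2007</year></book>"
    "<book category='WEB'><title lang='en'>Learning XML</title></book>"
    "</bookstore>"
)

def collect_attributes(element):
    """Return (tag, attributes) pairs for element and its descendants that carry attributes."""
    # descendant-or-self::*[@*] selects the context element and every
    # descendant element that has at least one attribute.
    matches = element.xpath("descendant-or-self::*[@*]")
    return [(el.tag, dict(el.attrib)) for el in matches]

# One list of (tag, attributes) pairs per matching book.
results = [collect_attributes(b) for b in doc.xpath("//book[@category='CHILDREN']")]
print(results)
```

Because the expression starts with an axis name instead of //, it stays relative to the element it is called on, so each book only reports its own subtree.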