Getting complete list of attributes while looping through particular lxml elements when using xpath

Question

Consider the following XML:

from lxml import etree

xmldump = '''<bookstore>  
  <book category="COOKING">  
    <title lang="en">Everyday Italian</title>  
    <author>Giada De Laurentiis</author>  
    <year>2005</year>  
    <price>30.00</price>  
  </book>  
  <book category="CHILDREN">  
    <title lang="en">ggggggg</title>  
    <author>g</author>  
    <year>2006</year>  
    <price>129.99</price>  
  </book>
    <book category="CHILDREN">  
    <title lang="es">hhhhhhh</title>  
    <author>h</author>  
    <year>2007</year>  
    <price>229.99</price>  
  </book>  
    <book category="CHILDREN">  
    <title lang="cn">kkkkkkkk</title>  
    <author>k</author>  
    <year>2008</year>  
    <price>329.99</price>  
  </book>  
    <book category="CHILDREN">  
    <title lang="ru">llllllllll</title>  
    <author>l</author>  
    <year>2009</year>  
    <price>429.99</price>  
  </book>  
  <book category="WEB">  
    <title lang="en">Learning XML</title>  
    <author>Erik T. Ray</author>  
    <year>2003</year>  
    <price>39.95</price>  
  </book>  
</bookstore>'''

Now I want to get the value of the lang attribute from the book nodes that have the attribute category="CHILDREN", so I did this:

xmlproc = etree.fromstring(xmldump.encode('utf-8'))
books = xmlproc.xpath("//*[@category='CHILDREN']")

I get a books list containing 4 elements, so I go on to loop over them to get the lang attribute value from each element:

for b in books:
    language = b.xpath("//title/@lang")
    language2 = b.xpath("//*/@lang")

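The result is language = ['en', 'en', 'es', 'cn', 'ru', 'en'], when in fact I was expecting language = ['en'] on the first iteration, then ['es'], then ['cn'], and finally ['ru'] on the last iteration over the books list.

Now language and language2 both hold a list with all the lang attribute values from my initial xmldump. I am only requesting the attribute from the b element, so why do I get the whole list of attributes? (b is each element of the books list.)

Also, what is the correct way to do this so that I can get any particular attribute? Note that I also need to find child and descendant elements for each particular b element, so I need to be able to isolate and loop over those particular lxml elements rather than search the initial xmlproc.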

Answer 1

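In b.xpath("//title/@lang") and b.xpath("//*/@lang"), the leading double slash (//) makes the expression absolute: it searches the whole XML document from the root, not just the element you filtered. Just remove it: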

from lxml import etree

xmldump = '''<bookstore>  
  <book category="COOKING">  
    <title lang="en">Everyday Italian</title>  
    <author>Giada De Laurentiis</author>  
    <year>2005</year>  
    <price>30.00</price>  
  </book>  
  <book category="CHILDREN">  
    <title lang="en">ggggggg</title>  
    <author>g</author>  
    <year>2006</year>  
    <price>129.99</price>  
  </book>
    <book category="CHILDREN">  
    <title lang="es">hhhhhhh</title>  
    <author>h</author>  
    <year>2007</year>  
    <price>229.99</price>  
  </book>  
    <book category="CHILDREN">  
    <title lang="cn">kkkkkkkk</title>  
    <author>k</author>  
    <year>2008</year>  
    <price>329.99</price>  
  </book>  
    <book category="CHILDREN">  
    <title lang="ru">llllllllll</title>  
    <author>l</author>  
    <year>2009</year>  
    <price>429.99</price>  
  </book>  
  <book category="WEB">  
    <title lang="en">Learning XML</title>  
    <author>Erik T. Ray</author>  
    <year>2003</year>  
    <price>39.95</price>  
  </book>  
</bookstore>'''

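xmlproc = etree.fromstring(xmldump.encode('utf-8'))
# Select every element whose category attribute is CHILDREN
books = xmlproc.xpath("//*[@category='CHILDREN']")
for b in books:
    # No leading //, so both paths are evaluated relative to b
    language = b.xpath("title/@lang")   # lang of b's direct <title> children
    language2 = b.xpath("*/@lang")      # lang of any direct child of b
    print(language)
    print(language2)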

Output:

['en']
['en']
['es']
['es']
['cn']
['cn']
['ru']
['ru']
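
The question also asks about descendants, which title/@lang and */@lang do not reach because they only look at direct children. A minimal sketch of the difference, assuming a relative descendant search with a leading .// is what is wanted; the extra and note elements are hypothetical and added only to give the book a nested lang attribute:

```python
from lxml import etree

# Hypothetical sample: <extra>/<note> are not in the original XML;
# they exist only to provide a lang attribute below the child level.
xml = b"""<bookstore>
  <book category="CHILDREN">
    <title lang="es">hhhhhhh</title>
    <extra><note lang="fr">nested</note></extra>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
  </book>
</bookstore>"""

root = etree.fromstring(xml)
book = root.xpath("//book[@category='CHILDREN']")[0]

absolute = book.xpath("//@lang")      # leading //: starts at the document root
children = book.xpath("*/@lang")      # direct children of book only
descendants = book.xpath(".//@lang")  # any depth, but only inside book

print(absolute)     # ['es', 'fr', 'en']
print(children)     # ['es']
print(descendants)  # ['es', 'fr']
```

The leading dot anchors the search at the context node (b in the loop above), so .// stays inside that one book while still descending to any depth.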

You can also move the logic from the for loop into the XPath expression:

languageArr = xmlproc.xpath("//*[@category='CHILDREN']//title/@lang")
print(languageArr)

language2Arr = xmlproc.xpath("//*[@category='CHILDREN']//*/@lang")
print(language2Arr)

Output:

['en', 'es', 'cn', 'ru']
['en', 'es', 'cn', 'ru']
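
The single-expression form returns one flat list, so it no longer shows which element each value came from. If that link is still needed, one possible approach (a sketch, assuming lxml's default "smart string" XPath results, which expose getparent()) is to walk back from each attribute value to the element that carries it. The shortened XML and the pairs variable are illustrative only:

```python
from lxml import etree

# Shortened, illustrative version of the bookstore document
xml = b"""<bookstore>
  <book category="COOKING"><title lang="en">Everyday Italian</title></book>
  <book category="CHILDREN"><title lang="es">hhhhhhh</title></book>
  <book category="CHILDREN"><title lang="cn">kkkkkkkk</title></book>
</bookstore>"""

root = etree.fromstring(xml)

pairs = []
for lang in root.xpath("//*[@category='CHILDREN']//title/@lang"):
    # For an attribute result, getparent() returns the element
    # that carries the attribute (here, the <title> element).
    title = lang.getparent()
    pairs.append((title.text, str(lang)))

print(pairs)  # [('hhhhhhh', 'es'), ('kkkkkkkk', 'cn')]
```

Looping over the book elements, as in the first version, is usually easier to read when per-book processing is needed; this variant is mainly useful when the flat query is already in place.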

xml

lxml

python-3.x