XML parsing with ElementTree

Question

I would like to know whether I can use the existing text of one tag to get the text of the next tag in the XML tree, given the following XML file:

...
<link>
   <description>document</description>
   <url>https://www.../doc/file.pdf</url>
</link>
<link>
   <description>document1</description>
   <url>https://www.../doc1/file1.pdf</url>
</link>
<link>
   <description>document2</description>
   <url>https://www.../doc2/file2.pdf</url>
</link>
...                     
    
    for item in tree.findall('.//subChapter//document//link//'):
        if item.tag == 'description':
            if item.text == 'document':
                # THEN GET THE TEXT OF THE NEXT TAG <url>...</url>
                # e.g. https://www.../doc/file.pdf
                print(NEXT TAG)
            elif item.text == 'document1':
                # THEN GET THE TEXT OF THE NEXT TAG <url>...</url>
                # e.g. https://www.../doc1/file1.pdf
                print(NEXT TAG)
            elif item.text == 'document2':
                # THEN GET THE TEXT OF THE NEXT TAG <url>...</url>
                # e.g. https://www.../doc2/file2.pdf
                print(NEXT TAG)

Thanks!

Answer 1

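With the lxml parser, you can do this with the getnext() method. ElementTree elements have no equivalent way to reach the next sibling, but you can get the same result by changing the loop: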

# iterate over link elements
for link in tree.findall('.//subChapter//document/link'):
    # keep reference to link child elements
    children = list(link)
    for item in children:
        if item.tag == 'description':
            if item.text == 'document':
                # access the needed child of link by index
                # (assumes <url> is the second child of <link>)
                next_tag = children[1]
                print(next_tag.text)
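
If the order of the children inside each link is not guaranteed, looking the url up by name may be more robust than indexing. Below is a minimal sketch using only the standard library. The subChapter and document wrapper elements are assumed from the path in the question (the real file is elided), and the names XML, urls_by_description and doc1_url are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper elements; only the <link> entries come from the question.
XML = """
<subChapter>
  <document>
    <link>
      <description>document</description>
      <url>https://www.../doc/file.pdf</url>
    </link>
    <link>
      <description>document1</description>
      <url>https://www.../doc1/file1.pdf</url>
    </link>
    <link>
      <description>document2</description>
      <url>https://www.../doc2/file2.pdf</url>
    </link>
  </document>
</subChapter>
"""

root = ET.fromstring(XML)

# Map each description to the url stored in the same <link> element.
urls_by_description = {
    link.findtext("description"): link.findtext("url")
    for link in root.iter("link")
}

# Alternative: let an XPath predicate select the matching <link> directly.
doc1_url = root.findtext(".//link[description='document1']/url")

print(urls_by_description["document"])
print(doc1_url)
```

The dictionary is convenient when several descriptions need to be resolved. The predicate form is enough for a single lookup.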

xml

elementtree

python-3.x