XML parsing with ElementTree
I'd like to know whether I can use the existing text in one tag to get the text of the next tag in the XML tree, given the following XML file:
...
<link>
<description>document</description>
<url>https://www.../doc/file.pdf</url>
</link>
<link>
<description>document1</description>
<url>https://www.../doc1/file1.pdf</url>
</link>
<link>
<description>document2</description>
<url>https://www.../doc2/file2.pdf</url>
</link>
...
for item in tree.findall('.//subChapter//document//link//'):
    if item.tag == 'description':
        if item.text == 'document':
            **THEN GET THE TEXT ON THE NEXT TAG <url>...</url>**
            **e.g: https://www.../doc/file.pdf**
            print(NEXT TAG)
        elif item.text == 'document1':
            **THEN GET THE TEXT ON THE NEXT TAG <url>...</url>**
            **e.g: https://www.../doc1/file1.pdf**
            print(NEXT TAG)
        elif item.text == 'document2':
            **THEN GET THE TEXT ON THE NEXT TAG <url>...</url>**
            **e.g: https://www.../doc2/file2.pdf**
            print(NEXT TAG)
Thanks!
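With the lxml parser, this can be done with the getnext() function. With ElementTree, it can be done by changing the loop so that it iterates over the link elements instead of their descendants: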
# iterate over link elements
for link in tree.findall('.//subChapter//document/link'):
    # keep a reference to the link's child elements
    children = list(link)
    for item in children:
        if item.tag == 'description':
            if item.text == 'document':
                # access the needed child of link by index
                next_tag = children[1]
                print(next_tag.text)
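For reference, here is a minimal self-contained sketch of the same idea using only the standard library. The sample document, the example.com URLs, and the helper names next_sibling_text and urls_by_description are assumptions made for illustration; the wrapper elements are guessed from the XPath in the question. Instead of hard-coding index 1, it looks up the position of the description element among its siblings and takes the one after it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the wrapper elements are assumed from the
# XPath used in the question, and the URLs are placeholders.
SAMPLE = """
<root>
  <subChapter>
    <document>
      <link>
        <description>document</description>
        <url>https://example.com/doc/file.pdf</url>
      </link>
      <link>
        <description>document1</description>
        <url>https://example.com/doc1/file1.pdf</url>
      </link>
      <link>
        <description>document2</description>
        <url>https://example.com/doc2/file2.pdf</url>
      </link>
    </document>
  </subChapter>
</root>
"""


def next_sibling_text(parent, child):
    """Return the text of the element that follows `child` under `parent`, or None."""
    siblings = list(parent)
    position = siblings.index(child)  # elements compare by identity
    if position + 1 < len(siblings):
        return siblings[position + 1].text
    return None


def urls_by_description(root):
    """Map each description text to the text of the element that follows it."""
    result = {}
    for link in root.findall('.//subChapter//document/link'):
        description = link.find('description')
        if description is not None:
            result[description.text] = next_sibling_text(link, description)
    return result


root = ET.fromstring(SAMPLE)
urls = urls_by_description(root)
print(urls['document1'])
```

If the element you need is always named url, link.findtext('url') is probably simpler than working out sibling positions, since it does not depend on the order of the children.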