How to retrieve all values of the same tag from an XML response using Python?

Question

I am using DBpedia's Lookup API, which returns its response in XML format, as shown below:

<ArrayOfResults>
    <Result>
        <Label>China</Label>
        <URI>http://dbpedia.org/resource/China</URI>
        <Description>China .... administrative regions of Hong Kong and Macau.</Description>
        <Classes>
            <Class>
                <Label>Place</Label>
                <URI>http://dbpedia.org/ontology/Place</URI>
            </Class>
            <Class>
                <Label>Country</Label>
                <URI>http://dbpedia.org/ontology/Country</URI>
            </Class>
        </Classes>
        <Categories>
            <Category>
                <URI>http://dbpedia.org/resource/Category:Member_states_of_the_United_Nations</URI>
            </Category>
            <Category>
                <URI>http://dbpedia.org/resource/Category:Republics</URI>
            </Category>
        </Categories>
        <Refcount>12789</Refcount>
    </Result>
    <Result>
        <Label>Theatre of China</Label>
        <URI>http://dbpedia.org/resource/Theatre_of_China</URI>
        <Description>Theatre of China ... the 20th century.</Description>
        <Classes/>
        <Categories>
            <Category>
                <URI>http://dbpedia.org/resource/Category:Asian_drama</URI>
            </Category>
            <Category>
                <URI>http://dbpedia.org/resource/Category:Chinese_performing_arts</URI>
            </Category>
        </Categories>
        <Refcount>23</Refcount>
    </Result>
</ArrayOfResults>

I have shortened it. The full response can be found in this link.

Now I need to retrieve all the values under the <Label> and <URI> tags.

Here is what I have done so far:

import requests
import xml.etree.ElementTree as ET

response = requests.get('https://lookup.dbpedia.org/api/search?query=China')
response_body = response.content

response_xml = ET.fromstring(response_body)

root = ET.fromstring(response_body)
for child in root:
    print(child.tag)
    for grandchild in child:
        print(f"\t {grandchild.tag}")
        label = grandchild.find('Label')
        uri = grandchild.find('URI')
        print(f"\t required label = {label}")
        print(f"\t required uri = {uri}")

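However, label and uri are None in every case. How can I fix this so that I get all the values under the <Label> tag of each <Result> (such as China, Theatre of China, etc.) and likewise the values under the <URI> tag?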

Answer 1

You are actually nesting one level too deep. You need to call find on child (which is a <Result> element):

for child in root:
    # child is a <Result>; <Label> and <URI> are its direct children
    label = child.find('Label').text
    uri = child.find('URI').text
    print(label, uri)
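
To make this concrete, here is a minimal sketch that runs against a shortened copy of the response rather than the live API. The SAMPLE string and the extract_results helper are names made up for illustration; findall and findtext are standard ElementTree calls. find, findall and findtext with a plain tag name only look at the direct children of the element they are called on, which is why calling find on a grandchild such as <Label> or <Classes> returned None in the question.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the Lookup API response (assumption: same layout).
SAMPLE = """
<ArrayOfResults>
    <Result>
        <Label>China</Label>
        <URI>http://dbpedia.org/resource/China</URI>
        <Classes>
            <Class>
                <Label>Place</Label>
                <URI>http://dbpedia.org/ontology/Place</URI>
            </Class>
        </Classes>
    </Result>
    <Result>
        <Label>Theatre of China</Label>
        <URI>http://dbpedia.org/resource/Theatre_of_China</URI>
        <Classes/>
    </Result>
</ArrayOfResults>
"""


def extract_results(xml_text):
    """Return a list of (label, uri) pairs, one per <Result> element."""
    root = ET.fromstring(xml_text)
    pairs = []
    # findall('Result') matches only the direct <Result> children of the root.
    for result in root.findall('Result'):
        # findtext returns the text of the first direct child with that tag,
        # or None if the tag is missing, so the nested <Class> labels are skipped.
        pairs.append((result.findtext('Label'), result.findtext('URI')))
    return pairs


pairs = extract_results(SAMPLE)
for label, uri in pairs:
    print(f"{label}: {uri}")
```

Using findtext instead of find(...).text avoids an AttributeError when a <Result> happens to lack one of the tags; the pair then simply contains None.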

Answer 2

Hi, I don't know whether you need to know which URLs correspond to which labels, but here is a very simple way to get all the URLs:

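import requests
from bs4 import BeautifulSoup

url = 'https://lookup.dbpedia.org/api/search?query=China'

# The 'xml' parser requires lxml to be installed.
# find('Result') keeps only the first <Result>; find_all then searches all of
# its descendants, so nested <Class> and <Category> entries are included too.
soup = BeautifulSoup(requests.get(url).text, 'xml').find('Result')

labels = [label.text for label in soup.find_all('Label')]
URI = [uri.text for uri in soup.find_all('URI')]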
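
If BeautifulSoup is not available, a similar flat list can be sketched with the standard library alone. This is an illustration under assumptions, not the answerer's code: SAMPLE is a cut-down copy of the response, and all_values and top_level_values are hypothetical helper names. Element.iter walks every descendant, so it also picks up the <Label> and <URI> values nested inside <Class>, much like find_all does; the path 'Result/Label' restricts the match to the labels that belong directly to each <Result>.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for the Lookup API response (assumption: same layout).
SAMPLE = """<ArrayOfResults>
    <Result>
        <Label>China</Label>
        <URI>http://dbpedia.org/resource/China</URI>
        <Classes>
            <Class>
                <Label>Place</Label>
                <URI>http://dbpedia.org/ontology/Place</URI>
            </Class>
        </Classes>
    </Result>
    <Result>
        <Label>Theatre of China</Label>
        <URI>http://dbpedia.org/resource/Theatre_of_China</URI>
        <Classes/>
    </Result>
</ArrayOfResults>"""


def all_values(root, tag):
    """Text of every element named `tag`, at any depth, in document order."""
    return [elem.text for elem in root.iter(tag)]


def top_level_values(root, tag):
    """Text of `tag` elements that are direct children of a <Result>."""
    return [elem.text for elem in root.findall(f'Result/{tag}')]


root = ET.fromstring(SAMPLE)
every_label = all_values(root, 'Label')          # includes the nested class label
result_labels = top_level_values(root, 'Label')  # one per <Result>
result_uris = top_level_values(root, 'URI')
print(every_label)
print(result_labels)
print(result_uris)
```

The two top-level lists line up by position only when every <Result> carries both tags, so looping over the <Result> elements is the safer choice when labels and URIs have to stay paired.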

Tags: python, python-3.x, xml, python-requests, elementtree