How to retrieve all values of the same tag from an XML response using Python?
I am using DBpedia's Lookup API, which returns its response in XML format, like this:
<ArrayOfResults>
<Result>
<Label>China</Label>
<URI>http://dbpedia.org/resource/China</URI>
<Description>China .... administrative regions of Hong Kong and Macau.</Description>
<Classes>
<Class>
<Label>Place</Label>
<URI>http://dbpedia.org/ontology/Place</URI>
</Class>
<Class>
<Label>Country</Label>
<URI>http://dbpedia.org/ontology/Country</URI>
</Class>
</Classes>
<Categories>
<Category>
<URI>http://dbpedia.org/resource/Category:Member_states_of_the_United_Nations</URI>
</Category>
<Category>
<URI>http://dbpedia.org/resource/Category:Republics</URI>
</Category>
</Categories>
<Refcount>12789</Refcount>
</Result>
<Result>
<Label>Theatre of China</Label>
<URI>http://dbpedia.org/resource/Theatre_of_China</URI>
<Description>Theatre of China ... the 20th century.</Description>
<Classes/>
<Categories>
<Category>
<URI>http://dbpedia.org/resource/Category:Asian_drama</URI>
</Category>
<Category>
<URI>http://dbpedia.org/resource/Category:Chinese_performing_arts</URI>
</Category>
</Categories>
<Refcount>23</Refcount>
</Result>
</ArrayOfResults>
I have shortened it. The full response can be found in this link.
Now I need to retrieve all the values under the <Label> and <URI> tags.
This is what I have done so far:
import requests
import xml.etree.ElementTree as ET

response = requests.get('https://lookup.dbpedia.org/api/search?query=China')
response_body = response.content
response_xml = ET.fromstring(response_body)
root = ET.fromstring(response_body)
for child in root:
    print(child.tag)
    for grandchild in child:
        print(f"\t {grandchild.tag}")
        label = grandchild.find('Label')
        uri = grandchild.find('URI')
        print(f"\t required label = {label}")
        print(f"\t required uri = {uri}")
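But the values of label and uri are None in every case. How can I fix this so that I get all the values under the <Label> tag of each <Result> (like China, Theatre of China, etc.), and likewise the values under the <URI> tag?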
You are actually nesting one level too deep. You need to call find on child (which is a <Result> element), not on its children:
for child in root:
    label = child.find('Label').text
    uri = child.find('URI').text
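The fix above can be sketched end to end. This is a minimal example run against a shortened copy of the response from the question rather than the live API, assuming the real response has the same shape and no XML namespace. The names SAMPLE, results and nested_lookup are illustrative and not part of the original code. It also shows why the original attempt printed None: find only searches the direct children of the element it is called on.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the response shown in the question (assumption: the live
# response has the same structure and uses no XML namespace).
SAMPLE = """
<ArrayOfResults>
  <Result>
    <Label>China</Label>
    <URI>http://dbpedia.org/resource/China</URI>
    <Classes>
      <Class>
        <Label>Place</Label>
        <URI>http://dbpedia.org/ontology/Place</URI>
      </Class>
    </Classes>
  </Result>
  <Result>
    <Label>Theatre of China</Label>
    <URI>http://dbpedia.org/resource/Theatre_of_China</URI>
    <Classes/>
  </Result>
</ArrayOfResults>
"""

root = ET.fromstring(SAMPLE)

results = []
for result in root.findall('Result'):
    # findtext returns the text of the first direct child with that tag,
    # or None when the child is missing, so it does not raise on gaps.
    results.append((result.findtext('Label'), result.findtext('URI')))

# What the question's code effectively did: calling find on a grandchild
# such as <Label> looks for a <Label> inside that <Label>, which is absent.
first_label = root.find('Result').find('Label')
nested_lookup = first_label.find('Label')

for label, uri in results:
    print(label, uri)
print(nested_lookup)
```

Using findtext instead of find(...).text is a design choice here: it avoids an AttributeError if some <Result> has no <Label> or <URI> child.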
Hi, I don't know whether you need to know which URIs belong to which labels, but this is a very simple way to get all of them:
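import requests
from bs4 import BeautifulSoup  # the 'xml' parser also needs lxml installed

url = 'https://lookup.dbpedia.org/api/search?query=China'
# Note: .find('Result') keeps only the first <Result>; drop it to search the whole response
soup = BeautifulSoup(requests.get(url).text, 'xml').find('Result')
# find_all is recursive, so nested <Class> and <Category> entries are included too
labels = [label.text for label in soup.find_all('Label')]
uris = [uri.text for uri in soup.find_all('URI')]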
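If the pairing between labels and URIs does matter, two flat lists lose it, and a recursive search also picks up the nested <Class> entries. The sketch below contrasts the two approaches. It swaps in the standard library's ElementTree for BeautifulSoup so it runs without third-party packages, and it works on a shortened copy of the response; SAMPLE, all_labels and label_to_uri are illustrative names.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the response from the question (assumption: same shape
# as the live response, no XML namespace).
SAMPLE = """
<ArrayOfResults>
  <Result>
    <Label>China</Label>
    <URI>http://dbpedia.org/resource/China</URI>
    <Classes>
      <Class>
        <Label>Place</Label>
        <URI>http://dbpedia.org/ontology/Place</URI>
      </Class>
      <Class>
        <Label>Country</Label>
        <URI>http://dbpedia.org/ontology/Country</URI>
      </Class>
    </Classes>
  </Result>
  <Result>
    <Label>Theatre of China</Label>
    <URI>http://dbpedia.org/resource/Theatre_of_China</URI>
    <Classes/>
  </Result>
</ArrayOfResults>
"""

root = ET.fromstring(SAMPLE)

# Recursive search: every <Label> anywhere in the tree, in document order.
# This mixes result labels with the labels of nested <Class> elements.
all_labels = [element.text for element in root.iter('Label')]

# Per-result pairing: only the <Label> and <URI> that are direct children
# of each <Result>, kept together in a dict.
label_to_uri = {
    result.findtext('Label'): result.findtext('URI')
    for result in root.findall('Result')
}

print(all_labels)
print(label_to_uri)
```

The recursive version is fine when any label will do. The dict version is the one to use when each label has to stay attached to its own URI.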