XML parsing - findall() list comes up empty

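I'm stuck on an assignment that deals with URLs and XML parsing. I've got the data, but I can't seem to get findall() to work. I know that once findall() works, I'll have a list to loop over. Any insight would be great; if possible, I'd prefer a gentle nudge rather than a direct answer. Thanks!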

import urllib.request, urllib.parse, urllib.error
import xml.etree.ElementTree as ET
fhand = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_42.xml')

raw_data = fhand.read().decode()
xml_data = ET.fromstring(raw_data)
lst = xml_data.findall('name')
print(lst)

findall is not recursive: it only finds elements that are direct children of the element you call it on (unless you pass it an XPath expression, that is).

Use iter instead:

import urllib.request
import xml.etree.ElementTree as ET

fhand = urllib.request.urlopen('http://py4e-data.dr-chuck.net/comments_42.xml')

raw_data = fhand.read().decode()
xml_data = ET.fromstring(raw_data)
for name_node in xml_data.iter('name'):
    print(name_node.text)

Or use findall with an XPath expression:

xml_data.findall('comments/comment/name')

Either way, printing the text of each element found gives

Romina
Laurie
Bayli
Siyona
Taisha
Alanda
Ameelia
Prasheeta
Asif
Risa
Zi
Danyil
Ediomi
Barry
Lance
Hattie
Mathu
Bowie
Samara
Uchenna
Shauni
Georgia
Rivan
Kenan
Hassan
Isma
Samanthalee
Alexa
Caine
Grady
Anne
Rihan
Alexei
Indie
Rhuairidh
Annoushka
Kenzi
Shahd
Irvine
Carys
Skye
Atiya
Rohan
Nuala
Maram
Carlo
Japleen
Breeanna
Zaaine
Inika
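
To see the difference without hitting the network, here is a minimal sketch on an inline sample. The SAMPLE string is a made-up, cut-down stand-in for the real file; its nesting is an assumption based on the 'comments/comment/name' path above, and the root tag name is only a placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: structure assumed from the path 'comments/comment/name';
# the root tag name is a placeholder, not taken from the real file.
SAMPLE = """<root>
  <comments>
    <comment><name>Romina</name></comment>
    <comment><name>Laurie</name></comment>
  </comments>
</root>"""

root = ET.fromstring(SAMPLE)

# A bare tag name matches direct children of root only, so nothing is found.
direct = root.findall('name')

# An explicit path walks down level by level from root.
by_path = [node.text for node in root.findall('comments/comment/name')]

# './/name' matches <name> elements at any depth below root.
by_descendant = [node.text for node in root.findall('.//name')]

# iter() walks the whole subtree in document order.
by_iter = [node.text for node in root.iter('name')]

print(direct)         # empty list
print(by_path)        # the two sample names
print(by_descendant)
print(by_iter)
```

The './/name' form is handy when you don't want to spell out the exact nesting, while the explicit path is stricter about where a match may appear.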

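You can also use the requests library and BeautifulSoup for this (find_all searches the whole tree by default, not just direct children):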

import requests
from bs4 import BeautifulSoup

response = requests.get('http://py4e-data.dr-chuck.net/comments_42.xml')

soup = BeautifulSoup(response.text, 'html.parser')
names = soup.find_all('name')
for name in names:
    print(name.text)

Output:

Romina
Laurie
Bayli
Siyona
Taisha
Alanda
Ameelia
Prasheeta
Asif
Risa
Zi
Danyil
Ediomi
Barry
Lance
Hattie
Mathu
Bowie
Samara
Uchenna
Shauni
Georgia
Rivan
Kenan
Hassan
Isma
Samanthalee
Alexa
Caine
Grady
Anne
Rihan
Alexei
Indie
Rhuairidh
Annoushka
Kenzi
Shahd
Irvine
Carys
Skye
Atiya
Rohan
Nuala
Maram
Carlo
Japleen
Breeanna
Zaaine
Inika