How to convert XML data (retrieved from a link) into a bytes-like object?

Question

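I am new to Stack Overflow and recently started using Python for web scraping. As the title says, I am unable to convert XML data retrieved from a link into a bytes-like object.

I think I am retrieving the XML data correctly (Figure 1). However, whenever I try to convert it into a tree, an error is raised saying "a bytes-like object is required" (Figure 2).

Code: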

import xml.etree.ElementTree as ET
from bs4 import BeautifulSoup
from urllib.request import urlopen
import ssl


# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter - ')
html = urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, "html.parser")
print(soup)

#tree = ET.fromstring(soup)
#print(tree)
#for i in soup:
#  print('Name:', tree.find('name').text)
#  print('Attr:', tree.find('comments').text)

Input link: http://py4e-data.dr-chuck.net/comments_42.xml

Figure 1: The code runs fine after retrieving the data from the link

Figure 2: The error occurs

Answer 1

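You have to pass the content fetched from the URL itself, not the BeautifulSoup object. In fact, you don't really need bs4 here, but you might want to take a look at the xmltodict module.

Try this: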

import xmltodict
import xml.etree.ElementTree as ET
from urllib.request import urlopen


html = urlopen("http://py4e-data.dr-chuck.net/comments_42.xml").read()
tree = ET.fromstring(html)
print(xmltodict.parse(html)['commentinfo']['note'])

Output:

This file contains the sample data for testing
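
To illustrate why passing the raw response works while passing the soup object does not, here is a minimal offline sketch. ET.fromstring() accepts str or bytes; other object types are rejected with a TypeError. The inline XML and the ParsedDocument class are hypothetical stand-ins (for the downloaded bytes and for a BeautifulSoup instance, respectively), not part of the original code.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the bytes returned by urlopen(url).read().
xml_bytes = b"<commentinfo><note>sample note</note></commentinfo>"


class ParsedDocument:
    """Hypothetical stand-in for a non-text object such as a BeautifulSoup instance."""


# Both bytes and str are accepted by ET.fromstring().
root_from_bytes = ET.fromstring(xml_bytes)
root_from_str = ET.fromstring(xml_bytes.decode("utf-8"))
print(root_from_bytes.tag, root_from_str.tag)

# An object that is neither str nor bytes-like is rejected with a TypeError.
try:
    ET.fromstring(ParsedDocument())
    rejected = False
except TypeError as exc:
    rejected = True
    print("TypeError:", exc)
```

If you do want to keep bs4 in the pipeline for some reason, converting the soup back to text with str(soup) before parsing should avoid the TypeError, though parsing the original bytes directly is simpler.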

Edit: Based on your approach, here is how to iterate over the tree elements:

import xml.etree.ElementTree as ET
from urllib.request import urlopen


html = urlopen("http://py4e-data.dr-chuck.net/comments_42.xml").read()

tree = ET.fromstring(html)

for item in tree.iter():
    if item.tag == "name":
        print(f"{item.tag}: {item.text.strip()}")

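The fromstring() function parses the XML data and returns the root Element (not an ElementTree). Calling iter() on that element lets us traverse the root and every node beneath it.

This outputs: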

name: Romina
name: Laurie
name: Bayli
name: Siyona
name: Taisha
name: Alanda
...
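
As a possible simplification of the loop above, iter() also accepts a tag name, so the if check can be dropped. The sketch below runs offline against a shortened inline sample; the nesting of the sample (comments/comment/name) is an assumption about the file's layout, not a copy of the real data.

```python
import xml.etree.ElementTree as ET

# Assumed, shortened sample that mimics the layout of the real file.
SAMPLE_XML = b"""
<commentinfo>
  <note>This file contains the sample data for testing</note>
  <comments>
    <comment><name>Romina</name></comment>
    <comment><name>Laurie</name></comment>
  </comments>
</commentinfo>
"""

root = ET.fromstring(SAMPLE_XML)

# iter("name") yields only <name> elements, at any depth below the root.
names = [element.text.strip() for element in root.iter("name")]

for name in names:
    print(f"name: {name}")
```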

You can also access specific elements by calling the findall() method. For example, to get all the names, do the following:

names = tree.findall(".//name")
print([name.text for name in names])

Output:

['Romina', 'Laurie', 'Bayli', 'Siyona', 'Taisha', 'Alanda', 'Ameelia'...]
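
The ".//" prefix matters here. A bare tag name passed to find() or findall() matches only direct children of the element, which is probably why the commented-out tree.find('name') in the question would not have worked even with valid input. The sketch below shows the difference on a shortened inline sample whose nesting is an assumption about the real file.

```python
import xml.etree.ElementTree as ET

# Assumed, shortened sample that mimics the layout of the real file.
SAMPLE_XML = b"""
<commentinfo>
  <note>This file contains the sample data for testing</note>
  <comments>
    <comment><name>Romina</name></comment>
    <comment><name>Laurie</name></comment>
  </comments>
</commentinfo>
"""

root = ET.fromstring(SAMPLE_XML)

# <name> is not a direct child of the root, so a bare tag finds nothing.
direct_child = root.find("name")
print(direct_child)

# <note> is a direct child, so findtext() returns its text.
note_text = root.findtext("note")
print(note_text)

# ".//name" searches the whole subtree below the root.
all_names = [element.text for element in root.findall(".//name")]
print(all_names)
```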

python

xml

xml-parsing

web-scraping