How to parse XML children using python

Question

I parsed XML from a website and found that it has two branches (children).

How can I split the two branches into two lists of dictionaries?

Here is my code so far:

import pandas as pd
import xml.etree.ElementTree as ET
import requests
url = "http://cs.stir.ac.uk/~soh/BD2spring2022/assignmentdata.php"
params = {'data':'spurpyr'}
response = requests.get(url, params=params)
tree = response.content

#extract the root element as separate variable, and display the root tag.
root = ET.fromstring(tree)
print(root.tag)

#Get attributes of root
root_attr = root.attrib
print(root_attr)

#Finding children of root
for child in root:
    print(child.tag, child.attrib)

#extract the two children of the root element into another two separate variables, and display their tags as well
tweets_branch, cities_branch = root[0], root[1]  # keep the Element objects, not just their tag strings
print(tweets_branch.tag, cities_branch.tag)

#the elements in the entire tree
print([elem.tag for elem in root.iter()])

#specify both the encoding and decoding of the document you are displaying as the string
print(ET.tostring(root, encoding='utf8').decode('utf8'))
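Since the question already uses `xml.etree.ElementTree`, the two branches can also be turned into lists of dictionaries with the standard library alone. This is a minimal sketch, not the answer's code. It assumes, as the answer's `tweets > tweet` and `cities > city` selectors imply, that the root has `<tweets>` and `<cities>` children whose items carry an `id` attribute and flat subelements. The root tag `data`, the sample values, and the `<city>` fields are hypothetical, and `branch_to_dicts` is a helper name made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the feed: a <tweets> branch and a <cities>
# branch under the root. The root tag and the <city> fields are invented.
SAMPLE_XML = """<data>
  <tweets>
    <tweet id="16310427-5502">
      <Name>Griffin Norton</Name>
      <City>Belokurikha</City>
      <Country>Russia</Country>
      <Age>33</Age>
    </tweet>
  </tweets>
  <cities>
    <city id="1">
      <Name>Belokurikha</Name>
      <Country>Russia</Country>
    </city>
  </cities>
</data>"""


def branch_to_dicts(branch, item_tag):
    """Hypothetical helper: turn each <item_tag> child of `branch` into a dict.

    Each dict holds the item's `id` attribute plus one key per direct
    subelement (tag -> text with surrounding whitespace stripped).
    """
    records = []
    for item in branch.findall(item_tag):
        record = {"id": item.get("id")}
        for field in item:  # direct subelements only
            record[field.tag] = (field.text or "").strip()
        records.append(record)
    return records


root = ET.fromstring(SAMPLE_XML)
tweets_branch = root.find("tweets")
cities_branch = root.find("cities")

tweets = branch_to_dicts(tweets_branch, "tweet")
cities = branch_to_dicts(cities_branch, "city")

print(tweets)
print(cities)
```

With the live feed, the sample string would be replaced by `root = ET.fromstring(requests.get(url, params=params).content)`. Unlike BeautifulSoup's `find_all()`, which walks all descendants, `for field in item` only visits direct subelements; the two behave the same as long as each field is a flat element.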

Answer 1

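Use the beautifulsoup module (its `"xml"` parser requires `lxml` to be installed). To parse the tweets and cities into lists of dictionaries, you can use this example: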

import requests
from bs4 import BeautifulSoup

url = "http://cs.stir.ac.uk/~soh/BD2spring2022/assignmentdata.php"
params = {"data": "spurpyr"}

soup = BeautifulSoup(requests.get(url, params=params).content, "xml")

tweets = []
for t in soup.select("tweets > tweet"):
    tweets.append({"id": t["id"], **{x.name: x.text for x in t.find_all()}})

cities = []
for c in soup.select("cities > city"):
    cities.append({"id": c["id"], **{x.name: x.text for x in c.find_all()}})

print(tweets)
print(cities)

Prints (reformatted here for readability):

[
    {
        "id": "16620625 5686",
        "Name": "Kenyon Conley",
        "Phone": "0327 103 9485",
        "Email": "malesuada@lobortisClassaptent.edu",
        "Location": "45.5333, -73.2833",
        "GenderID": "male",
        "Tweet": "#FollowFriday @DanielleMorrill - She's with @Seattle20 and @Twilio. Also fun to talk to.  #entrepreneur",
        "City": "Saint-Basile-le-Grand",
        "Country": "Canada",
        "Age": "34",
    },
    {
        "id": "16310427-5502",
        "Name": "Griffin Norton",
        "Phone": "0306 178 7917",
        "Email": "in.dolor.Fusce@necmalesuadaut.ca",
        "Location": "52.0000, 84.9833",
        "GenderID": "male",
        "Tweet": "!!!Veryy Bored!!!  ~~Craving Million's Of MilkShakes~~",
        "City": "Belokurikha",
        "Country": "Russia",
        "Age": "33",
    },

...
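`print(tweets)` writes each list on one line in Python's repr format. For an indented view like the one above, here is a minimal sketch using the standard `json` module (one option among several; `pprint.pprint` also works). Unlike the output above, `json.dumps` emits no trailing commas. The one-record `tweets` list is stand-in data copied from the output above.

```python
import json

# Stand-in data: one record copied from the output above. In the answer's
# script, `tweets` is the full list parsed from the feed.
tweets = [
    {
        "id": "16310427-5502",
        "Name": "Griffin Norton",
        "City": "Belokurikha",
        "Country": "Russia",
        "Age": "33",
    }
]

# indent=4 puts each key on its own line; ensure_ascii=False keeps any
# non-ASCII text (e.g. accented names) readable instead of \uXXXX escapes.
pretty = json.dumps(tweets, indent=4, ensure_ascii=False)
print(pretty)
```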
