How to parse XML children using Python
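I parsed XML from a website and found that the root has two branches (children).
How can I split the two branches into two lists of dictionaries?
Here is my code so far: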
import pandas as pd
import xml.etree.ElementTree as ET
import requests

url = "http://cs.stir.ac.uk/~soh/BD2spring2022/assignmentdata.php"
params = {'data': 'spurpyr'}
response = requests.get(url, params)
tree = response.content

# Extract the root element as a separate variable, and display the root tag.
root = ET.fromstring(tree)
print(root.tag)

# Get attributes of root
root_attr = root.attrib
print(root_attr)

# Finding children of root
for child in root:
    print(child.tag, child.attrib)

# Extract the two children of the root element into another two separate variables, and display their tags as well
child_dict = []
for child in root:
    child_dict.append(child.tag)
tweets_branch = child_dict[0]
cities_branch = child_dict[1]

# The elements in the entire tree
[elem.tag for elem in root.iter()]

# Specify both the encoding and decoding of the document you are displaying as the string
print(ET.tostring(root, encoding='utf8').decode('utf8'))
Use the beautifulsoup module. To parse the tweets and cities into lists of dictionaries, you can use this example:
import requests
from bs4 import BeautifulSoup

url = "http://cs.stir.ac.uk/~soh/BD2spring2022/assignmentdata.php"
params = {"data": "spurpyr"}
# The "xml" parser requires the lxml package to be installed.
soup = BeautifulSoup(requests.get(url, params=params).content, "xml")

# One dict per <tweet>: its id attribute plus every child tag's text
tweets = []
for t in soup.select("tweets > tweet"):
    tweets.append({"id": t["id"], **{x.name: x.text for x in t.find_all()}})

# Same for each <city>
cities = []
for c in soup.select("cities > city"):
    cities.append({"id": c["id"], **{x.name: x.text for x in c.find_all()}})

print(tweets)
print(cities)
Prints:
[
    {
        "id": "16620625 5686",
        "Name": "Kenyon Conley",
        "Phone": "0327 103 9485",
        "Email": "malesuada@lobortisClassaptent.edu",
        "Location": "45.5333, -73.2833",
        "GenderID": "male",
        "Tweet": "#FollowFriday @DanielleMorrill - She's with @Seattle20 and @Twilio. Also fun to talk to. #entrepreneur",
        "City": "Saint-Basile-le-Grand",
        "Country": "Canada",
        "Age": "34",
    },
    {
        "id": "16310427-5502",
        "Name": "Griffin Norton",
        "Phone": "0306 178 7917",
        "Email": "in.dolor.Fusce@necmalesuadaut.ca",
        "Location": "52.0000, 84.9833",
        "GenderID": "male",
        "Tweet": "!!!Veryy Bored!!! ~~Craving Million's Of MilkShakes~~",
        "City": "Belokurikha",
        "Country": "Russia",
        "Age": "33",
    },
    ...
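If you would rather stay with xml.etree.ElementTree from the question, the same idea can probably be sketched as below. The key change is to keep the child elements themselves rather than their tag strings, then loop over each branch. The inline SAMPLE_XML, its root tag name, the ids "t1", "t2", "c1" and the helper branch_to_dicts are all made up for illustration; the layout (a root with a tweets branch and a cities branch, each record carrying an id attribute and flat child tags) is only assumed from the output above, so adjust it to the real feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. The real feed's layout is an assumption
# inferred from the printed output, not taken from the actual URL.
SAMPLE_XML = """
<data>
  <tweets>
    <tweet id="t1"><Name>Kenyon Conley</Name><Age>34</Age></tweet>
    <tweet id="t2"><Name>Griffin Norton</Name><Age>33</Age></tweet>
  </tweets>
  <cities>
    <city id="c1"><City>Belokurikha</City><Country>Russia</Country></city>
  </cities>
</data>
""".strip()


def branch_to_dicts(branch):
    """Turn every record element directly under `branch` into a flat dict."""
    records = []
    for record in branch:
        # Start with the id attribute, then add one key per child tag.
        row = {"id": record.get("id")}
        for field in record:
            row[field.tag] = field.text
        records.append(row)
    return records


root = ET.fromstring(SAMPLE_XML)

# Keep the child *elements* (not child.tag), so they can be iterated later.
tweets_branch, cities_branch = list(root)

tweets = branch_to_dicts(tweets_branch)
cities = branch_to_dicts(cities_branch)

print(tweets)
print(cities)
```

With the real data you would pass response.content to ET.fromstring instead of SAMPLE_XML. Unpacking list(root) into two names only works if the root has exactly two children, which is what the question describes.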