Python - Using attributes to parse XML tree
There are plenty of examples of parsing XML by the tags in the tree, but what if (as in the example below) many tags share the same name?
<SoccerFeed timestamp="20161221T144346+0000">
<SoccerDocument Type="SQUADS Latest">
<Team country="USA">
<Founded>1998</Founded>
<Name>Chicago Fire</Name>
<Player uID="p113757">
<Name>Patrick McLain</Name>
<Position>Goalkeeper</Position>
<Stat Type="first_name">Patrick</Stat>
<Stat Type="last_name">McLain</Stat>
<Stat Type="birth_date">1988-08-22</Stat>
<Stat Type="birth_place">Eau Claire</Stat>
<Stat Type="first_nationality">USA</Stat>
<Stat Type="weight">94</Stat>
<Stat Type="height">191</Stat>
<Stat Type="jersey_num">23</Stat>
<Stat Type="real_position">Goalkeeper</Stat>
<Stat Type="real_position_side">Unknown</Stat>
<Stat Type="join_date">2016-01-18</Stat>
<Stat Type="country">USA</Stat>
</Player>
</Team>
</SoccerDocument>
</SoccerFeed>
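If I only want to parse the elements with the 'Stat' tag whose 'Type' attribute is 'first_name', how do I do that?
You can use BeautifulSoup with its XML parser, as in this example, which prints the text of every 'Stat' element: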
from bs4 import BeautifulSoup as bs
data = '''<SoccerFeed timestamp="20161221T144346+0000">
<SoccerDocument Type="SQUADS Latest">
<Team country="USA">
<Founded>1998</Founded>
<Name>Chicago Fire</Name>
<Player uID="p113757">
<Name>Patrick McLain</Name>
<Position>Goalkeeper</Position>
<Stat Type="first_name">Patrick</Stat>
<Stat Type="last_name">McLain</Stat>
<Stat Type="birth_date">1988-08-22</Stat>
<Stat Type="birth_place">Eau Claire</Stat>
<Stat Type="first_nationality">USA</Stat>
<Stat Type="weight">94</Stat>
<Stat Type="height">191</Stat>
<Stat Type="jersey_num">23</Stat>
<Stat Type="real_position">Goalkeeper</Stat>
<Stat Type="real_position_side">Unknown</Stat>
<Stat Type="join_date">2016-01-18</Stat>
<Stat Type="country">USA</Stat>
</Player>
</Team>
</SoccerDocument>
</SoccerFeed>'''
sub = bs(data, 'xml')
# Find all the 'Stat' tags
stat_tags = sub.find_all('Stat')
for k in stat_tags:
    # Extract the text between 'Stat' tags
    print(k.text)
Output:
Patrick
McLain
1988-08-22
Eau Claire
USA
94
191
23
Goalkeeper
Unknown
2016-01-18
USA
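The loop above prints every 'Stat' element. To narrow the result to the 'first_name' entry only, one option is to pass an attribute filter to find_all. The following is a minimal sketch, assuming bs4 and lxml are installed (the 'xml' parser depends on lxml); the shortened data string and the names soup, matches and first_names are illustrative only.

```python
from bs4 import BeautifulSoup

# Shortened sample (an assumption for brevity; the full feed works the same way)
data = '''<Player uID="p113757">
  <Name>Patrick McLain</Name>
  <Stat Type="first_name">Patrick</Stat>
  <Stat Type="last_name">McLain</Stat>
</Player>'''

# The 'xml' parser keeps tag and attribute names case-sensitive
soup = BeautifulSoup(data, 'xml')

# Keep only <Stat> tags whose Type attribute equals "first_name"
matches = soup.find_all('Stat', attrs={'Type': 'first_name'})
first_names = [tag.text for tag in matches]
print(first_names)
```

Note that the attribute name must be written as 'Type' with a capital T, because XML names are case-sensitive.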
Using R and the xml2 library:
library("xml2")
myxml<-read_xml('<SoccerFeed timestamp="20161221T144346+0000">
<SoccerDocument Type="SQUADS Latest">
<Team country="USA">
<Founded>1998</Founded>
<Name>Chicago Fire</Name>
<Player uID="p113757">
<Name>Patrick McLain</Name>
<Position>Goalkeeper</Position>
<Stat Type="first_name">Patrick</Stat>
<Stat Type="last_name">McLain</Stat>
<Stat Type="birth_date">1988-08-22</Stat>
<Stat Type="birth_place">Eau Claire</Stat>
<Stat Type="first_nationality">USA</Stat>
<Stat Type="weight">94</Stat>
<Stat Type="height">191</Stat>
<Stat Type="jersey_num">23</Stat>
<Stat Type="real_position">Goalkeeper</Stat>
<Stat Type="real_position_side">Unknown</Stat>
<Stat Type="join_date">2016-01-18</Stat>
<Stat Type="country">USA</Stat>
</Player>
</Team>
</SoccerDocument>
</SoccerFeed>')
# get all of the Stat nodes (xml_find_all takes an XPath expression)
statnodes <- xml_find_all(myxml, ".//Stat")
# filter for the first_name node
firstname <- statnodes[xml_attr(statnodes, "Type") == "first_name"]
# get text value
xml_text(firstname)
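Using ElementTree:
# root is the parsed <SoccerFeed> element; XML attribute names are case-sensitive, so use @Type
for firstname in root.findall('.//Stat[@Type="first_name"]'):
    print(firstname.attrib)  # firstname.text holds the value, "Patrick"
Full XPath syntax: https://docs.python.org/3.6/library/xml.etree.elementtree.html#supported-xpath-syntax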
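For completeness, here is a self-contained sketch of the ElementTree approach using only the standard library. It parses a shortened copy of the feed (trimmed here for brevity), selects the 'first_name' stat with an attribute predicate, and also shows one possible way to collect all of a player's stats into a dictionary keyed by the 'Type' attribute. The variable names first_names and players are illustrative, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the feed from the question
data = '''<SoccerFeed timestamp="20161221T144346+0000">
  <SoccerDocument Type="SQUADS Latest">
    <Team country="USA">
      <Name>Chicago Fire</Name>
      <Player uID="p113757">
        <Name>Patrick McLain</Name>
        <Stat Type="first_name">Patrick</Stat>
        <Stat Type="last_name">McLain</Stat>
        <Stat Type="jersey_num">23</Stat>
      </Player>
    </Team>
  </SoccerDocument>
</SoccerFeed>'''

root = ET.fromstring(data)  # root is the <SoccerFeed> element

# './/' searches all descendants; [@Type="..."] filters on the attribute value
first_names = [s.text for s in root.findall('.//Stat[@Type="first_name"]')]

# Map each player's uID to a {Type: text} dictionary of its direct <Stat> children
players = {}
for player in root.iter('Player'):
    players[player.get('uID')] = {s.get('Type'): s.text for s in player.findall('Stat')}

print(first_names)
print(players['p113757']['last_name'])
```

Keying the stats by 'Type' sidesteps the same-tag-name problem entirely: once the dictionary is built, any stat can be looked up by name without another XPath query.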