Parsing an XML file using ElementTree in Python 3
I want to parse the XML fragment below to extract the viewpoint tags and their name attributes. I also want to build a table from the extracted data.
My XML fragment:
<windows source-height='51'>
  <window class='dashboard' maximized='true' name='Figure 8-59'>
    <viewpoints>
      <viewpoint name='Good Filter Design'>
        <zoom type='entire-view' />
        <geo-search-visibility value='1' />
      </viewpoint>
      <viewpoint name='Poor Filter Design'>
        <zoom type='entire-view' />
        <geo-search-visibility value='1' />
      </viewpoint>
    </viewpoints>
    <active id='-1' />
  </window>
  <window class='dashboard' name='Figure 8-60 thought 8-65'>
    <viewpoints>
      <viewpoint name='Heat Map'>
        <zoom type='entire-view' />
        <geo-search-visibility value='1' />
      </viewpoint>
      <viewpoint name='Lightbulb'>
        <zoom type='entire-view' />
        <geo-search-visibility value='1' />
      </viewpoint>
      <viewpoint name='Sales Histogram'>
        <zoom type='entire-view' />
        <geo-search-visibility value='1' />
      </viewpoint>
    </viewpoints>
    <active id='-1' />
  </window>
</windows>
I want to extract "Good Filter Design" and "Poor Filter Design" and keep them in one row, with the remaining three viewpoint names in a second row.
My attempt:
root = getroot('example.xml')
for i in root.findall('windows/window/viewpoints/viewpoint'):
    print(i.get('name'))
If you can use BeautifulSoup, it is as simple as this:
from bs4 import BeautifulSoup
#xml = """your xml"""
soup = BeautifulSoup(xml, 'lxml')
names = [viewpt["name"] for viewpt in soup.find_all('viewpoint')]
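This gives you the name of every tag called 'viewpoint', wherever it appears in the document.
If you only want the ones nested under windows/window/viewpoints, use this: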
names = [viewpoint["name"]
         for windows in soup.find_all('windows')
         for window in windows.find_all("window")
         for viewpoints in window.find_all("viewpoints")
         for viewpoint in viewpoints.find_all("viewpoint")]
In your case, both give:
Out[18]:
['Good Filter Design',
'Poor Filter Design',
'Heat Map',
'Lightbulb',
'Sales Histogram']
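This should be easy with ElementTree. I don't know exactly what getroot() does, but if it really returns the root element of the XML document, then you should not mention windows in the findall() argument. The root element already is <windows>, and findall() resolves the path relative to the element it is called on: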
>>> from xml.etree import ElementTree as ET
>>> raw = '''your XML string'''
>>> root = ET.fromstring(raw)
>>> for v in root.findall('window/viewpoints'):
... print([a.get('name') for a in v.findall('viewpoint')])
...
['Good Filter Design', 'Poor Filter Design']
['Heat Map', 'Lightbulb', 'Sales Histogram']
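The question also asks for a table, which the answers above stop short of. Below is a minimal sketch of one way to get there with ElementTree alone, assuming that "table" means one row per window. The helper names viewpoint_rows and format_table are made up for illustration, and the XML is a trimmed copy of the fragment from the question with the children of each viewpoint left out.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question (children of <viewpoint> omitted).
RAW = """
<windows source-height='51'>
  <window class='dashboard' maximized='true' name='Figure 8-59'>
    <viewpoints>
      <viewpoint name='Good Filter Design' />
      <viewpoint name='Poor Filter Design' />
    </viewpoints>
  </window>
  <window class='dashboard' name='Figure 8-60 thought 8-65'>
    <viewpoints>
      <viewpoint name='Heat Map' />
      <viewpoint name='Lightbulb' />
      <viewpoint name='Sales Histogram' />
    </viewpoints>
  </window>
</windows>
"""


def viewpoint_rows(xml_text):
    """Hypothetical helper: one (window name, [viewpoint names]) tuple per <window>."""
    root = ET.fromstring(xml_text)  # root is the <windows> element itself
    rows = []
    for window in root.findall('window'):  # path is relative to <windows>
        names = [v.get('name') for v in window.findall('viewpoints/viewpoint')]
        rows.append((window.get('name'), names))
    return rows


def format_table(rows):
    """Hypothetical helper: render the rows as plain text, one line per window."""
    width = max([len('window')] + [len(name) for name, _ in rows])
    lines = [f"{'window'.ljust(width)} | viewpoints"]
    for name, viewpoints in rows:
        lines.append(f"{name.ljust(width)} | {', '.join(viewpoints)}")
    return "\n".join(lines)


rows = viewpoint_rows(RAW)
print(format_table(rows))
```

Keeping the rows as plain tuples means they can be handed to csv.writer or a DataFrame constructor later if a different table format is wanted.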