XML Parsing in Python (Elia structure)
I would like to parse this kind of XML file:
<?xml version="1.0" encoding="utf-8"?>
<SolarForecastingChartDataForZone xmlns="http://schemas.datacontract.org/2004/07/Elia.PublicationService.DomainInterface.SolarForecasting.v3" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<ErrorMessage i:nil="true"/>
<IntervalInMinutes>15</IntervalInMinutes>
<SolarForecastingChartDataForZoneItems>
<SolarForecastingChartDataForZoneItem>
<DayAheadForecast>-50</DayAheadForecast>
<DayAheadP10>-50</DayAheadP10>
<DayAheadP90>-50</DayAheadP90>
<Forecast>0</Forecast>
<ForecastP10>0</ForecastP10>
<ForecastP90>0</ForecastP90>
<ForecastUpdated>0</ForecastUpdated>
<IntraDayP10>-50</IntraDayP10>
<IntraDayP90>-50</IntraDayP90>
<LoadFactor>0</LoadFactor>
<RealTime>0</RealTime>
<StartsOn xmlns:a="http://schemas.datacontract.org/2004/07/System">
<a:DateTime>2013-09-29T22:00:00Z</a:DateTime>
<a:OffsetMinutes>0</a:OffsetMinutes>
</StartsOn>
<WeekAheadForecast>-50</WeekAheadForecast>
<WeekAheadP10>-50</WeekAheadP10>
<WeekAheadP90>-50</WeekAheadP90>
</SolarForecastingChartDataForZoneItem>
<SolarForecastingChartDataForZoneItem>
<DayAheadForecast>-50</DayAheadForecast>
<DayAheadP10>-50</DayAheadP10>
<DayAheadP90>-50</DayAheadP90>
<Forecast>0</Forecast>
<ForecastP10>0</ForecastP10>
<ForecastP90>0</ForecastP90>
<ForecastUpdated>0</ForecastUpdated>
....
I want to retrieve the values of <Forecast> and <a:DateTime>.
I tried BeautifulSoup and minidom, for example:
from xml.dom import minidom
xmldoc = minidom.parse('xmlfile')
itemlist = xmldoc.getElementsByTagName('Forecast')
print(len(itemlist))  # number of <Forecast> elements found
for s in xmldoc.getElementsByTagName('Forecast'):
    print(s.nodeValue)
But I don't get any values. I suppose I'm doing something wrong, but I don't understand why. Can someone help me?
Thanks
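For reference, the minidom loop above prints nothing useful because `nodeValue` is always `None` for an element node; the text `0` is stored in a child Text node of `<Forecast>`. A minimal sketch of a fix, assuming a trimmed inline copy of the question's XML (`SAMPLE`) and a helper named `element_text`, both illustrative and not from the question:

```python
from xml.dom import minidom

# Illustrative, trimmed copy of the question's XML (the real file has more items).
SAMPLE = """<SolarForecastingChartDataForZone xmlns="http://schemas.datacontract.org/2004/07/Elia.PublicationService.DomainInterface.SolarForecasting.v3" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<IntervalInMinutes>15</IntervalInMinutes>
<SolarForecastingChartDataForZoneItems>
<SolarForecastingChartDataForZoneItem>
<DayAheadForecast>-50</DayAheadForecast>
<Forecast>0</Forecast>
<StartsOn xmlns:a="http://schemas.datacontract.org/2004/07/System">
<a:DateTime>2013-09-29T22:00:00Z</a:DateTime>
<a:OffsetMinutes>0</a:OffsetMinutes>
</StartsOn>
</SolarForecastingChartDataForZoneItem>
</SolarForecastingChartDataForZoneItems>
</SolarForecastingChartDataForZone>"""

SYSTEM_NS = "http://schemas.datacontract.org/2004/07/System"


def element_text(element):
    # An element's own nodeValue is None; concatenate the data of its child Text nodes.
    return "".join(child.data for child in element.childNodes
                   if child.nodeType == child.TEXT_NODE)


xmldoc = minidom.parseString(SAMPLE)  # minidom.parse('xmlfile') for a file on disk
forecasts = [element_text(e) for e in xmldoc.getElementsByTagName("Forecast")]
# <a:DateTime> is namespaced: match on namespace URI + local name rather than the "a" prefix.
datetimes = [element_text(e) for e in xmldoc.getElementsByTagNameNS(SYSTEM_NS, "DateTime")]
print(forecasts)
print(datetimes)
```

Matching `DateTime` with `getElementsByTagNameNS` keeps the lookup working even if the file used a different prefix than `a` for the same namespace URI.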
I'm not quite sure what output you want, but I happened to be working with lxml and XPath when I saw this question.
from lxml import html
mystring = ''' I cut and pasted your string here '''
tree = html.fromstring(mystring)
>>> for forecast in tree.xpath('//forecast'):
...     forecast.text_content()
...
'0'
'0'
>>> for dtime in tree.xpath('//datetime'):
...     dtime.text_content()
...
'2013-09-29T22:00:00Z'
>>>
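The lowercase tag names in that session come from `lxml.html`, which parses the input as HTML and does not apply XML namespace rules. With a real XML parser, a bare `Forecast` path finds nothing, because every element here lives in the default namespace declared on the root, and `DateTime` lives in the `System` namespace. A minimal sketch using the standard library's `xml.etree.ElementTree` (swapped in for lxml so no third-party package is needed); the prefixes `e` and `a` in `NS` are arbitrary choices mapped to the URIs from the document, and `SAMPLE` is an illustrative trimmed copy of the XML:

```python
import xml.etree.ElementTree as ET

# Illustrative, trimmed copy of the question's XML.
SAMPLE = """<SolarForecastingChartDataForZone xmlns="http://schemas.datacontract.org/2004/07/Elia.PublicationService.DomainInterface.SolarForecasting.v3" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<IntervalInMinutes>15</IntervalInMinutes>
<SolarForecastingChartDataForZoneItems>
<SolarForecastingChartDataForZoneItem>
<DayAheadForecast>-50</DayAheadForecast>
<Forecast>0</Forecast>
<StartsOn xmlns:a="http://schemas.datacontract.org/2004/07/System">
<a:DateTime>2013-09-29T22:00:00Z</a:DateTime>
<a:OffsetMinutes>0</a:OffsetMinutes>
</StartsOn>
</SolarForecastingChartDataForZoneItem>
</SolarForecastingChartDataForZoneItems>
</SolarForecastingChartDataForZone>"""

# Our own prefixes, mapped to the namespace URIs declared in the document.
NS = {
    "e": "http://schemas.datacontract.org/2004/07/Elia.PublicationService.DomainInterface.SolarForecasting.v3",
    "a": "http://schemas.datacontract.org/2004/07/System",
}

root = ET.fromstring(SAMPLE)  # ET.parse('xmlfile').getroot() for a file on disk
forecasts = [e.text for e in root.findall(".//e:Forecast", NS)]
datetimes = [e.text for e in root.findall(".//a:DateTime", NS)]
print(forecasts)
print(datetimes)
```

The same namespace mapping idea applies to `lxml.etree` if you prefer full XPath: pass a prefix-to-URI dict as the `namespaces` argument of `xpath()`.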
Then, poking around a bit more:
>>> all_elements = [e for e in tree.iter()]
>>> for each_element in all_elements[1:]:  # skip the root: its text_content() is all the document's text without tags
...     each_element.tag, each_element.text_content()
...
('errormessage', '')
('intervalinminutes', '15')
('solarforecastingchartdataforzoneitems', '\n \n -50\n -50\n -50\n 0\n 0\n 0\n 0\n -50\n -50\n 0\n 0\n \n 2013-09-29T22:00:00Z\n 0\n \n -50\n -50\n -50\n \n \n -50\n -50\n -50\n 0\n 0\n 0\n 0')
('solarforecastingchartdataforzoneitem', '\n -50\n -50\n -50\n 0\n 0\n 0\n 0\n -50\n -50\n 0\n 0\n \n 2013-09-29T22:00:00Z\n 0\n \n -50\n -50\n -50\n ')
('dayaheadforecast', '-50')
('dayaheadp10', '-50')
('dayaheadp90', '-50')
('forecast', '0')
('forecastp10', '0')
('forecastp90', '0')
('forecastupdated', '0')
('intradayp10', '-50')
.
.
.
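Since the goal is to retrieve `<Forecast>` together with `<a:DateTime>`, it is probably more convenient to walk the file item by item so each value stays paired with its own timestamp. A minimal sketch, again with `xml.etree.ElementTree` rather than lxml, an illustrative trimmed `SAMPLE`, and a hypothetical helper `read_items`; it assumes every item has a `StartsOn/a:DateTime` in UTC (`Z` suffix) and an integer `Forecast`, as in the excerpt:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

NS = {
    "e": "http://schemas.datacontract.org/2004/07/Elia.PublicationService.DomainInterface.SolarForecasting.v3",
    "a": "http://schemas.datacontract.org/2004/07/System",
}

# Illustrative, trimmed copy of the question's XML (one item only).
SAMPLE = """<SolarForecastingChartDataForZone xmlns="http://schemas.datacontract.org/2004/07/Elia.PublicationService.DomainInterface.SolarForecasting.v3" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<SolarForecastingChartDataForZoneItems>
<SolarForecastingChartDataForZoneItem>
<DayAheadForecast>-50</DayAheadForecast>
<Forecast>0</Forecast>
<StartsOn xmlns:a="http://schemas.datacontract.org/2004/07/System">
<a:DateTime>2013-09-29T22:00:00Z</a:DateTime>
<a:OffsetMinutes>0</a:OffsetMinutes>
</StartsOn>
</SolarForecastingChartDataForZoneItem>
</SolarForecastingChartDataForZoneItems>
</SolarForecastingChartDataForZone>"""


def read_items(root):
    """Return (timestamp, forecast) pairs, one per SolarForecastingChartDataForZoneItem."""
    rows = []
    for item in root.iterfind(".//e:SolarForecastingChartDataForZoneItem", NS):
        # Paths are relative to the current item, so each forecast keeps its own timestamp.
        stamp = item.findtext("e:StartsOn/a:DateTime", namespaces=NS)
        forecast = item.findtext("e:Forecast", namespaces=NS)
        # Assumes the "Z" (UTC) suffix seen in the sample.
        when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        rows.append((when, int(forecast)))
    return rows


rows = read_items(ET.fromstring(SAMPLE))
for when, forecast in rows:
    print(when.isoformat(), forecast)
```

Pairing per item avoids relying on two separate flat lists staying aligned, which would break if some item lacked one of the two elements.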