Parsing XML with namespaces into a dataframe
I have the following simplified XML:
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
<soap:Body>
<ReadResponse xmlns="ABCDEFG.com">
<ReadResult>
<Value>
<Alias>x1</Alias>
<Timestamp>2013-11-11T00:00:00</Timestamp>
<Val>113</Val>
<Duration>5000</Duration>
<Quality>128</Quality>
</Value>
<Value>
<Alias>x1</Alias>
<Timestamp>2014-11-11T00:02:00</Timestamp>
<Val>110</Val>
<Duration>5000</Duration>
<Quality>128</Quality>
</Value>
<Value>
<Alias>x2</Alias>
<Timestamp>2013-11-11T00:00:00</Timestamp>
<Val>101</Val>
<Duration>5000</Duration>
<Quality>128</Quality>
</Value>
<Value>
<Alias>x2</Alias>
<Timestamp>2014-11-11T00:02:00</Timestamp>
<Val>122</Val>
<Duration>5000</Duration>
<Quality>128</Quality>
</Value>
</ReadResult>
</ReadResponse>
</soap:Body>
</soap:Envelope>
and would like to parse it into a dataframe with the following structure (keeping some tags and discarding the rest):
Timestamp x1 x2
2013-11-11T00:00:00 113 101
2014-11-11T00:02:00 110 122
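The problem is that the XML file contains namespaces, and I don't know how to proceed. I have gone through several tutorials (e.g., https://docs.python.org/2/library/pyexpat.html) and questions (e.g., How to open this XML file to create dataframe in Python? and Parsing XML with namespace in Python via 'ElementTree'), but none of them helped. I would appreciate it if someone could help me sort this out.
Here is an example of how to parse the XML using lxml and XPath: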
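from lxml import etree

# Map a prefix of our own choosing to the default namespace declared on ReadResponse.
namespaces = {'abc': "ABCDEFG.com"}
# xml_string holds the XML from the question; pass bytes, because lxml rejects
# unicode strings that carry an encoding declaration.
xmltree = etree.fromstring(xml_string)
# The prefix is required: an unprefixed name in XPath matches only no-namespace elements.
items = xmltree.xpath('//abc:Alias/text()', namespaces=namespaces)
print(items)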
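To get from the XPath query to the table asked for in the question, one possible approach is to collect one record per Value element and pivot on Alias. This is only a sketch: it assumes pandas is installed, that every Val is an integer, and that each (Timestamp, Alias) pair occurs once. The names xml_bytes, ns and rows are illustrative, and the XML is trimmed to the tags that are kept.

```python
from lxml import etree
import pandas as pd

# Trimmed copy of the XML from the question (Duration and Quality omitted).
xml_bytes = b"""<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
<soap:Body>
<ReadResponse xmlns="ABCDEFG.com">
<ReadResult>
<Value><Alias>x1</Alias><Timestamp>2013-11-11T00:00:00</Timestamp><Val>113</Val></Value>
<Value><Alias>x1</Alias><Timestamp>2014-11-11T00:02:00</Timestamp><Val>110</Val></Value>
<Value><Alias>x2</Alias><Timestamp>2013-11-11T00:00:00</Timestamp><Val>101</Val></Value>
<Value><Alias>x2</Alias><Timestamp>2014-11-11T00:02:00</Timestamp><Val>122</Val></Value>
</ReadResult>
</ReadResponse>
</soap:Body>
</soap:Envelope>"""

# The prefix 'abc' is arbitrary; only the namespace URI has to match the document.
ns = {"abc": "ABCDEFG.com"}
root = etree.fromstring(xml_bytes)

# One dict per <Value> element, keeping only the three tags of interest.
rows = [
    {
        "Timestamp": value.findtext("abc:Timestamp", namespaces=ns),
        "Alias": value.findtext("abc:Alias", namespaces=ns),
        "Val": int(value.findtext("abc:Val", namespaces=ns)),
    }
    for value in root.xpath("//abc:Value", namespaces=ns)
]

# Long to wide: one row per Timestamp, one column per Alias.
df = pd.DataFrame(rows).pivot(index="Timestamp", columns="Alias", values="Val")
df = df.reset_index()
df.columns.name = None  # drop the leftover "Alias" label on the column axis

print(df)
```

If the same (Timestamp, Alias) pair can appear more than once, pivot raises an error; pivot_table with an explicit aggregation function would be the usual alternative in that case.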