Python: extract values from a list of OrderedDicts
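I have parsed an XML file with xmltodict and found the path to the <coordinates> tag, from which I want to extract the latitude and longitude values to add to a DataFrame. Here is a small sample: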
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Folder>
<name>One Line Diagram</name>
<open>0</open>
<Folder>
<name>SectionOne</name>
<open>0</open>
<Folder>
<name>Node</name>
<open>0</open>
<Placemark>
<name>5680420</name>
<styleUrl>#Style_0</styleUrl>
<description />
<MultiGeometry type="MultiGeometry" Type="MultiGeometry">
<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>-83.6514766,67.0234192 -83.6515403,67.0233918 -83.6515309,67.0233134 -83.6514609,67.0232885 -83.5778406,67.0246267 -83.5777768,67.0246541 -83.5777861,67.0247325 -83.5778560,67.0247574 -83.6514766,67.0234192</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</MultiGeometry>
</Placemark>
<Placemark>
<name>25934531</name>
<styleUrl>#Style_0</styleUrl>
ML60
<description />
<MultiGeometry type="MultiGeometry" Type="MultiGeometry">
<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>-83.6512679,67.0216805 -83.6513317,67.0216531 -83.6513222,67.0215747 -83.6512522,67.0215498 -83.5967049,67.0225434 -83.5966412,67.0225708 -83.5966505,67.0226492 -83.5967204,67.0226741 -83.6512679,67.0216805</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</MultiGeometry>
</Placemark>
</Folder>
</Folder>
</Folder>
</Document>
</kml>
The path is as follows:
> doc['kml']['Document']['Folder']['Folder']['Folder'][0]['Placemark'][0]['MultiGeometry']['Polygon']['outerBoundaryIs']['LinearRing']['coordinates']
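This is a very long XML document with 4 Folder tags, but I only need the values from the first one, ['Folder'][0]. What I can't work out is how to iterate over every ['Placemark'][n] until all the coordinates have been extracted.
I have tried several approaches. The latest one, below, was an attempt just to get started on finding the right tag, but to no avail.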
root_elements = doc['Document'] if type(doc['Document']) == OrderedDict else [doc['Document']]
for element in root_elements:
    print(element['Placemark'])
Traceback:
Traceback (most recent call last)
<ipython-input-69-db580dc8b6e2> in <module>()
----> 1 root_elements = doc['Document'] if type(doc['Document']) == OrderedDict else [doc['Document']]
2 for element in root_elements:
3 print(element['Placemark'])
KeyError: 'Document'
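The KeyError comes from the shape of what xmltodict returns: xmltodict.parse gives back a mapping whose only top-level key is the root element, kml, so doc['Document'] fails and Document sits one level further down, under doc['kml']. A quick check on a trimmed-down document (a toy input, not the real file):

```python
import xmltodict

# Minimal stand-in for the KML file: same root element and namespace
doc = xmltodict.parse(
    '<kml xmlns="http://www.opengis.net/kml/2.2"><Document><Folder/></Document></kml>'
)

# The root element is the only top-level key
print(list(doc.keys()))  # ['kml']
# Without namespace processing, the xmlns declaration is kept as an ordinary attribute
print(doc['kml']['@xmlns'])  # http://www.opengis.net/kml/2.2
# 'Document' lives under 'kml', not at the top level
print('Document' in doc, 'Document' in doc['kml'])  # False True
```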
Any help is appreciated.
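Selecting ['Folder'][0] and looping over every Placemark is complicated by how xmltodict represents repeated tags: an element that appears once becomes a single dict, while one that repeats becomes a list. Judging by the path above, the full file has several Folders at the innermost level (so ['Folder'] is a list there), whereas the sample has one per level. xmltodict's force_list argument makes the named tags always come back as lists, so [0] and a plain for loop work in both cases. A sketch assuming the Folder > Folder > Folder > Placemark nesting from the question; the inline SAMPLE string is a trimmed, hypothetical stand-in for the real file:

```python
import xmltodict

# Trimmed stand-in: the innermost level holds two Folders, the first with one Placemark
SAMPLE = """<kml xmlns="http://www.opengis.net/kml/2.2"><Document><Folder><Folder>
<Folder><name>Node</name>
<Placemark><name>5680420</name><MultiGeometry><Polygon><outerBoundaryIs><LinearRing>
<coordinates>-83.6514766,67.0234192 -83.6515403,67.0233918</coordinates>
</LinearRing></outerBoundaryIs></Polygon></MultiGeometry></Placemark>
</Folder>
<Folder><name>Other</name></Folder>
</Folder></Folder></Document></kml>"""

# Always return Folder and Placemark as lists, even when only one is present
doc = xmltodict.parse(SAMPLE, force_list=('Folder', 'Placemark'))

# [0] at every Folder level selects the first Folder, i.e. ['Folder'][0]
first_folder = doc['kml']['Document']['Folder'][0]['Folder'][0]['Folder'][0]

coordinates = []
# A Folder without Placemarks has no 'Placemark' key, hence the default
for placemark in first_folder.get('Placemark', []):
    ring = placemark['MultiGeometry']['Polygon']['outerBoundaryIs']['LinearRing']
    # Split on whitespace into "x,y" pairs, then split each pair on the comma
    coordinates.append([pair.split(',') for pair in ring['coordinates'].split()])

print(first_folder['name'])  # Node
print(coordinates)  # [[['-83.6514766', '67.0234192'], ['-83.6515403', '67.0233918']]]
```

Because force_list matches tag names at any depth, the same call also covers a Folder that happens to contain a single Placemark.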
Your XML is missing the closing tags of 2 Folders (the 4th-last and 3rd-last lines below; just copy and paste them into your XML file).
Use this tool to indent the XML: https://www.freeformatter.com/xml-formatter.html#ad-output
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Folder>
<name>One Line Diagram</name>
<open>0</open>
<Folder>
<name>SectionOne</name>
<open>0</open>
<Folder>
<name>Node</name>
<open>0</open>
<Placemark>
<name>5680420</name>
<styleUrl>#Style_0</styleUrl>
<description />
<MultiGeometry type="MultiGeometry" Type="MultiGeometry">
<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>-83.6514766,67.0234192 -83.6515403,67.0233918 -83.6515309,67.0233134 -83.6514609,67.0232885 -83.5778406,67.0246267 -83.5777768,67.0246541 -83.5777861,67.0247325 -83.5778560,67.0247574 -83.6514766,67.0234192</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</MultiGeometry>
</Placemark>
<Placemark>
<name>25934531</name>
<styleUrl>#Style_0</styleUrl>
ML60
<description />
<MultiGeometry type="MultiGeometry" Type="MultiGeometry">
<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>-83.6512679,67.0216805 -83.6513317,67.0216531 -83.6513222,67.0215747 -83.6512522,67.0215498 -83.5967049,67.0225434 -83.5966412,67.0225708 -83.5966505,67.0226492 -83.5967204,67.0226741 -83.6512679,67.0216805</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</MultiGeometry>
</Placemark>
</Folder>
</Folder>
</Folder>
</Document>
</kml>
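Missing closing tags make the file malformed, and xmltodict.parse, which uses the standard-library expat parser, stops with xml.parsers.expat.ExpatError instead of returning a dict. A small sketch for checking a document up front; check_well_formed is a hypothetical helper name and the two strings are toy inputs:

```python
from xml.parsers.expat import ExpatError

import xmltodict


def check_well_formed(xml_text):
    """Return None if xml_text parses, otherwise the parser's error message."""
    try:
        xmltodict.parse(xml_text)
    except ExpatError as exc:
        # The message includes the line and column where the parser gave up
        return str(exc)
    return None


good = "<Document><Folder><Folder></Folder></Folder></Document>"
bad = "<Document><Folder><Folder></Document>"  # two </Folder> tags missing
print(check_well_formed(good))  # None
print(check_well_formed(bad))
```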
Extracting the coordinates with xmltodict from a coordinates.xml file containing your XML (with the 2 missing Folder closing tags added):
import xmltodict
with open('coordinates.xml') as coords:
    doc = xmltodict.parse(coords.read())
coordinates = []
# Loop over each Placemark tag in the document
for placemark in doc['kml']['Document']['Folder']['Folder']['Folder']['Placemark']:
    # Get the coordinates string from the current Placemark
    coordinateString = placemark['MultiGeometry']['Polygon']['outerBoundaryIs']['LinearRing']['coordinates']
    # Split the string into coordinate pairs on whitespace, then split x and y of each pair on the comma (",")
    coordinateList = [x.split(",") for x in coordinateString.split()]
    coordinates.append(coordinateList)
print(coordinates)
Printed output of the "coordinates" list:
[[[u'-83.6514766', u'67.0234192'], [u'-83.6515403', u'67.0233918'], [u'-83.6515309', u'67.0233134'], [u'-83.6514609', u'67.0232885'], [u'-83.5778406', u'67.0246267'], [u'-83.5777768', u'67.0246541'], [u'-83.5777861', u'67.0247325'], [u'-83.5778560', u'67.0247574'], [u'-83.6514766', u'67.0234192']], [[u'-83.6512679', u'67.0216805'], [u'-83.6513317', u'67.0216531'], [u'-83.6513222', u'67.0215747'], [u'-83.6512522', u'67.0215498'], [u'-83.5967049', u'67.0225434'], [u'-83.5966412', u'67.0225708'], [u'-83.5966505', u'67.0226492'], [u'-83.5967204', u'67.0226741'], [u'-83.6512679', u'67.0216805']]]
coordinates[0] gives the list of coordinates of the first Placemark tag:
[[u'-83.6514766', u'67.0234192'], [u'-83.6515403', u'67.0233918'], [u'-83.6515309', u'67.0233134'], [u'-83.6514609', u'67.0232885'], [u'-83.5778406', u'67.0246267'], [u'-83.5777768', u'67.0246541'], [u'-83.5777861', u'67.0247325'], [u'-83.5778560', u'67.0247574'], [u'-83.6514766', u'67.0234192']]
coordinates[0][0] gives the first coordinate pair of the first Placemark tag:
[u'-83.6514766', u'67.0234192']
coordinates[0][0][0] gives the x value of the first coordinate pair of the first Placemark tag:
-83.6514766
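The extracted values are still strings, so they need float before they can be used as numbers in a DataFrame. KML writes each tuple as longitude,latitude (optionally followed by altitude), so x is the longitude and y the latitude. A minimal sketch that flattens the nested coordinates list into (placemark index, longitude, latitude) rows; to_rows is a hypothetical helper and the input is the first two pairs from the output above:

```python
def to_rows(coordinates):
    """Flatten [[['x', 'y'], ...], ...] into (placemark_index, longitude, latitude) tuples."""
    rows = []
    for index, ring in enumerate(coordinates):
        for pair in ring:
            # KML tuples are lon,lat[,alt]; any altitude value is ignored here
            lon, lat = float(pair[0]), float(pair[1])
            rows.append((index, lon, lat))
    return rows


coordinates = [[['-83.6514766', '67.0234192'], ['-83.6515403', '67.0233918']]]
rows = to_rows(coordinates)
print(rows)  # [(0, -83.6514766, 67.0234192), (0, -83.6515403, 67.0233918)]
```

If pandas is installed, pandas.DataFrame(rows, columns=['placemark', 'longitude', 'latitude']) turns these rows into a DataFrame.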