Parsing a KML file and storing it in a database with Python
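I have 4 KML files, each containing multiple polygons. I want to parse the KML files, extract the data, and store it in my database. After some research, I think the best way to parse the KML files is to install pyKML.
One of my KML files looks like this: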
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
<Document>
<name>RecAreaPolygons.TAB</name>
<Schema name="RecAreaPolygons" id="S_RecAreaPolygons_SSSS">
<SimpleField type="string" name="RecAreaName"><displayName><b>RecAreaName</b></displayName>
</SimpleField>
<SimpleField type="string" name="RecAreaCategory"><displayName><b>RecAreaCategory</b></displayName>
</SimpleField>
<SimpleField type="string" name="Province"><displayName><b>Province</b></displayName>
</SimpleField>
<SimpleField type="string" name="Comments"><displayName><b>Comments</b></displayName>
</SimpleField>
</Schema>
<Style id="style1">
<BalloonStyle>
<text><![CDATA[<table border="0">
<tr><td><b>RecAreaName</b></td><td>$[RecAreaPolygons/RecAreaName]</td></tr>
<tr><td><b>RecAreaCategory</b></td><td>$[RecAreaPolygons/RecAreaCategory]</td></tr>
<tr><td><b>Province</b></td><td>$[RecAreaPolygons/Province]</td></tr>
<tr><td><b>Comments</b></td><td>$[RecAreaPolygons/Comments]</td></tr>
</table>
]]></text>
</BalloonStyle>
<PolyStyle>
<color>ff00ff00</color>
</PolyStyle>
</Style>
<Style id="falseColor">
<BalloonStyle>
<text><![CDATA[<table border="0">
<tr><td><b>RecAreaName</b></td><td>$[RecAreaPolygons/RecAreaName]</td></tr>
<tr><td><b>RecAreaCategory</b></td><td>$[RecAreaPolygons/RecAreaCategory]</td></tr>
<tr><td><b>Province</b></td><td>$[RecAreaPolygons/Province]</td></tr>
<tr><td><b>Comments</b></td><td>$[RecAreaPolygons/Comments]</td></tr>
</table>
]]></text>
</BalloonStyle>
<PolyStyle>
<colorMode>random</colorMode>
</PolyStyle>
</Style>
<Folder id="layer 0">
<name>RecAreaPolygons</name>
<Placemark>
<name>Whistler</name>
<styleUrl>#falseColor</styleUrl>
<Style id="inline">
<IconStyle>
<color>ff0000ff</color>
<colorMode>normal</colorMode>
</IconStyle>
<LineStyle>
<color>ff0000ff</color>
<colorMode>normal</colorMode>
</LineStyle>
<PolyStyle>
<color>ff0000ff</color>
<colorMode>normal</colorMode>
</PolyStyle>
</Style>
<ExtendedData>
<SchemaData schemaUrl="#S_RecAreaPolygons_SSSS">
<SimpleData name="RecAreaName">Whistler</SimpleData>
<SimpleData name="RecAreaCategory">World Class</SimpleData>
<SimpleData name="Province">BC</SimpleData>
<SimpleData name="Comments"></SimpleData>
</SchemaData>
</ExtendedData>
<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>
-123.052382,50.094969,0 -123.050613,50.07531199999999,0 -123.029976,50.05263099999998,0 -122.955094,50.045827,0 -122.909104,50.05565599999998,0 -122.869599,50.07871399999998,0 -122.835991,50.10895600000001,0 -122.826557,50.152805,0 -122.78496,50.26872300000001,0 -122.923014,50.26576299999998,0 -122.939174,50.18569200000002,0 -122.979858,50.17057199999998,0 -123.012877,50.151293,0 -123.050613,50.12483200000001,0 -123.053561,50.104419,0 -123.052382,50.094969,0
</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</Placemark>
//MULTIPLE OTHER PLACEMARKS
As mentioned, my attempt was to install pyKML. After installing it, I ran the following code, hoping to load the data into a dataframe:
from pykml import parser

with open('RecAreaPolygons.kml', 'rb') as f:
    s = f.read()
root = parser.fromstring(s)
print(root.Document.Folder.Placemark.Polygon.outerBoundaryIs.LinearRing.coordinates)
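I was able to print the coordinates of the first placemark, but how do I get the rest of the coordinates and add them to a dataframe iteratively?
I would like my output to look like this: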
RecAreaName RecAreaCategory Province Comments Coordinates
0 Whistler World Class BC -123.052382,50.094969,0 -123.050613,50.07531199999999,0 -123.029976,50.05263099999998,0 -122.955094,50.045827,0 -122.909104,50.05565599999998,0 -122.869599,50.07871399999998,0 -122.835991,50.10895600000001,0 -122.826557,50.152805,0 -122.78496,50.26872300000001,0 -122.923014,50.26576299999998,0 -122.939174,50.18569200000002,0 -122.979858,50.17057199999998,0 -123.012877,50.151293,0 -123.050613,50.12483200000001,0 -123.053561,50.104419,0 -123.052382,50.094969,0
1 The rest of the entries
2
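You can iterate over the placemarks and append each one's name and geometry to a list, then create a dataframe from that list.
If the KML has multiple folders, you will need to iterate over the folders and then over the placemarks within each folder.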
from pykml import parser
import pandas as pd

with open('RecAreaPolygons.kml', 'r', encoding="utf-8") as f:
    root = parser.parse(f).getroot()

places = []
# Iterating over Placemark visits every Placemark in the folder.
for place in root.Document.Folder.Placemark:
    coords = place.Polygon.outerBoundaryIs.LinearRing.coordinates.text.strip()
    # Map each SimpleData "name" attribute to its text (None when empty).
    data = {item.get("name"): item.text for item in
            place.ExtendedData.SchemaData.SimpleData}
    places.append({"RecAreaName": data.get('RecAreaName'),
                   "RecAreaCategory": data.get('RecAreaCategory'),
                   "Province": data.get('Province'),
                   "Comments": data.get('Comments'),
                   "Coordinates": coords})

df = pd.DataFrame(places)
print(df)
Output:
RecAreaName RecAreaCategory Province Comments Coordinates
0 Whistler World Class BC None -123.052382,50.094969,0 -123.050613,50.07531...
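For the multi-folder case mentioned above, the nested loop might look like the sketch below. To keep it runnable without extra packages it uses the standard library's xml.etree.ElementTree instead of pyKML, and the two-folder SAMPLE_KML document, the placemark_rows helper, and the extra "Folder" key are all made up for illustration. With pyKML the same shape would presumably be an outer loop over root.Document.Folder and an inner loop over folder.Placemark.

```python
import xml.etree.ElementTree as ET

# KML 2.2 namespace, as declared in the file shown in the question.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Hypothetical two-folder document, trimmed to the elements used here.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
  <Folder>
    <name>Layer A</name>
    <Placemark>
      <ExtendedData><SchemaData>
        <SimpleData name="RecAreaName">Whistler</SimpleData>
        <SimpleData name="Province">BC</SimpleData>
      </SchemaData></ExtendedData>
      <Polygon><outerBoundaryIs><LinearRing><coordinates>
        -123.05,50.09,0 -123.02,50.05,0 -123.05,50.09,0
      </coordinates></LinearRing></outerBoundaryIs></Polygon>
    </Placemark>
  </Folder>
  <Folder>
    <name>Layer B</name>
    <Placemark>
      <ExtendedData><SchemaData>
        <SimpleData name="RecAreaName">Example Area</SimpleData>
        <SimpleData name="Province">AB</SimpleData>
      </SchemaData></ExtendedData>
      <Polygon><outerBoundaryIs><LinearRing><coordinates>
        -114.10,51.00,0 -114.00,51.10,0 -114.10,51.00,0
      </coordinates></LinearRing></outerBoundaryIs></Polygon>
    </Placemark>
  </Folder>
</Document>
</kml>"""


def placemark_rows(kml_text):
    """Return one dict per Placemark, visiting every Folder in turn."""
    root = ET.fromstring(kml_text)
    rows = []
    for folder in root.iterfind(".//kml:Folder", KML_NS):
        folder_name = folder.findtext("kml:name", default="", namespaces=KML_NS)
        # Only Placemarks that are direct children of this folder.
        for place in folder.iterfind("kml:Placemark", KML_NS):
            row = {"Folder": folder_name}
            for item in place.iterfind(".//kml:SimpleData", KML_NS):
                row[item.get("name")] = item.text
            coords = place.findtext(".//kml:coordinates", default="",
                                    namespaces=KML_NS)
            # Collapse newlines and indentation into single spaces.
            row["Coordinates"] = " ".join(coords.split())
            rows.append(row)
    return rows


rows = placemark_rows(SAMPLE_KML)
```

Passing rows to pd.DataFrame(rows) should then give one dataframe row per placemark across all folders.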
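If you want the coordinates as a list, append .split(' ') after the strip() call in the assignment to coords inside the loop (or .split() with no argument, which splits on any run of whitespace, including newlines).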
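The question also asks about storing the result in a database, which the code above does not cover. As a rough sketch only, assuming SQLite is an acceptable target (the question does not say which database is used), the rows could be written with the standard library's sqlite3 module. The rec_areas table, its column names, and the parse_coordinates and store_places helpers are invented for this example, and the coordinate ring is shortened. The ring is stored as JSON text; a spatial database would likely use a geometry column instead.

```python
import json
import sqlite3


def parse_coordinates(text):
    """Turn 'lon,lat,alt lon,lat,alt ...' into a list of float tuples."""
    return [tuple(float(v) for v in triple.split(","))
            for triple in text.split()]


def store_places(conn, places):
    """Insert one row per place; the coordinate ring is stored as JSON text."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rec_areas ("
        "name TEXT, category TEXT, province TEXT, "
        "comments TEXT, coordinates TEXT)"
    )
    conn.executemany(
        "INSERT INTO rec_areas VALUES (?, ?, ?, ?, ?)",
        [(p["RecAreaName"], p["RecAreaCategory"], p["Province"],
          p["Comments"], json.dumps(parse_coordinates(p["Coordinates"])))
         for p in places],
    )
    conn.commit()


# Same shape as the dicts appended to `places` in the loop above,
# with a shortened ring for brevity.
places = [{
    "RecAreaName": "Whistler",
    "RecAreaCategory": "World Class",
    "Province": "BC",
    "Comments": None,
    "Coordinates": "-123.052382,50.094969,0 "
                   "-123.050613,50.07531199999999,0 "
                   "-123.052382,50.094969,0",
}]

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
store_places(conn, places)

row = conn.execute("SELECT name, coordinates FROM rec_areas").fetchone()
ring = json.loads(row[1])  # list of [lon, lat, alt] lists
```

Calling the same function once per KML file, on the same connection, would collect all four files in a single table.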