Parsing a KML file and storing it in a database with Python

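I have 4 KML files, each containing multiple polygons. I want to parse the KML files, extract the data, and then store it in my database. After some research, I concluded that the best way to parse KML files is to install pyKML.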

One of my KML files looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
<Document>
    <name>RecAreaPolygons.TAB</name>
    <Schema name="RecAreaPolygons" id="S_RecAreaPolygons_SSSS">
        <SimpleField type="string" name="RecAreaName"><displayName>&lt;b&gt;RecAreaName&lt;/b&gt;</displayName>
</SimpleField>
        <SimpleField type="string" name="RecAreaCategory"><displayName>&lt;b&gt;RecAreaCategory&lt;/b&gt;</displayName>
</SimpleField>
        <SimpleField type="string" name="Province"><displayName>&lt;b&gt;Province&lt;/b&gt;</displayName>
</SimpleField>
        <SimpleField type="string" name="Comments"><displayName>&lt;b&gt;Comments&lt;/b&gt;</displayName>
</SimpleField>
    </Schema>
    <Style id="style1">
        <BalloonStyle>
            <text><![CDATA[<table border="0">
  <tr><td><b>RecAreaName</b></td><td>$[RecAreaPolygons/RecAreaName]</td></tr>
  <tr><td><b>RecAreaCategory</b></td><td>$[RecAreaPolygons/RecAreaCategory]</td></tr>
  <tr><td><b>Province</b></td><td>$[RecAreaPolygons/Province]</td></tr>
  <tr><td><b>Comments</b></td><td>$[RecAreaPolygons/Comments]</td></tr>
</table>
]]></text>
        </BalloonStyle>
        <PolyStyle>
            <color>ff00ff00</color>
        </PolyStyle>
    </Style>
    <Style id="falseColor">
        <BalloonStyle>
            <text><![CDATA[<table border="0">
  <tr><td><b>RecAreaName</b></td><td>$[RecAreaPolygons/RecAreaName]</td></tr>
  <tr><td><b>RecAreaCategory</b></td><td>$[RecAreaPolygons/RecAreaCategory]</td></tr>
  <tr><td><b>Province</b></td><td>$[RecAreaPolygons/Province]</td></tr>
  <tr><td><b>Comments</b></td><td>$[RecAreaPolygons/Comments]</td></tr>
</table>
]]></text>
        </BalloonStyle>
        <PolyStyle>
            <colorMode>random</colorMode>
        </PolyStyle>
    </Style>
    <Folder id="layer 0">
        <name>RecAreaPolygons</name>
        <Placemark>
            <name>Whistler</name>
            <styleUrl>#falseColor</styleUrl>
            <Style id="inline">
                <IconStyle>
                    <color>ff0000ff</color>
                    <colorMode>normal</colorMode>
                </IconStyle>
                <LineStyle>
                    <color>ff0000ff</color>
                    <colorMode>normal</colorMode>
                </LineStyle>
                <PolyStyle>
                    <color>ff0000ff</color>
                    <colorMode>normal</colorMode>
                </PolyStyle>
            </Style>
            <ExtendedData>
                <SchemaData schemaUrl="#S_RecAreaPolygons_SSSS">
                    <SimpleData name="RecAreaName">Whistler</SimpleData>
                    <SimpleData name="RecAreaCategory">World Class</SimpleData>
                    <SimpleData name="Province">BC</SimpleData>
                    <SimpleData name="Comments"></SimpleData>
                </SchemaData>
            </ExtendedData>
            <Polygon>
                <outerBoundaryIs>
                    <LinearRing>
                        <coordinates>
                            -123.052382,50.094969,0 -123.050613,50.07531199999999,0 -123.029976,50.05263099999998,0 -122.955094,50.045827,0 -122.909104,50.05565599999998,0 -122.869599,50.07871399999998,0 -122.835991,50.10895600000001,0 -122.826557,50.152805,0 -122.78496,50.26872300000001,0 -122.923014,50.26576299999998,0 -122.939174,50.18569200000002,0 -122.979858,50.17057199999998,0 -123.012877,50.151293,0 -123.050613,50.12483200000001,0 -123.053561,50.104419,0 -123.052382,50.094969,0 
                        </coordinates>
                    </LinearRing>
                </outerBoundaryIs>
            </Polygon>
        </Placemark>
        <!-- MULTIPLE OTHER PLACEMARKS -->
    </Folder>
</Document>
</kml>


As I mentioned, I installed pyKML and then ran the following code, intending to load the data into a DataFrame:

from pykml import parser

with open('RecAreaPolygons.kml', 'rb') as f:
    s = f.read()

root = parser.fromstring(s)
print(root.Document.Folder.Placemark.Polygon.outerBoundaryIs.LinearRing.coordinates)

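I can print the coordinates of the first placemark, but how do I get the coordinates of all the others and add them, one by one, to a DataFrame?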


I would like my output to look like this:

          RecAreaName  RecAreaCategory  Province  Comments  Coordinates  
0            Whistler      World Class        BC            -123.052382,50.094969,0 -123.050613,50.07531199999999,0 -123.029976,50.05263099999998,0 -122.955094,50.045827,0 -122.909104,50.05565599999998,0 -122.869599,50.07871399999998,0 -122.835991,50.10895600000001,0 -122.826557,50.152805,0 -122.78496,50.26872300000001,0 -122.923014,50.26576299999998,0 -122.939174,50.18569200000002,0 -122.979858,50.17057199999998,0 -123.012877,50.151293,0 -123.050613,50.12483200000001,0 -123.053561,50.104419,0 -123.052382,50.094969,0 
1                       The rest of the entries
2            

You can iterate over the placemarks, appending the name and geometry of each one to a list, and then create a DataFrame from that list.

If the KML has multiple folders, you will need to iterate over the folders, and then over the placemarks inside each folder.
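As an alternative to nested folder loops, you can search the whole tree for Placemark elements, wherever they sit. The sketch below swaps pyKML for the standard library's `xml.etree.ElementTree` (an assumption, not what this answer uses). `parse_placemarks` and `SAMPLE` are hypothetical names, and the sample document is illustrative data shaped like the question's file. The same `iter()` call with the `{namespace}tag` form should also work on the lxml tree that pyKML returns.

```python
import xml.etree.ElementTree as ET

# Default namespace of KML 2.2 documents, in ElementTree's {uri}tag notation.
KML_NS = "{http://www.opengis.net/kml/2.2}"


def parse_placemarks(kml_bytes):
    """Return one dict per Placemark, however many Folders enclose it."""
    root = ET.fromstring(kml_bytes)
    rows = []
    # iter() walks the whole tree, so sibling or nested Folders need no extra loop.
    for placemark in root.iter(f"{KML_NS}Placemark"):
        # SimpleData name -> text; an empty element (like Comments) gives None.
        row = {d.get("name"): d.text for d in placemark.iter(f"{KML_NS}SimpleData")}
        coords = placemark.find(
            f"{KML_NS}Polygon/{KML_NS}outerBoundaryIs/"
            f"{KML_NS}LinearRing/{KML_NS}coordinates"
        )
        # Placemarks without a simple outer Polygon ring get None.
        row["Coordinates"] = coords.text.strip() if coords is not None else None
        rows.append(row)
    return rows


# Hypothetical two-folder document with the same structure as the question's file.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
  <Folder><Placemark>
    <ExtendedData><SchemaData>
      <SimpleData name="RecAreaName">Whistler</SimpleData>
      <SimpleData name="Comments"></SimpleData>
    </SchemaData></ExtendedData>
    <Polygon><outerBoundaryIs><LinearRing>
      <coordinates> -123.05,50.09,0 -122.78,50.26,0 -123.05,50.09,0 </coordinates>
    </LinearRing></outerBoundaryIs></Polygon>
  </Placemark></Folder>
  <Folder><Placemark>
    <ExtendedData><SchemaData>
      <SimpleData name="RecAreaName">Second area</SimpleData>
    </SchemaData></ExtendedData>
  </Placemark></Folder>
</Document></kml>"""

for row in parse_placemarks(SAMPLE):
    print(row)
```

For a real file, read it in binary mode (`open('RecAreaPolygons.kml', 'rb').read()`) and pass the list of dicts to `pd.DataFrame`, just like `places` in the code below.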

from pykml import parser
import pandas as pd

# Binary mode lets lxml decode the file using the encoding declared in its XML header
with open('RecAreaPolygons.kml', 'rb') as f:
    root = parser.parse(f).getroot()

places = []
# Iterating over Placemark yields every sibling <Placemark>, not just the first one
for place in root.Document.Folder.Placemark:
    coords = place.Polygon.outerBoundaryIs.LinearRing.coordinates.text.strip()
    # Map each <SimpleData name="..."> to its text (None for empty elements)
    data = {item.get("name"): item.text for item in
            place.ExtendedData.SchemaData.SimpleData}
    places.append({"RecAreaName": data.get('RecAreaName'),
                   "RecAreaCategory": data.get('RecAreaCategory'),
                   "Province": data.get('Province'),
                   "Comments": data.get('Comments'),
                   "Coordinates": coords})
df = pd.DataFrame(places)
print(df)

Output:

  RecAreaName   RecAreaCategory Province Comments  Coordinates
0      Whistler     World Class       BC     None  -123.052382,50.094969,0 -123.050613,50.07531...

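If you want the coordinates as a list, append .split() to the coords assignment inside the loop, right after the strip() call. With no argument, split() breaks on any run of whitespace (spaces or line breaks), so it also copes with coordinate tuples spread over several lines.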
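Going one step further, here is a sketch of turning a coordinates string into numeric (lon, lat, alt) tuples, and of writing the rows into a database. It assumes SQLite through the standard `sqlite3` module, since the question does not say which database is used. The table name `rec_areas` and the helpers `parse_coordinates` and `save_rows` are hypothetical, and each row is assumed to be a dict keyed like the `places` entries above.

```python
import sqlite3


def parse_coordinates(text):
    """Turn a KML <coordinates> string into (lon, lat, alt) float tuples.

    Tuples are separated by whitespace and values inside a tuple by commas;
    altitude is optional in KML, so it defaults to 0.0 here.
    """
    points = []
    for chunk in text.split():  # no argument: split on any whitespace run
        values = [float(v) for v in chunk.split(",")]
        alt = values[2] if len(values) > 2 else 0.0
        points.append((values[0], values[1], alt))
    return points


def save_rows(conn, rows):
    """Insert placemark dicts into a hypothetical 'rec_areas' table.

    SQLite has no list type, so the coordinates stay as the raw KML string;
    parse them with parse_coordinates() after reading them back.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS rec_areas (
               RecAreaName TEXT,
               RecAreaCategory TEXT,
               Province TEXT,
               Comments TEXT,
               Coordinates TEXT
           )"""
    )
    # Named placeholders are filled from each dict's keys.
    conn.executemany(
        "INSERT INTO rec_areas VALUES "
        "(:RecAreaName, :RecAreaCategory, :Province, :Comments, :Coordinates)",
        rows,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
    row = {
        "RecAreaName": "Whistler",
        "RecAreaCategory": "World Class",
        "Province": "BC",
        "Comments": None,
        "Coordinates": "-123.052382,50.094969,0 -123.050613,50.07531199999999,0",
    }
    save_rows(conn, [row])
    stored = conn.execute("SELECT Coordinates FROM rec_areas").fetchone()[0]
    print(parse_coordinates(stored)[0])  # → (-123.052382, 50.094969, 0.0)
```

If you already have the DataFrame, pandas' `df.to_sql('rec_areas', conn, index=False)` also accepts a `sqlite3` connection and could replace `save_rows`. Alternatively, `df.to_dict('records')` produces the list of dicts that `save_rows` expects.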