Lxml error when parsing kml using pykml

Question

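I am trying to parse a kml file containing multiple placemarks using pykml. I want to edit the HTML code in the kml descriptions, mainly for visualizing geographic data in Google Earth. I have looked into a number of approaches: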

Extract Coordinates from KML BatchGeo File with Python
Read kml file with multiple placemarks in pykml
Using pyKML to parse KML Document
KML to string in Python?

However, I always get the lxml error shown below. :(

    Traceback (most recent call last):
    File "C:\Users\Arellano\Copy\BSGE15-2016 SUMMER\trial7.py", line 5, in <module>
    root = parser.fromstring(open('trim_KML.kml', 'r').read())
  File "C:\Program Files (x86)\Python2.7.10\lib\site-packages\pykml-0.1.0-py2.7.egg\pykml\parser.py", line 41, in fromstring
    return objectify.fromstring(text)
  File "src/lxml/lxml.objectify.pyx", line 1801, in lxml.objectify.fromstring (src\lxml\lxml.objectify.c:25171)
  File "src/lxml/lxml.etree.pyx", line 3213, in lxml.etree.fromstring (src\lxml\lxml.etree.c:77697)
  File "src/lxml/parser.pxi", line 1819, in lxml.etree._parseMemoryDocument (src\lxml\lxml.etree.c:116494)
  File "src/lxml/parser.pxi", line 1707, in lxml.etree._parseDoc (src\lxml\lxml.etree.c:115144)
  File "src/lxml/parser.pxi", line 1079, in lxml.etree._BaseParser._parseDoc (src\lxml\lxml.etree.c:109543)
  File "src/lxml/parser.pxi", line 573, in lxml.etree._ParserContext._handleParseResultDoc (src\lxml\lxml.etree.c:103404)
  File "src/lxml/parser.pxi", line 683, in lxml.etree._handleParseResult (src\lxml\lxml.etree.c:105058)
  File "src/lxml/parser.pxi", line 613, in lxml.etree._raiseParseError (src\lxml\lxml.etree.c:103967)
XMLSyntaxError: Namespace prefix xsi for schemaLocation on Document is not defined, line 3, column 32

Here is my code snippet (it should work, based on one of my sources):

from pykml import parser

root = parser.fromstring(open('trim_KML.kml', 'r').read())
print etree.tostring(root.Document.Placemark.LineString.Description)

I have installed pykml and lxml 3.6.0, and I am currently using Python 2.7.10. The kml file contains line features. (kml link: https://sites.google.com/site/kmlhostingmwss/trim.kml) My ArcGIS 10.2 installation also comes with Python 2.7.

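I am new to working with kml files. Can someone tell me what I am doing wrong? Or is there an easier way to edit the descriptions in a kml file? Thank you very much. :)))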

Answer 1

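The XML has a problem: the xsi prefix is used on the Document element but is never declared. To get rid of the error, add xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" to the kml tag on the second line of the file: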

<kml  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
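
If you would rather not edit the file by hand, the same fix can be applied to the text in memory before parsing. This is only a minimal sketch: the helper name add_xsi_namespace and the inline sample are hypothetical, and it assumes the root tag is written as an unprefixed <kml ...> tag, as in the file from the question.

```python
import re

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def add_xsi_namespace(kml_text):
    """Return kml_text with xmlns:xsi declared on the <kml> root tag.

    Hypothetical helper: if a declaration is already present,
    the text is returned unchanged.
    """
    if "xmlns:xsi=" in kml_text:
        return kml_text
    # Insert the declaration right after the opening "<kml" of the root tag.
    return re.sub(r"<kml(?=[\s>])", '<kml xmlns:xsi="%s"' % XSI_NS,
                  kml_text, count=1)


# Hypothetical, trimmed-down stand-in for the file in the question:
# xsi:schemaLocation is used on Document, but xsi is never declared.
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
    '<Document xsi:schemaLocation="http://www.opengis.net/kml/2.2 ogckml22.xsd">\n'
    '<Placemark><description>demo</description></Placemark>\n'
    '</Document>\n'
    '</kml>\n'
)

patched = add_xsi_namespace(sample)
print(patched.splitlines()[1])  # the patched root tag
```

The patched text can then be handed to the parser in place of the raw file contents.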

Then, using lxml, the following works:

import lxml.etree as et

xml = et.parse("trim.kml").getroot()

print(xml.xpath("//kml:Document//kml:Placemark/kml:description", namespaces={"kml":xml.nsmap["kml"]}))

Which gives you:

[<Element {http://www.opengis.net/kml/2.2}description at 0x7f612d0885f0>, <Element {http://www.opengis.net/kml/2.2}description at 0x7f612d088cb0>, <Element {http://www.opengis.net/kml/2.2}description at 0x7f612d088d40>, <Element {http://www.opengis.net/kml/2.2}description at 0x7f612d088d88>, <Element {http://www.opengis.net/kml/2.2}description at 0x7f612d088dd0>, <Element {http://www.opengis.net/kml/2.2}description at 0x7f612d088e18>]
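
The list above holds element objects, not the HTML itself. To read or edit a description you can go through each element's text attribute. The following is a rough sketch rather than a tested recipe for your file: the inline sample is a hypothetical stand-in for the fixed trim.kml, and writing the result back as CDATA is my assumption about how you would want the HTML kept unescaped.

```python
import lxml.etree as et

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Hypothetical stand-in for trim.kml with xmlns:xsi already declared.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.opengis.net/kml/2.2">
<Document xsi:schemaLocation="http://www.opengis.net/kml/2.2 ogckml22.xsd">
<Placemark><name>A</name><description><![CDATA[<b>old</b> text]]></description></Placemark>
<Placemark><name>B</name><description>plain old text</description></Placemark>
</Document>
</kml>"""

root = et.fromstring(sample)
descriptions = root.xpath("//kml:Document//kml:Placemark/kml:description",
                          namespaces=KML_NS)

for desc in descriptions:
    # desc.text is the description's HTML as a plain string.
    new_html = desc.text.replace("old", "new")
    # Wrap in CDATA so the HTML is not escaped when serialized.
    desc.text = et.CDATA(new_html)

texts = [d.text for d in descriptions]
print(texts)

# Serialize the edited document; use root.getroottree().write(...) for a file.
result = et.tostring(root, encoding="unicode")
```

With a real file you would start from et.parse("trim.kml").getroot() as shown earlier instead of the inline sample.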

You could also use lxml.html, which handles the broken XML better; the data itself is also 99% HTML.

You can get the description of each placemark with the following (note that the HTML parser lowercases tag names, hence //placemark):

from lxml import html
xml = html.parse("trim.kml")
print(xml.xpath("//placemark/description"))

Which gives you:

[<Element description at 0x7f1c757fad08>, <Element description at 0x7f1c757fad60>, <Element description at 0x7f1c757fadb8>, <Element description at 0x7f1c757fae10>, <Element description at 0x7f1c757fae68>, <Element description at 0x7f1c757faec0>]
