Editing the html content of <description> of a KML using lxml

I want to replace the HTML inside a KML <description> tag with new, formatted HTML.

My KML has the following structure:

<html>
 <body>
  <kml xmlns="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2">
   <document id="WATER_MAINLINE_trim" xsi:schemalocation="http://www.opengis.net/kml/2.2 http://schemas.opengis.net/kml/2.2.0/ogckml22.xsd http://www.google.com/kml/ext/2.2 http://code.google.com/apis/kml/schema/kml22gx.xsd">
    <name>
     WATER_MAINLINE_trim
    </name>
    <open>
     1
    </open>
    <snippet maxlines="0">
    </snippet>
    <style id="LineStyle00">
     <LabelStyle>
            <color>00000000</color>
            <scale>0</scale>
        </LabelStyle>
        <LineStyle>
            <color>ff240087</color>
        </LineStyle>
        <PolyStyle>
            <color>00000000</color>
            <outline>0</outline>
        </PolyStyle>
    </style>
    <folder id="FeatureLayer0">
     <name>
      WATER_MAINLINE_trim
     </name>
     <open>
      1
     </open>
     <snippet maxlines="0">
     </snippet>
     <placemark id="ID_00000">
      <name>
       0100026491
      </name>
      <snippet maxlines="0">
      </snippet>
      <description>
       <meta content="text/html" http-equiv="Content-Type" />
       <meta content="text/html; charset=utf-8" http-equiv="content-type" />
       <table style="font-family:Arial,Verdana,Times;font-size:12px;text-align:left;width:100%;border-collapse:collapse;padding:3px 3px 3px 3px">
        <tr style="text-align:center;font-weight:bold;background:#9CBCE2">
         <td>
          0100026491
         </td>
        </tr>
        <tr>
         <td>
          <table style="font-family:Arial,Verdana,Times;font-size:12px;text-align:left;width:100%;border-spacing:0px; padding:3px 3px 3px 3px">
           <tr>
            <td>
             FID
            </td>
            <td>
             0
            </td>
           </tr>
           <tr bgcolor="#D4E4F3">
            <td>
             PRIKEY
            </td>
            <td>
             0100026491
            </td>
           </tr>
           <tr>
            <td>
             YEAR_INST
            </td>
            <td>
             2001
            </td>
           </tr>
           <tr bgcolor="#D4E4F3">
            <td>
             PIPE_CLASS
            </td>
            <td>
             PRIMARY
            </td>
           </tr>
           <tr>
            <td>
             DIAMETER
            </td>
            <td>
             1500
            </td>
           </tr>
           <tr bgcolor="#D4E4F3">
            <td>
             MATERIAL
            </td>
            <td>
             SP
            </td>
           </tr>
           <tr>
            <td>
             STATUS
            </td>
            <td>
             ACTIVE
            </td>
           </tr>
           <tr bgcolor="#D4E4F3">
            <td>
             BA
            </td>
            <td>
             FCOM
            </td>
           </tr>
           <tr>
            <td>
             SUBCLASS
            </td>
            <td>
             WATER MAINLINE
            </td>
           </tr>
          </table>
         </td>
        </tr>
       </table>
      </description>
     </placemark>
    </folder>
   </document>
  </kml>
 </body>
</html>

I have this new HTML:

newhtml="""<![CDATA[ \n<!------------TITLE SUBCLASS---------------->\n  <tr>\n    <td colspan="2" align="center">\n      <b><font color=\'#090259\' size=\'6\' style = \'bold\'>LA MESA BALARA</font><b>\n    </td>/n  </tr>\n<!------------IMAGE---------------->\n  <tr>\n    <td colspan="2" align="center">\n      <img src= http://static.rappler.com/images/640-lamesadam-20120728.jpg, width=500, height = 223, alt="picture" />\n    </td>\n  </tr>\n<!------------PRIKEY---------------->\n  <tr>\n    <td bgcolor = \'#090259\', align="center" >\n      <p><font color = \'FFFFFF\', size =\'4\'>PRIKEY</p>\n    </td>\n \n    <td bgcolor = \'#d8d8ff\' align="center">\n      <p>0100026491</p>\n    </td>\n<!------------YEAR INSTALLED---------------->\n  <tr>\n    <td bgcolor = \'#090259\', align="center" >\n      <p><font color = \'FFFFFF\', size =\'4\'>Year Installed</p>\n    </td>\n \n    <td bgcolor = \'#d8d8ff\' align="center">\n      <p>2001</p>\n    </td>\n<!------------PIPE CLASS---------------->\n  <tr>\n    <td bgcolor = \'#090259\', align="center" >\n      <p><font color = \'FFFFFF\', size =\'4\'>Pipe Class</p>\n    </td>\n \n    <td bgcolor = \'#d8d8ff\' align="center">\n      <p>PRIMARY</p>\n    </td>\n<!------------DIAMETER---------------->\n  <tr>\n    <td bgcolor = \'#090259\', align="center" >\n      <p><font color = \'FFFFFF\', size =\'4\'>Diameter (mm)</p>\n    </td>\n \n    <td bgcolor = \'#d8d8ff\' align="center">\n      <p>1500.000000</p>\n    </td>\n<!------------MATERIAL---------------->\n  <tr>\n    <td bgcolor = \'#090259\', align="center" >\n      <p><font color = \'FFFFFF\', size =\'4\'>Material</p>\n    </td>\n \n    <td bgcolor = \'#d8d8ff\' align="center">\n      <p>SP</p>\n    </td>\n<!------------STATUS---------------->\n  <tr>\n    <td bgcolor = \'#090259\', align="center" >\n      <p><font color = \'FFFFFF\', size =\'4\'>Status</p>\n    </td>\n \n    <td bgcolor = \'#d8d8ff\' align="center">\n      <p>ACTIVE</p>\n    </td>\n<!------------BUSINESS 
ADDRESS---------------->\n  <tr>\n    <td bgcolor = \'#090259\', align="center" >\n      <p><font color = \'FFFFFF\', size =\'4\'>Business Address</p>\n    </td>\n \n    <td bgcolor = \'#d8d8ff\' align="center">\n      <p>Fairview-Commonwealth</p>\n    </td>]]>"""

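How can I replace it properly in the parsed KML using lxml so that the result is still valid KML? By "valid" I mean a KML file that Google Earth can load. I tried doing the replacement with BeautifulSoup, but Google Earth reports an error when loading my output file: "Unexpected element "html"". So I'd like to use only lxml for this. Any help would be greatly appreciated. Thanks!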

I have this sample KML, which contains 5 LineString placemarks:

trim.kml = https://sites.google.com/site/kmlhostingmwss/trim.kml

Since KML is a valid XML file, consider XSLT, the special-purpose language designed to transform XML documents. Python's lxml can run XSLT 1.0 scripts.
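A minimal sketch of that idea, run on a tiny inline KML instead of trim.kml (the KML, NEWHTML and XSL strings are hypothetical stand-ins). It assumes the KML 2.2 default namespace, so the template matches kml:description: in XSLT 1.0 an unprefixed name in a pattern matches only elements in no namespace, so a bare description pattern would never fire on a namespaced KML.

```python
import lxml.etree as ET

# Hypothetical miniature KML standing in for trim.kml (KML 2.2 default namespace)
KML = b'''<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>0100026491</name>
      <description><table><tr><td>FID</td><td>0</td></tr></table></description>
    </Placemark>
  </Document>
</kml>'''

# Hypothetical replacement HTML; its own CDATA wrapper lets it sit inside
# <xsl:text> as plain text instead of being parsed as stylesheet markup
NEWHTML = '<![CDATA[<p>LA MESA BALARA</p>]]>'

XSL = '''<xsl:transform version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:kml="http://www.opengis.net/kml/2.2">
  <!-- identity transform: copy everything unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <!-- namespace-qualified match; a bare "description" would match nothing here -->
  <xsl:template match="kml:description">
    <xsl:copy>
      <xsl:text disable-output-escaping="yes">&lt;![CDATA[</xsl:text>
      <xsl:text disable-output-escaping="yes">{}</xsl:text>
      <xsl:text disable-output-escaping="yes">]]&gt;</xsl:text>
    </xsl:copy>
  </xsl:template>
</xsl:transform>'''.format(NEWHTML)

transform = ET.XSLT(ET.fromstring(XSL))
result = transform(ET.fromstring(KML))

# tostring() writes the disable-output-escaping text raw, so the CDATA markers survive
out = ET.tostring(result)
print(out.decode('utf-8'))
```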

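Specifically, the dynamic XSLT below is parsed from a string. It first copies the entire document with the identity transform, then replaces the content of every <description> with the newhtml variable, wrapped in a CDATA section: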

import lxml.etree as ET

# READ IN KML FILE
dom = ET.parse('trim.kml')

# THE HTML KEEPS ITS OWN <![CDATA[ ... ]]> WRAPPER SO THAT, ONCE PASTED INTO THE XSLT
# BELOW, IT PARSES AS PLAIN TEXT. ROWS ARE WRAPPED IN A <table>, TAGS CLOSED, ATTRIBUTES QUOTED.
newhtml = """<![CDATA[
<table>
<!------------TITLE SUBCLASS---------------->
  <tr><td colspan="2" align="center"><b><font color="#090259" size="6">LA MESA BALARA</font></b></td></tr>
<!------------IMAGE---------------->
  <tr><td colspan="2" align="center"><img src="http://static.rappler.com/images/640-lamesadam-20120728.jpg" width="500" height="223" alt="picture" /></td></tr>
<!------------PRIKEY---------------->
  <tr><td bgcolor="#090259" align="center"><p><font color="#FFFFFF" size="4">PRIKEY</font></p></td>
      <td bgcolor="#d8d8ff" align="center"><p>0100026491</p></td></tr>
<!------------YEAR INSTALLED---------------->
  <tr><td bgcolor="#090259" align="center"><p><font color="#FFFFFF" size="4">Year Installed</font></p></td>
      <td bgcolor="#d8d8ff" align="center"><p>2001</p></td></tr>
<!------------PIPE CLASS---------------->
  <tr><td bgcolor="#090259" align="center"><p><font color="#FFFFFF" size="4">Pipe Class</font></p></td>
      <td bgcolor="#d8d8ff" align="center"><p>PRIMARY</p></td></tr>
<!------------DIAMETER---------------->
  <tr><td bgcolor="#090259" align="center"><p><font color="#FFFFFF" size="4">Diameter (mm)</font></p></td>
      <td bgcolor="#d8d8ff" align="center"><p>1500.000000</p></td></tr>
<!------------MATERIAL---------------->
  <tr><td bgcolor="#090259" align="center"><p><font color="#FFFFFF" size="4">Material</font></p></td>
      <td bgcolor="#d8d8ff" align="center"><p>SP</p></td></tr>
<!------------STATUS---------------->
  <tr><td bgcolor="#090259" align="center"><p><font color="#FFFFFF" size="4">Status</font></p></td>
      <td bgcolor="#d8d8ff" align="center"><p>ACTIVE</p></td></tr>
<!------------BUSINESS ADDRESS---------------->
  <tr><td bgcolor="#090259" align="center"><p><font color="#FFFFFF" size="4">Business Address</font></p></td>
      <td bgcolor="#d8d8ff" align="center"><p>Fairview-Commonwealth</p></td></tr>
</table>
]]>"""

# PARSE XSL FROM STRING
xslstr = '''<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
xmlns:kml="http://www.opengis.net/kml/2.2">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>

  <!-- IDENTITY TRANSFORM: COPY EVERY NODE AND ATTRIBUTE UNCHANGED -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- KML ELEMENTS ARE IN THE DEFAULT KML NAMESPACE, SO THE PATTERN MUST BE PREFIXED;
       THE UNPREFIXED ALTERNATIVE ALSO COVERS FILES WITHOUT A NAMESPACE -->
  <xsl:template match="kml:description|description">
    <xsl:copy>
      <xsl:text disable-output-escaping="yes">&lt;![CDATA[</xsl:text>
      <xsl:text disable-output-escaping="yes">{}</xsl:text>
      <xsl:text disable-output-escaping="yes">]]&gt;</xsl:text>
    </xsl:copy>
  </xsl:template>

</xsl:transform>'''.format(newhtml)

xslt = ET.fromstring(xslstr)

# TRANSFORM SOURCE TO NEW TREE
transform = ET.XSLT(xslt)
newdom = transform(dom)

# OUTPUT TO FILE
tree_out = ET.tostring(newdom, encoding='UTF-8', pretty_print=True, xml_declaration=True)

with open('newTrim.kml', 'wb') as xmlfile:
    xmlfile.write(tree_out)
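For comparison, the same replacement can also be done with plain lxml and no XSLT, which is closer to what the question asked for. This is a sketch, not the answer's method: it assumes the descriptions are in the default KML 2.2 namespace and uses lxml's etree.CDATA to wrap the new HTML. replace_descriptions and SAMPLE are hypothetical names used only for illustration.

```python
import lxml.etree as ET

KML_NS = 'http://www.opengis.net/kml/2.2'


def replace_descriptions(root, html):
    """Hypothetical helper: replace the content of every KML <description>
    under `root` with `html`, written out as a CDATA section."""
    for desc in root.iter('{%s}description' % KML_NS):
        # Remove the old child elements (<meta>, <table>, ...); lxml removes
        # each child's tail text together with the child
        for child in list(desc):
            desc.remove(child)
        # etree.CDATA makes lxml serialize the text as <![CDATA[...]]>
        desc.text = ET.CDATA(html)
    return root


# Hypothetical miniature KML used only to exercise the helper
SAMPLE = b'''<kml xmlns="http://www.opengis.net/kml/2.2"><Document><Placemark>
<name>0100026491</name>
<description><meta content="text/html" http-equiv="Content-Type"/><table><tr><td>FID</td><td>0</td></tr></table></description>
</Placemark></Document></kml>'''

root = replace_descriptions(ET.fromstring(SAMPLE),
                            '<table><tr><td>PRIKEY</td><td>0100026491</td></tr></table>')
out = ET.tostring(root, encoding='UTF-8', xml_declaration=True)
print(out.decode('utf-8'))
```

Applied to the real file, this would be tree = ET.parse('trim.kml'), then replace_descriptions(tree.getroot(), html) and tree.write('newTrim.kml', encoding='UTF-8', xml_declaration=True). Here html is the new markup without its own <![CDATA[ ... ]]> wrapper, because etree.CDATA adds it (and raises ValueError if the text contains ]]>). Unlike the XSLT route, this does not rely on disable-output-escaping.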