Parsing KML/HTML file using BeautifulSoup

I am looking at a KML file that I am trying to parse with BeautifulSoup.

I am trying the following code, but I am not getting the desired output:

from bs4 import BeautifulSoup
fn = r'sampleKMLFile.kml'
with open(fn, 'r') as f:
    s = BeautifulSoup(f, 'xml')
pnodes = s.find_all('name')
z2 = s.find_all("Folder", {"name": 'MyNodes'})
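In the code above, the second argument of find_all is treated as an attribute filter, so {"name": "MyNodes"} looks for <Folder name="MyNodes"> rather than a <Folder> whose child <name> element reads MyNodes. Against the snippet below, that call returns an empty list. Here is a minimal sketch of selecting the folder by its child element instead. It assumes the lxml package is installed (BeautifulSoup's "xml" feature needs it). The helper find_folder and the inline KML string are hypothetical stand-ins for the real file.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for sampleKMLFile.kml
KML = """<kml><Document>
<Folder>
  <name>MyNodes</name>
  <Placemark><name>Houston</name></Placemark>
  <Placemark><name>Austin</name></Placemark>
</Folder>
<Folder>
  <name>Other</name>
  <Placemark><name>Dallas</name></Placemark>
</Folder>
</Document></kml>"""


def find_folder(soup, folder_name):
    """Return the first <Folder> whose direct child <name> equals folder_name."""
    for folder in soup.find_all("Folder"):
        # recursive=False: look only at the folder's own <name>, not a Placemark's
        name = folder.find("name", recursive=False)
        if name is not None and name.get_text(strip=True) == folder_name:
            return folder
    return None


soup = BeautifulSoup(KML, "xml")
print(soup.find_all("Folder", {"name": "MyNodes"}))  # attribute filter, matches nothing
folder = find_folder(soup, "MyNodes")
names = [p.find("name").get_text() for p in folder.find_all("Placemark")]
print(names)
```

Only placemarks inside the matched folder are returned, so a same-named <name> elsewhere in a large file does not leak in.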

Basically, I want to get the following result in a pandas DataFrame:

MyNodes longitude latitude
Houston -95       33 
Austin  -97       33

The KML file is large and contains the following part that I am interested in:

<Folder>
    <name>MyNodes</name>
    <Placemark>
      <name>Houston</name>
      <Camera>
        <longitude>-95</longitude>
        <latitude>33</latitude>
        <roll>-1.6</roll>
        <gx:altitudeMode>relativeToSeaFloor</gx:altitudeMode>
      </Camera>
      <styleUrl>#msn_placemark_circle</styleUrl>
      <Point>
        <coordinates>-95,33,0</coordinates>
        <gx:drawOrder>1</gx:drawOrder>
      </Point>
    </Placemark>
    <Placemark>
      <name>Austin</name>
      <Camera>
        <longitude>-97</longitude>
        <latitude>33</latitude>
        <roll>-1.6</roll>
        <gx:altitudeMode>relativeToSeaFloor</gx:altitudeMode>
      </Camera>
      <styleUrl>#msn_placemark_circle</styleUrl>
      <Point>
        <coordinates>-97,33,0</coordinates>
        <gx:drawOrder>1</gx:drawOrder>
      </Point>
    </Placemark>
</Folder>
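Each Placemark here carries its position twice: once in <Camera>, which describes a viewpoint, and once in <Point><coordinates>. KML writes the latter as a comma-separated longitude,latitude[,altitude] tuple. If some placemarks lack a <Camera> element, the point's own coordinates can be split instead. The following is a small sketch; parse_coordinates is a hypothetical helper, and it assumes a single point per <coordinates> element.

```python
def parse_coordinates(text):
    """Split a KML <coordinates> string into (longitude, latitude, altitude) floats.

    Assumes a single point; LineString/Polygon coordinates hold several
    whitespace-separated tuples and would need a loop over text.split().
    """
    parts = [float(v) for v in text.strip().split(",")]
    lon, lat = parts[0], parts[1]
    alt = parts[2] if len(parts) > 2 else 0.0  # altitude is optional in KML
    return lon, lat, alt


print(parse_coordinates("-95,33,0"))  # (-95.0, 33.0, 0.0)
print(parse_coordinates(" -97,33 "))  # (-97.0, 33.0, 0.0)
```

With BeautifulSoup this would be called as parse_coordinates(placemark.find("coordinates").text).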

Edit: since the HTML-formatted string above lives in a file, how do I read the file to produce the sample / cnt strings used in the two answers below?
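BeautifulSoup accepts either a string or an open file object. The file can therefore be read into cnt / sample with a plain read(), or the file handle can be passed straight to BeautifulSoup as in the question. Below is a minimal sketch, assuming the file is UTF-8 encoded and lxml is installed. The temporary file only makes the example self-contained, and load_kml is a hypothetical name.

```python
import os
import tempfile

from bs4 import BeautifulSoup


def load_kml(path):
    """Read a KML file into a string and parse it with the XML parser."""
    with open(path, encoding="utf-8") as f:
        cnt = f.read()  # this string plays the role of cnt / sample in the answers
    return cnt, BeautifulSoup(cnt, "xml")


# Write a tiny KML file so the example runs on its own
kml = "<Folder><name>MyNodes</name><Placemark><name>Houston</name></Placemark></Folder>"
with tempfile.NamedTemporaryFile("w", suffix=".kml", delete=False, encoding="utf-8") as tmp:
    tmp.write(kml)
    path = tmp.name

cnt, soup = load_kml(path)
os.remove(path)
print(soup.find("Placemark").find("name").text)
```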

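from bs4 import BeautifulSoup

# cnt = """KML file content str"""
# The "lxml" feature selects lxml's HTML parser, which lowercases tag names,
# so the tag is looked up as "placemark" rather than "Placemark".
soup = BeautifulSoup(cnt, "lxml")
placemark = soup.find_all("placemark")
print("MyNodes longitude latitude")
for obj in placemark:
    # here you can save the values, e.g. in a dictionary
    print(obj.find("name").text, end=" ")
    print(obj.find("longitude").text, end=" ")
    print(obj.find("latitude").text)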

# MyNodes longitude latitude
# Houston -95 33
# Austin -97 33
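Following the comment in the loop, the values can be collected into one dictionary per placemark and handed to pandas. That gives the DataFrame asked for in the question. The sketch below assumes cnt holds the file content (a two-placemark stand-in is used here). It swaps in the "xml" parser, so tag names keep their case, and assumes lxml is installed.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Stand-in for the file content; in practice read it from the .kml file
cnt = """<Folder><name>MyNodes</name>
<Placemark><name>Houston</name>
  <Camera><longitude>-95</longitude><latitude>33</latitude></Camera></Placemark>
<Placemark><name>Austin</name>
  <Camera><longitude>-97</longitude><latitude>33</latitude></Camera></Placemark>
</Folder>"""

soup = BeautifulSoup(cnt, "xml")  # "xml" keeps tag case: Placemark, not placemark
rows = []
for obj in soup.find_all("Placemark"):
    # One dict per placemark, so a name and its coordinates cannot drift apart
    rows.append({
        "MyNodes": obj.find("name").text,
        "longitude": float(obj.find("longitude").text),
        "latitude": float(obj.find("latitude").text),
    })

df = pd.DataFrame(rows)
print(df)
```

Building rows per placemark avoids relying on separate lists staying aligned. Separate lists would fall out of step if one placemark were missing a tag.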

You can try something like this:

from bs4 import BeautifulSoup
import pandas as pd

sample = """<Folder>
        <name>MyNodes</name>
    <Placemark>
      <name>Houston</name>
      <Camera>
        <longitude>-95</longitude>
        <latitude>33</latitude>
        <roll>-1.6</roll>
        <gx:altitudeMode>relativeToSeaFloor</gx:altitudeMode>
      </Camera>
      <styleUrl>#msn_placemark_circle</styleUrl>
      <Point>
        <coordinates>-95,33,0</coordinates>
        <gx:drawOrder>1</gx:drawOrder>
      </Point>
    </Placemark>
    <Placemark>
      <name>Austin</name>
      <Camera>
        <longitude>-97</longitude>
        <latitude>33</latitude>
        <roll>-1.6</roll>
        <gx:altitudeMode>relativeToSeaFloor</gx:altitudeMode>
      </Camera>
      <styleUrl>#msn_placemark_circle</styleUrl>
      <Point>
        <coordinates>-97,33,0</coordinates>
        <gx:drawOrder>1</gx:drawOrder>
      </Point>
    </Placemark>
      </Folder>
"""


soup = BeautifulSoup(sample, features="xml")


def finder(tag: str) -> list:
    # Text of every <tag> element, skipping the folder's own <name>MyNodes</name>
    return [i.getText() for i in soup.find_all(tag) if i.getText() != "MyNodes"]


df = pd.DataFrame(
    zip(finder("name"), finder("longitude"), finder("latitude")),
    columns=["Nodes", "Longitude", "Latitude"],
)
print(df)

Output:

     Nodes Longitude Latitude
0  Houston       -95       33
1   Austin       -97       33
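The Longitude and Latitude columns produced this way hold strings, because getText() returns text. If numeric work is planned, converting them with pd.to_numeric is one option. Here is a small sketch on the same values; errors="coerce" is a choice made here so that malformed entries become NaN instead of raising.

```python
import pandas as pd

# Same values as the output above, stored as strings (what getText() returns)
df = pd.DataFrame(
    {"Nodes": ["Houston", "Austin"], "Longitude": ["-95", "-97"], "Latitude": ["33", "33"]}
)

# Convert the coordinate columns to numbers; unparseable values would become NaN
for col in ["Longitude", "Latitude"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df.dtypes)
```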