Parsing a KML/HTML file using BeautifulSoup
I am looking at a KML file and trying to parse it with BeautifulSoup.
I tried the following code, but I am not getting the desired output:
from bs4 import BeautifulSoup
fn = r'sampleKMLFile.kml'
f = open(fn, 'r')
s = BeautifulSoup(f, 'xml')
pnodes = s.find_all('name')
z2 = s.find_all("Folder", {"name": 'MyNodes'})
Basically, I want to end up with the following in a pandas DataFrame:
MyNodes longitude latitude
Houston -95 33
Austin -97 33
The KML file is large and contains the following content, which is the part I am interested in.
<Folder>
<name>MyNodes</name>
<Placemark>
<name>Houston</name>
<Camera>
<longitude>-95</longitude>
<latitude>33</latitude>
<roll>-1.6</roll>
<gx:altitudeMode>relativeToSeaFloor</gx:altitudeMode>
</Camera>
<styleUrl>#msn_placemark_circle</styleUrl>
<Point>
<coordinates>-95,33,0</coordinates>
<gx:drawOrder>1</gx:drawOrder>
</Point>
</Placemark>
<Placemark>
<name>Austin</name>
<Camera>
<longitude>-97</longitude>
<latitude>33</latitude>
<roll>-1.6</roll>
<gx:altitudeMode>relativeToSeaFloor</gx:altitudeMode>
</Camera>
<styleUrl>#msn_placemark_circle</styleUrl>
<Point>
<coordinates>-97,33,0</coordinates>
<gx:drawOrder>1</gx:drawOrder>
</Point>
</Placemark>
</Folder>
Edit: Since the HTML-like string above is stored in a file, how do I read the file to produce the sample or cnt variable used in the two answers below?
from bs4 import BeautifulSoup

# cnt must hold the KML file content as a string
# cnt = """KML file content str"""
soup = BeautifulSoup(cnt, "lxml")
# The "lxml" HTML parser lowercases tag names, hence "placemark"
placemark = soup.find_all("placemark")
print("MyNodes longitude latitude")
for obj in placemark:
    # here you can save the values, e.g. in a dictionary
    print(obj.find("name").text, end=" ")
    print(obj.find("longitude").text, end=" ")
    print(obj.find("latitude").text)
# MyNodes longitude latitude
# Houston -95 33
# Austin -97 33
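On the question raised in the edit: a minimal sketch of one way to get the file content into cnt (or sample) and to collect the values into dictionaries instead of printing them. The helper names read_kml_text and placemark_rows are hypothetical, the file is assumed to be UTF-8 encoded, and "html.parser" is swapped in for "lxml" only because it ships with Python; both lowercase tag names.

```python
from bs4 import BeautifulSoup


def read_kml_text(path):
    """Return the whole KML file as one string (the cnt / sample value)."""
    # Assumption: the file is UTF-8 encoded, which is common for KML.
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


def placemark_rows(kml_text):
    """Collect one dict per placemark that has a name, longitude and latitude."""
    # "html.parser" lowercases tag names, so the lookups below are lowercase.
    soup = BeautifulSoup(kml_text, "html.parser")
    rows = []
    for obj in soup.find_all("placemark"):
        name = obj.find("name")
        lon = obj.find("longitude")
        lat = obj.find("latitude")
        if name is None or lon is None or lat is None:
            continue  # skip placemarks that lack any of the three tags
        rows.append(
            {"MyNodes": name.text, "longitude": lon.text, "latitude": lat.text}
        )
    return rows
```

With the file name from the question, placemark_rows(read_kml_text("sampleKMLFile.kml")) should give a list of dictionaries that pd.DataFrame accepts directly.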
You can try something like this:
from bs4 import BeautifulSoup
import pandas as pd
sample = """<Folder>
<name>MyNodes</name>
<Placemark>
<name>Houston</name>
<Camera>
<longitude>-95</longitude>
<latitude>33</latitude>
<roll>-1.6</roll>
<gx:altitudeMode>relativeToSeaFloor</gx:altitudeMode>
</Camera>
<styleUrl>#msn_placemark_circle</styleUrl>
<Point>
<coordinates>-95,33,0</coordinates>
<gx:drawOrder>1</gx:drawOrder>
</Point>
</Placemark>
<Placemark>
<name>Austin</name>
<Camera>
<longitude>-97</longitude>
<latitude>33</latitude>
<roll>-1.6</roll>
<gx:altitudeMode>relativeToSeaFloor</gx:altitudeMode>
</Camera>
<styleUrl>#msn_placemark_circle</styleUrl>
<Point>
<coordinates>-97,33,0</coordinates>
<gx:drawOrder>1</gx:drawOrder>
</Point>
</Placemark>
</Folder>
"""
def finder(tag: str) -> list:
    return [i.getText() for i in soup.find_all(tag) if i.getText() != "MyNodes"]


soup = BeautifulSoup(sample, features="xml")
df = pd.DataFrame(
    zip(finder("name"), finder("longitude"), finder("latitude")),
    columns=["Nodes", "Longitude", "Latitude"],
)
print(df)
Output:
     Nodes Longitude Latitude
0  Houston       -95       33
1   Austin       -97       33
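A note on the attempt in the question: the second argument of find_all filters on tag attributes, so s.find_all("Folder", {"name": 'MyNodes'}) looks for a Folder tag with a name="MyNodes" attribute, whereas here the folder title is a child name element. The sketch below is one possible way to select the folder by that child element and to build each row per Placemark, so the columns stay aligned even when a placemark has no Camera. The function name folder_to_frame is hypothetical, and converting the coordinates with float assumes the tags always hold plain numbers.

```python
import pandas as pd
from bs4 import BeautifulSoup


def folder_to_frame(kml_text, folder_name="MyNodes"):
    """Build a DataFrame from the placemarks of the folder whose <name> matches."""
    soup = BeautifulSoup(kml_text, "xml")  # the "xml" parser requires lxml
    rows = []
    for folder in soup.find_all("Folder"):
        # The folder title is a direct child <name> element, not an attribute.
        title = folder.find("name", recursive=False)
        if title is None or title.get_text(strip=True) != folder_name:
            continue
        for placemark in folder.find_all("Placemark"):
            name = placemark.find("name")
            camera = placemark.find("Camera")
            if name is None or camera is None:
                continue  # skip incomplete placemarks so rows stay aligned
            lon = camera.find("longitude")
            lat = camera.find("latitude")
            if lon is None or lat is None:
                continue
            rows.append(
                {
                    folder_name: name.get_text(strip=True),
                    # Assumption: both tags contain plain numeric text.
                    "longitude": float(lon.get_text(strip=True)),
                    "latitude": float(lat.get_text(strip=True)),
                }
            )
    return pd.DataFrame(rows, columns=[folder_name, "longitude", "latitude"])
```

Calling folder_to_frame(sample) on the string from the answer above should give one row each for Houston and Austin, with the coordinates stored as floats rather than strings.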