How to cluster Latitude and longitude data in python (or remove unwanted data)?
I have latitude and longitude data of size (34000 * 2) in a pandas df:
```
df =
Index  Latitude           Longitude
0      66.36031097267725  23.714807357485936
1      66.36030099322495  23.71479548193769
2      ...
...
34000  66.27918383581169  23.568631229948359
```
Important note: the above Lat & Long route has been covered twice, which means that if I covered the route only once, my latitude and longitude data would be of size (34000/2, 2), for example.
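Problem
I only need the latitude and longitude data of a specific selected area, so I filter my df using the start and end Lat and Long points. When I do this, another part of the area is selected as well. (See picture below after filtering.)
Requirement
How can I remove the additional area? I believe there is a simple approach to this problem.
Note: the filtered latitude and longitude data is also covered twice.
Filtering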
```python
def apply_geofence_on_data(interpolated_data,
                           min_latitude=66.27832887852133, max_latitude=66.37098470528755,
                           min_longitude=23.568626549485927, max_longitude=23.71481685393929):
    # Keep only the rows strictly inside the latitude/longitude bounding box
    interpolated_data = interpolated_data[interpolated_data['Latitude'] > min_latitude]
    interpolated_data = interpolated_data[interpolated_data['Latitude'] < max_latitude]
    interpolated_data = interpolated_data[interpolated_data['Longitude'] < max_longitude]
    interpolated_data = interpolated_data[interpolated_data['Longitude'] > min_longitude]
    return interpolated_data
```
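The function above is an axis-aligned bounding-box test: every point whose latitude and longitude both fall inside the two ranges survives, including points on a different stretch of road that happens to cross the same rectangle. A minimal sketch of the same filter as a single boolean mask, using pandas `Series.between` with `inclusive="neither"` to keep the strict inequalities (this string form needs pandas 1.3 or later); `bbox_mask` and the toy coordinates are hypothetical, chosen only to illustrate the effect:

```python
import pandas as pd


def bbox_mask(df, min_lat, max_lat, min_lon, max_lon):
    """Boolean mask of rows strictly inside the lat/lon rectangle.

    Hypothetical helper: equivalent to the four chained filters in
    apply_geofence_on_data, but evaluated as one mask.
    """
    lat_ok = df["Latitude"].between(min_lat, max_lat, inclusive="neither")
    lon_ok = df["Longitude"].between(min_lon, max_lon, inclusive="neither")
    return lat_ok & lon_ok


# Toy data: rows 0 and 1 on the wanted stretch, row 2 on another stretch
# that crosses the same rectangle, row 3 outside the box entirely.
toy = pd.DataFrame({
    "Latitude": [66.30, 66.35, 66.29, 66.40],
    "Longitude": [23.60, 23.70, 23.69, 23.65],
})

mask = bbox_mask(toy, 66.278, 66.371, 23.5686, 23.7148)
inside = toy[mask]
print(inside)
```

The rectangle cannot tell row 2 apart from rows 0 and 1, which is why a second criterion, such as the line used in the answer, is needed to drop the extra area.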
Here is a tested solution: the idea is to keep all the points above a line. You choose the value of p to select the right line.
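Written out, the line in the code has the same slope m as the diagonal of the bounding box and passes through A = (x_A, lat_min). With p = 0 it is the diagonal itself; a larger p shifts it to the right (equivalently, down), so more of the box ends up above it:

```latex
\begin{aligned}
m &= \frac{\mathrm{lat}_{\max} - \mathrm{lat}_{\min}}{\mathrm{lon}_{\max} - \mathrm{lon}_{\min}}, \\
x_A &= \mathrm{lon}_{\min} + p\,(\mathrm{lon}_{\max} - \mathrm{lon}_{\min}), \\
z &= \mathrm{lat}_{\min} - m\,x_A, \\
\text{keep } (\mathrm{lon}, \mathrm{lat}) &\iff \mathrm{lat} > m\,\mathrm{lon} + z.
\end{aligned}
```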
```python
from random import uniform

import matplotlib.pyplot as plt
import pandas as pd


def newpoint(lon_min=-180.0, lon_max=180.0, lat_min=-90.0, lat_max=90.0):
    """Return a random (lon, lat) pair inside the given bounds."""
    return uniform(lon_min, lon_max), uniform(lat_min, lat_max)


lon_min, lon_max = 23.568626549485927, 23.71481685393929
lat_min, lat_max = 66.27832887852133, 66.37098470528755
p = 0.25  # I have taken this value for the sample; for your case I think a value closer to 0.75

# Generate 10 random points for the sample
n = 10
points = [newpoint(lon_min, lon_max, lat_min, lat_max) for _ in range(n)]
Lon = [lon for lon, lat in points]
Lat = [lat for lon, lat in points]
df = pd.DataFrame({'Lat': Lat, 'Lon': Lon})
print(df)

# Line y = m*x + z through A = (xa, lat_min) and B = (xb, m*xb + z):
# parallel to the box diagonal, shifted right by p times the box width
m = (lat_max - lat_min) / (lon_max - lon_min)
z = lat_min - m * (lon_min + p * (lon_max - lon_min))
xa = lon_min + p * (lon_max - lon_min)
xb = lon_max

# Uncomment to display the line value for each point
# df['calcul'] = df['Lon'] * m + z

# Select only the points above the line
df = df[df['Lon'] * m + z < df['Lat']]
print(df)

# Plot the line and the remaining points
plt.plot([xa, xb], [m * xa + z, m * xb + z])
plt.plot(df.Lon, df.Lat, 'ro')
plt.show()
```
Initial output:
Lat Lon
0 66.343486 23.674008
1 66.281614 23.678554
2 66.359215 23.637975
3 66.303976 23.659128
4 66.302640 23.589577
5 66.313877 23.634785
6 66.309733 23.683281
7 66.365582 23.667262
8 66.344611 23.688108
9 66.352028 23.673376
Final result: the points with index 1, 3 and 6 have been removed (they lie below the line):
Lat Lon
0 66.343486 23.674008
2 66.359215 23.637975
4 66.302640 23.589577
5 66.313877 23.634785
7 66.365582 23.667262
8 66.344611 23.688108
9 66.352028 23.673376
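The slope-intercept form ties the line to the box diagonal and cannot express a vertical line. A minimal sketch of two more general variants, assuming you can read suitable points off your own map: the sign of a 2-D cross product tells which side of an arbitrary directed line A→B a point lies on, and `matplotlib.path.Path.contains_points` keeps the points inside a hand-drawn polygon, which helps when a single straight line cannot separate the two stretches. The helper names are hypothetical. With the answer's line (p = 0.25) and its ten sample points, both variants should reproduce the final result above.

```python
import numpy as np
import pandas as pd
from matplotlib.path import Path


def side_of_line(lon, lat, a, b):
    """Signed 2-D cross product (B - A) x (P - A) for every point P = (lon, lat).

    > 0: P is left of the directed line A -> B; < 0: right of it; 0: on it.
    """
    (ax, ay), (bx, by) = a, b
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    return (bx - ax) * (lat - ay) - (by - ay) * (lon - ax)


def keep_left_of(df, a, b):
    """Hypothetical helper: keep rows strictly left of A -> B (Lon = x, Lat = y)."""
    return df[side_of_line(df["Lon"], df["Lat"], a, b) > 0]


def keep_inside_polygon(df, vertices):
    """Hypothetical helper: keep rows whose (Lon, Lat) lies inside the polygon.

    matplotlib treats the path as closed, so the first vertex need not be repeated.
    """
    points = np.column_stack([df["Lon"].to_numpy(), df["Lat"].to_numpy()])
    return df[Path(vertices).contains_points(points)]


# Bounding box and line from the answer (p = 0.25)
lon_min, lon_max = 23.568626549485927, 23.71481685393929
lat_min, lat_max = 66.27832887852133, 66.37098470528755
p = 0.25
m = (lat_max - lat_min) / (lon_max - lon_min)
xa = lon_min + p * (lon_max - lon_min)
A = (xa, lat_min)                            # where the line enters the box
B = (lon_max, lat_min + m * (lon_max - xa))  # where it leaves on the right

# The ten sample points from the answer's initial output
df = pd.DataFrame({
    "Lat": [66.343486, 66.281614, 66.359215, 66.303976, 66.302640,
            66.313877, 66.309733, 66.365582, 66.344611, 66.352028],
    "Lon": [23.674008, 23.678554, 23.637975, 23.659128, 23.589577,
            23.634785, 23.683281, 23.667262, 23.688108, 23.673376],
})

# A -> B runs left to right, so "left of A -> B" means "above the line"
above = keep_left_of(df, A, B)

# The same region as a polygon: the part of the box above the line
region = [(lon_min, lat_min), A, B, (lon_max, lat_max), (lon_min, lat_max)]
inside = keep_inside_polygon(df, region)

print(above.index.tolist())   # [0, 2, 4, 5, 7, 8, 9]
print(inside.index.tolist())  # [0, 2, 4, 5, 7, 8, 9]
```

For your own data you would replace A, B or the polygon vertices with coordinates that separate the wanted stretch from the unwanted one; the polygon can have as many vertices as the road shape requires.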