Reduce GPS data set by distance
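I have a set of GPS coordinates created by a GPS sensor and a Raspberry Pi. I poll the sensor at 10 Hz and log the data to an SQL DB on the Pi. The system sits on the roof of my car (it is part of a house-scanning tool for the construction industry). The problem is that I drive at varying speeds, and sometimes I have to stop to let other vehicles pass while GPS positions are still being logged at 10 Hz.
Once the data is recorded, I want to post-process it and output a reduced list of coordinates in which consecutive positions are roughly 1 metre apart.
I know I could probably do this with Pandas, but I don't know where to start.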
Here is a sample data set:
51.80359349246259,-4.741180850463812
51.80361005410784,-4.740873766196046
51.80351890237921,-4.7415190658979895
51.803152371942325,-4.74057836870229
51.80352232936482,-4.740392650792621
51.80361261925252,-4.740896906964529
51.803487420307796,-4.7402764541541265
51.80353017387817,-4.74136689657748
51.80287372471039,-4.741218904144232
51.80326530703784,-4.740193742088211
Any help is much appreciated.
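Since the question asks where to start with Pandas, here is a minimal sketch that loads the sample fixes into a DataFrame with `lat`/`lon` columns, the shape the Python answers assume. The `SAMPLE` string and the column names are assumptions for illustration; real data would come from the SQL table, ordered by recording time.

```python
import io

import pandas as pd

# The ten sample fixes from the question, one "lat,lon" pair per line, no header.
SAMPLE = """\
51.80359349246259,-4.741180850463812
51.80361005410784,-4.740873766196046
51.80351890237921,-4.7415190658979895
51.803152371942325,-4.74057836870229
51.80352232936482,-4.740392650792621
51.80361261925252,-4.740896906964529
51.803487420307796,-4.7402764541541265
51.80353017387817,-4.74136689657748
51.80287372471039,-4.741218904144232
51.80326530703784,-4.740193742088211
"""

# Passing names= makes read_csv treat the first line as data rather than a header.
df = pd.read_csv(io.StringIO(SAMPLE), names=["lat", "lon"])
print(df.head())
```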
library(data.table)
library(hutils)    # haversine_distance()
library(magrittr)  # %>% (not exported by data.table)
setDT(gpsdata)
setDT(busdata.data)
gps_orig <- copy(gpsdata)
busdata.orig <- copy(busdata.data)
setkey(gpsdata, lat)
# Just to take note of the originals
gpsdata[, gps_lat := lat + 0]
gpsdata[, gps_lon := lon + 0]
busdata.data[, lat := latitude_bustops + 0]
busdata.data[, lon := longitude_bustops + 0]
setkey(busdata.data, lat)
gpsID_by_lat <-
  gpsdata[, .(id), keyby = "lat"]
By_latitude <-
  busdata.data[gpsdata,
               on = "lat",
               # roll the stop with the nearest latitude at or below each GPS
               # latitude forward, but no further than 0.5 degrees
               roll = 0.5,
               # also roll beyond the first and last stop (first backwards, last forwards)
               rollends = c(TRUE, TRUE),
               # drop GPS rows with no stop inside that window
               nomatch = 0L] %>%
  .[, .(id_lat = id,
        name_lat = name,
        bus_lat = latitude_bustops,
        bus_lon = longitude_bustops,
        gps_lat,
        gps_lon),
    keyby = .(lon = gps_lon)]
setkey(busdata.data, lon)
By_latlon <-
  busdata.data[By_latitude,
               on = c("name==name_lat", "lon"),
               # same 0.5-degree window, now on longitude, within each matched stop name
               roll = 0.5,
               # also roll beyond the first and last stop
               rollends = c(TRUE, TRUE),
               # drop rows with no stop inside that window
               nomatch = 0L]
By_latlon[, distance := haversine_distance(lat1 = gps_lat,
                                           lon1 = gps_lon,
                                           lat2 = bus_lat,
                                           lon2 = bus_lon)]
# haversine_distance() returns km, so this keeps pairs within 200 m
By_latlon[distance < 0.2]
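The R answer is a two-stage match: a cheap rolling join narrows the candidates by coordinate, then `haversine_distance` gives exact distances. A rough pandas analogue of the latitude stage uses `pd.merge_asof` with a tolerance. This is a sketch, not a translation: it windows on latitude only (the R code also rolls on longitude within each stop name), `merge_asof` keeps a single nearest stop per fix, and the `gps`/`stops` frames and their column names are hypothetical.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; haversine treats Earth as a sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Vectorised great-circle distance in km."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Hypothetical inputs: two GPS fixes and two bus stops.
gps = pd.DataFrame({"id": [1, 2],
                    "lat": [51.80359, 51.80600],
                    "lon": [-4.74118, -4.74118]})
stops = pd.DataFrame({"name": ["A", "B"],
                      "bus_lat": [51.80360, 51.90000],
                      "bus_lon": [-4.74120, -4.50000]})

# Coarse stage: pair each fix with the stop nearest in latitude, at most 0.5 degrees away.
# merge_asof requires both frames to be sorted on their join keys.
matched = pd.merge_asof(gps.sort_values("lat"), stops.sort_values("bus_lat"),
                        left_on="lat", right_on="bus_lat",
                        direction="nearest", tolerance=0.5)
matched = matched.dropna(subset=["bus_lat"])  # fixes with no stop in the window

# Exact stage: keep only pairs within 0.2 km (200 m).
matched["distance_km"] = haversine_km(matched["lat"], matched["lon"],
                                      matched["bus_lat"], matched["bus_lon"])
near = matched[matched["distance_km"] < 0.2]
print(near[["id", "name", "distance_km"]])
```

Both fixes match stop A in the coarse stage; only fix 1 survives the 200 m check, since fix 2 is about 270 m north of the stop.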
How about using a geohash to collapse repeated locations?
http://en.wikipedia.org/wiki/Geohash
On precision:
https://gis.stackexchange.com/questions/115280/what-is-the-precision-of-geohash
geohash length   maximum X-axis error (km)
 1               ± 2500
 2               ± 630
 3               ± 78
 4               ± 20
 5               ± 2.4
 6               ± 0.61
 7               ± 0.076
 8               ± 0.019
 9               ± 0.0024
10               ± 0.00060
11               ± 0.000074
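The errors in the table are half the width of a geohash cell at the equator. A small sketch that derives cell size from the bit layout (5 bits per character, alternating longitude and latitude, longitude first) makes it easier to pick a length for roughly 1 m spacing. It assumes the WGS84 equatorial radius and ignores flattening, so the numbers are approximate. At the sample's latitude (about 51.8°), length 7 cells are roughly 95 m wide and 153 m tall, and only length 10 brings both sides under 1 m.

```python
import math

# Metres per degree of longitude at the equator, using the WGS84 equatorial
# radius (an assumption; other radii shift the numbers slightly).
M_PER_DEG = 2 * math.pi * 6_378_137 / 360


def geohash_cell_m(precision, lat_deg=0.0):
    """Approximate (width, height) in metres of a geohash cell at a latitude."""
    bits = 5 * precision            # each base-32 character carries 5 bits
    lon_bits = (bits + 1) // 2      # bits alternate, starting with longitude
    lat_bits = bits // 2
    width_deg = 360.0 / 2 ** lon_bits
    height_deg = 180.0 / 2 ** lat_bits
    width_m = width_deg * M_PER_DEG * math.cos(math.radians(lat_deg))
    height_m = height_deg * M_PER_DEG  # ignores the ellipsoid's flattening
    return width_m, height_m


for p in (7, 8, 9, 10):
    w, h = geohash_cell_m(p, lat_deg=51.8)
    print(f"precision {p}: {w:8.2f} m wide x {h:8.2f} m tall")
```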
# !pip install pygeodesy
import pandas as pd
from pygeodesy import geohash

def df_add_geohash(df, precision=7, col_lat='lat', col_lng='lon', geo_col='geo'):
    """Return a copy of df with a geohash column for every row that has a latitude."""
    df_to_convert = df.copy()
    cond = df_to_convert[col_lat].notnull()
    df_to_convert.loc[cond, geo_col] = df_to_convert[cond].apply(
        lambda x: geohash.encode(x[col_lat], x[col_lng], precision=precision),
        axis=1)
    return df_to_convert

# apply the function (df holds 'lat'/'lon' columns, e.g. the sample data)
dfn = df_add_geohash(df, 7, 'lat', 'lon')
# drop rows whose geohash equals the previous row's (consecutive repeats only)
cond = dfn['geo'] == dfn['geo'].shift(1)
print(dfn[~cond])
# lat lon geo
# 0 51.803593 -4.741181 gchwsne
# 3 51.803152 -4.740578 gchwsnk
# 4 51.803522 -4.740393 gchwsns
# 5 51.803613 -4.740897 gchwsne
# 6 51.803487 -4.740276 gchwsns
# 7 51.803530 -4.741367 gchwsne
# 8 51.802874 -4.741219 gchwsn7
# 9 51.803265 -4.740194 gchwsnk
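To see what `geohash.encode` computes, here is a minimal pure-Python version of the standard geohash algorithm: repeatedly halve the longitude and latitude ranges, interleave the resulting bits (longitude first), and write each group of 5 bits as a base-32 character. It is an illustrative re-implementation, not pygeodesy's code; `encode_geohash` is a hypothetical name. For the sample's first point it gives `gchwsne`, matching the output above.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def encode_geohash(lat, lon, precision=7):
    """Encode (lat, lon) as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, value, nbits = [], 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            if bit:
                lon_lo = mid   # keep the upper half
            else:
                lon_hi = mid   # keep the lower half
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        value = (value << 1) | int(bit)
        nbits += 1
        use_lon = not use_lon
        if nbits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value, nbits = 0, 0
    return "".join(chars)


print(encode_geohash(51.803593492462596, -4.741180850463811))
```

Because each extra character only subdivides the previous cell, a longer hash always starts with the shorter one, which is why raising the precision gives finer but compatible groupings.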
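For a more precise result, compute the distance between each recorded point and the one before it, and drop points that are less than 1 m from their predecessor.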
import pandas as pd
df = pd.DataFrame(
[{'lat': 51.803593492462596, 'lon': -4.741180850463811},
{'lat': 51.80361005410785, 'lon': -4.740873766196046},
{'lat': 51.80351890237921, 'lon': -4.7415190658979895},
{'lat': 51.80315237194233, 'lon': -4.74057836870229},
{'lat': 51.803522329364824, 'lon': -4.7403926507926215},
{'lat': 51.80361261925252, 'lon': -4.740896906964529},
{'lat': 51.803487420307796, 'lon': -4.740276454154127},
{'lat': 51.80353017387817, 'lon': -4.74136689657748},
{'lat': 51.80287372471039, 'lon': -4.741218904144231},
{'lat': 51.80326530703784, 'lon': -4.740193742088211}]
)
df['lat_pre'] = df['lat'].shift(1)
df['lon_pre'] = df['lon'].shift(1)
# !pip install geopy
# https://geopy.readthedocs.io/en/stable/#installation
from geopy.distance import geodesic
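# distance in metres from each point to the previous one (row 0 has none)
cond = df['lat_pre'].notnull()
df.loc[cond, 'distance'] = df[cond].apply(
    lambda row: geodesic((row.lat, row.lon), (row.lat_pre, row.lon_pre)).m,
    axis=1)
# keep rows at least 1 m from their predecessor; NaN < 1 is False, so row 0 stays
cond = df['distance'] < 1
print(df[~cond])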
# lat lon lat_pre lon_pre distance
# 0 51.803593 -4.741181 NaN NaN NaN
# 1 51.803610 -4.740874 51.803593 -4.741181 21.262108
# 2 51.803519 -4.741519 51.803610 -4.740874 45.652403
# 3 51.803152 -4.740578 51.803519 -4.741519 76.639257
# 4 51.803522 -4.740393 51.803152 -4.740578 43.110166
# 5 51.803613 -4.740897 51.803522 -4.740393 36.204379
# 6 51.803487 -4.740276 51.803613 -4.740897 45.007709
# 7 51.803530 -4.741367 51.803487 -4.740276 75.367133
# 8 51.802874 -4.741219 51.803530 -4.741367 73.748842
# 9 51.803265 -4.740194 51.802874 -4.741219 83.059036
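The row-wise `apply` with `geodesic` is accurate but slow on long 10 Hz logs. A vectorised alternative (a swap-in, not part of the answer) computes all consecutive distances at once with the haversine formula on a sphere; its results differ from geopy's ellipsoidal `geodesic` by a fraction of a percent (about 21.2 m instead of 21.26 m for the first step). The function names are hypothetical.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000.0  # mean radius; haversine treats Earth as a sphere


def haversine_m(lat1, lon1, lat2, lon2):
    """Vectorised great-circle distance in metres."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def add_step_distance(df):
    """Copy of df with 'distance' = metres from the previous row (NaN for row 0)."""
    out = df.copy()
    out["distance"] = haversine_m(out["lat"].shift(), out["lon"].shift(),
                                  out["lat"], out["lon"])
    return out


df = pd.DataFrame({"lat": [51.803593492462596, 51.80361005410785, 51.80351890237921],
                   "lon": [-4.741180850463811, -4.740873766196046, -4.7415190658979895]})
steps = add_step_distance(df)
kept = steps[~(steps["distance"] < 1)]  # NaN < 1 is False, so row 0 is kept
print(kept)
```

Note that at 10 Hz a step under 1 m means moving slower than 10 m/s (36 km/h), so a per-step threshold drops nearly every point at low speed even though the car is moving; summing distances since the last kept point, as the next answer does, avoids that.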
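I worked out a solution based on the distance approach suggested by @Ferris. The 'mpu.haversine_distance' function returns the distance between two lat/lng pairs in kilometres, so I multiply by 1000 to get metres. I then add up these distances, and once the running total exceeds 1 metre I report that lat/lng and reset the total. The threshold can be changed to 3 metres, etc.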
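import mpu

# Assumes `mydb` is an open MySQL connection (e.g. mysql.connector) created elsewhere.
def processTheSet(batch):
    mycursorll = mydb.cursor()
    sqlll = "SELECT latt, longg FROM interPol WHERE batchID = %s ORDER BY `fileTime`"
    batchI = (batch,)
    mycursorll.execute(sqlll, batchI)
    # first fix of the batch; fetchall() below returns the remaining rows
    firstResult = mycursorll.fetchone()
    firstLat = float(firstResult[0])
    firstLng = float(firstResult[1])
    print(firstLng, ",", firstLat)  # report the starting point too
    myresultll = mycursorll.fetchall()
    count = 0    # number of points reported after the start
    counter = 0  # metres travelled since the last reported point
    for x in myresultll:
        thisLat = float(x[0])
        thisLong = float(x[1])
        # mpu.haversine_distance returns km; convert to metres
        dist = mpu.haversine_distance((firstLat, firstLng), (thisLat, thisLong)) * 1000
        firstLat = thisLat
        firstLng = thisLong
        counter = counter + dist
        if counter > 1:  # use 3 for ~3 m spacing, etc.
            count = count + 1
            counter = 0
            print(thisLong, ",", thisLat)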
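The loop above ties the thinning rule to the database cursor and a global `mydb`. A sketch of the same accumulate-and-reset rule as a generator over any iterable of (lat, lon) rows, such as the list returned by `fetchall()`, makes it easy to test and to tune the spacing. It replaces `mpu.haversine_distance` with a local haversine on a 6371 km sphere so it needs no third-party package; `thin_track` and `step_m` are hypothetical names.

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical Earth, as the haversine formula assumes


def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) pairs p and q."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def thin_track(rows, step_m=1.0):
    """Yield the first point, then each point where travel since the last yield exceeds step_m."""
    it = iter(rows)
    try:
        prev = tuple(map(float, next(it)))  # float() also handles Decimal DB values
    except StopIteration:
        return
    yield prev
    travelled = 0.0  # metres since the last yielded point
    for row in it:
        point = tuple(map(float, row))
        travelled += haversine_km(prev, point) * 1000
        prev = point
        if travelled > step_m:
            travelled = 0.0  # reset, as in the loop above
            yield point

# Usage with the question's DB cursor (hypothetical):
#   for lat, lon in thin_track(cursor.fetchall(), step_m=3.0):
#       print(lon, lat, sep=",")
```

Stationary fixes add zero distance and are never reported. Because the total resets to zero instead of carrying over the excess, each reported point is slightly more than `step_m` of travel after the previous one.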