Haversine Distance between consecutive rows for each Customer
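My question builds on this one: Fast Haversine Approximation (Python/Pandas)
That question asks how to compute the Haversine distance. Mine is how to compute the Haversine distance between consecutive rows for each customer.
My dataset looks like this dummy dataset (let's pretend the coordinates are real):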
Customer Lat Lon
A 1 2
A 1 2
B 3 2
B 4 2
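So here I would get nothing in the first row, 0 in the second row, nothing again in the third row because a new customer starts, and in the fourth row whatever the distance in km is between (3,2) and (4,2).
This works without the customer constraint: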
import numpy as np

def haversine(lat1, lon1, lat2, lon2, to_radians=True):
    if to_radians:
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])
    a = np.sin((lat2-lat1)/2.0)**2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2
    return 6367 * 2 * np.arcsin(np.sqrt(a))

df = data_full
# Latitude is passed first to match the (lat1, lon1, lat2, lon2) signature.
df['dist'] = \
    haversine(df.Lat.shift(), df.Lon.shift(),
              df.loc[1:, 'Lat'], df.loc[1:, 'Lon'])
But I can't adapt it to restart for each new customer. I tried this:
def haversine(lat1, lon1, lat2, lon2, to_radians=True):
    if to_radians:
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])
    a = np.sin((lat2-lat1)/2.0)**2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2
    return 6367 * 2 * np.arcsin(np.sqrt(a))

df=data_full
df['dist'] = \
    df.groupby('Customer_id')['Lat','Lon'].apply(lambda df: haversine(df.Lon.shift(), df.Lat.shift(),
                                                                      df.loc[1:, 'Lon'], df.loc[1:, 'Lat']))
I'll reuse the vectorized haversine_np function from derricw's answer:
def haversine_np(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    All args must be of equal length.
    """
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2
    c = 2 * np.arcsin(np.sqrt(a))
    km = 6367 * c
    return km
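def distance(x):
    # Previous row within the same customer group (NaN for the first row).
    y = x.shift()
    # haversine_np expects (lon1, lat1, lon2, lat2), so longitude goes first.
    return haversine_np(x['Lon'], x['Lat'], y['Lon'], y['Lat']).fillna(0)

df['Distance'] = df.groupby('Customer').apply(distance).reset_index(level=0, drop=True)

Result:

  Customer  Lat  Lon    Distance
0        A    1    2    0.000000
1        A    1    2    0.000000
2        B    3    2    0.000000
3        B    4    2  111.125113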
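A possible alternative that avoids `apply` altogether is to shift the coordinates within each group and call the vectorized function once on whole columns. The following is only a minimal sketch under some assumptions: the column names are taken from the sample data (`Customer`, `Lat`, `Lon`), the frame is rebuilt here so the block runs on its own, `EARTH_RADIUS_KM` and `prev` are names introduced purely for illustration, and `haversine_np` is restated in condensed form. Unlike the version above, it leaves the first row of each customer as NaN instead of 0, which is closer to what the question describes.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6367  # same radius as used in the functions above


def haversine_np(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    return EARTH_RADIUS_KM * 2 * np.arcsin(np.sqrt(a))


# Dummy data from the question.
df = pd.DataFrame({
    "Customer": ["A", "A", "B", "B"],
    "Lat": [1, 1, 3, 4],
    "Lon": [2, 2, 2, 2],
})

# Previous row's coordinates within each customer.
# The first row of every customer has no predecessor, so it becomes NaN.
prev = df.groupby("Customer")[["Lat", "Lon"]].shift()

# One vectorized call over the whole frame; NaN inputs propagate to NaN distances.
df["Distance"] = haversine_np(prev["Lon"], prev["Lat"], df["Lon"], df["Lat"])

print(df)
```

On the sample data this should give NaN for rows 0 and 2, 0 for row 1, and roughly 111 km for row 3 (one degree of latitude). If zeros are preferred for the first row of each customer, a `.fillna(0)` on the new column should do it. Note that the grouped shift follows the existing row order, so the frame needs to be sorted the way "consecutive" is meant to be understood before calling it.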