Pandas dataframe: Applying function to row value and value from the previous row

Question

I am trying to apply the following function to a Pandas dataframe:

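import numpy as np
from geopy import distance

def eukarney(lat1, lon1, alt1, lat2, lon2, alt2):
    # Geodesic ground distance in meters between the two (lat, lon) points
    p1 = (lat1, lon1)
    p2 = (lat2, lon2)
    karney = distance.distance(p1, p2).m
    # Combine ground distance and altitude difference into a 3D distance
    return np.sqrt(karney**2 + (alt2 - alt1)**2)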

It works if I pass scalar values, for example:

# Named dist so that it does not shadow the geopy distance module
dist = eukarney(49.907611, 5.890404, 339.15734, 49.907683, 5.890373, 339.18224)

However, if I try to apply the function to a Pandas dataframe:

df['distances'] = eukarney(df['latitude'], df['longitude'], df['altitude'], df['latitude'].shift(), df['longitude'].shift(), df['altitude'].shift())

That is, taking the values from one row and from the previous row.

I get the following error message:

Traceback (most recent call last):
  File "/home/mirix/Desktop/plage/GPX_invert_sense_change_starting_point_va.py", line 78, in <module>
    df['distances'] = eukarney(df.loc[:,'latitude':], df.loc[:,'longitude':], df.loc[:,'altitude':], df.loc[:,'latitude':].shift(), df.loc[:,'longitude':].shift(), df.loc[:,'altitude':].shift())
  File "/home/mirix/Desktop/plage/GPX_invert_sense_change_starting_point_va.py", line 75, in eukarney
    karney = distance.distance(p1, p2).m
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/distance.py", line 522, in __init__
    super().__init__(*args, **kwargs)
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/distance.py", line 276, in __init__
    kilometers += self.measure(a, b)
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/distance.py", line 538, in measure
    a, b = Point(a), Point(b)
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/point.py", line 175, in __new__
    return cls.from_sequence(seq)
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/point.py", line 472, in from_sequence
    return cls(*args)
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/point.py", line 188, in __new__
    _normalize_coordinates(latitude, longitude, altitude)
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/point.py", line 57, in _normalize_coordinates
    latitude = float(latitude or 0.0)
  File "/home/mirix/.local/lib/python3.9/site-packages/pandas/core/generic.py", line 1534, in __nonzero__
    raise ValueError(
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Interestingly, the same syntax works with other functions that do not use the geopy library.

Any ideas?

Solution

GeoPy's distance function seems to have an inherent limitation: it appears to accept only scalars.
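To make the failure mode concrete, here is a minimal sketch that reproduces the same kind of error without geopy. The helper normalize is hypothetical; it only copies the `float(latitude or 0.0)` pattern that appears in the last geopy frame of the traceback. The `or` forces Python to evaluate the truth value of its left operand, which Pandas refuses to do for a whole column.

```python
import pandas as pd

def normalize(latitude):
    # Hypothetical helper mirroring the pattern in the traceback:
    # `latitude or 0.0` calls bool(latitude) before float() is reached.
    return float(latitude or 0.0)

# A scalar has a well-defined truth value, so this works.
scalar_result = normalize(49.907611)

# A whole column does not, so Pandas raises ValueError.
column = pd.Series([49.907611, 49.907683])
try:
    normalize(column)
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)

print(scalar_result)
print(raised, message)
```

This suggests the problem is not the shift itself but passing entire columns into code written for single values.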

The following workaround is based on @SeaBen's answer:

df['lat_shift'] = df['latitude'].shift().fillna(df['latitude'])
df['lon_shift'] = df['longitude'].shift().fillna(df['longitude'])
df['alt_shift'] = df['altitude'].shift().fillna(df['altitude'])

df['distances'] = df.apply(lambda x: eukarney(x['latitude'], x['longitude'], x['altitude'], x['lat_shift'], x['lon_shift'], x['alt_shift']), axis=1).fillna(0)

Answer 1

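You can use .apply() on each row.

.apply() passes scalar values to your custom function row by row, so you can reuse a function that was designed to work with scalars. Otherwise, you would need to modify the custom function so that it supports Pandas' vectorized array values.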
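For comparison, this is roughly what a vectorized rewrite could look like. It is only a sketch under a loud assumption: the geopy geodesic is swapped for a haversine formula on a sphere, because that can be written with NumPy functions that accept whole columns. The function name euhaversine and the constant EARTH_RADIUS_M are made up for this example, and the results can differ from the Karney distance by a fraction of a percent.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6371008.8  # assumed mean Earth radius (spherical model)

def euhaversine(lat1, lon1, alt1, lat2, lon2, alt2):
    # Haversine ground distance (NOT geopy's Karney geodesic), in meters.
    # Every operation is a NumPy ufunc, so Series inputs are accepted.
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    ground = 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))
    # Combine ground distance and altitude difference
    return np.sqrt(ground ** 2 + (alt2 - alt1) ** 2)

# The two points from the question
df = pd.DataFrame({
    'latitude': [49.907611, 49.907683],
    'longitude': [5.890404, 5.890373],
    'altitude': [339.15734, 339.18224],
})

# Whole columns go in; the first row has no predecessor, so it becomes 0
df['distances'] = euhaversine(
    df['latitude'], df['longitude'], df['altitude'],
    df['latitude'].shift(), df['longitude'].shift(), df['altitude'].shift(),
).fillna(0)

print(df['distances'])
```

If you need the exact geopy result, the row-wise .apply() route keeps your original function untouched.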

To accommodate the .shift() entries, one workaround is to first define new columns for them, so that we can pass them to the .apply() function.

# Take the previous row's value with shift, and fill the first row with its own value
# (in case the custom function cannot handle the NaN that shift leaves on the first row)
df['lat_shift'] = df['latitude'].shift().fillna(df['latitude'])
df['lon_shift'] = df['longitude'].shift().fillna(df['longitude'])
df['alt_shift'] = df['altitude'].shift().fillna(df['altitude'])

df['distances'] = df.apply(lambda x: eukarney(x['latitude'], x['longitude'], x['altitude'], x['lat_shift'], x['lon_shift'], x['alt_shift']), axis=1).fillna(0)
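Here is a self-contained sketch of the same workaround that you can run without geopy installed. The function scalar_distance_m is a hypothetical stand-in for `distance.distance(p1, p2).m`: it uses a haversine formula from the math module, so, like geopy, it only accepts scalars. The two sample rows are the points from the question.

```python
import math
import pandas as pd

def scalar_distance_m(p1, p2):
    # Hypothetical stand-in for geopy's distance.distance(p1, p2).m.
    # Haversine on a sphere; math functions accept scalars only.
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371008.8 * math.asin(math.sqrt(a))

def eukarney(lat1, lon1, alt1, lat2, lon2, alt2):
    ground = scalar_distance_m((lat1, lon1), (lat2, lon2))
    return math.sqrt(ground ** 2 + (alt2 - alt1) ** 2)

df = pd.DataFrame({
    'latitude': [49.907611, 49.907683],
    'longitude': [5.890404, 5.890373],
    'altitude': [339.15734, 339.18224],
})

# Previous row's values; the first row falls back to its own values,
# which makes its distance exactly 0 instead of NaN
shifted = {'latitude': 'lat_shift', 'longitude': 'lon_shift', 'altitude': 'alt_shift'}
for source, target in shifted.items():
    df[target] = df[source].shift().fillna(df[source])

# axis=1 hands one row at a time to the lambda, so eukarney sees scalars
df['distances'] = df.apply(
    lambda x: eukarney(x['latitude'], x['longitude'], x['altitude'],
                       x['lat_shift'], x['lon_shift'], x['alt_shift']),
    axis=1,
)

# Optional cleanup of the helper columns
df = df.drop(columns=list(shifted.values()))
print(df)
```

Because the first row is compared with itself, the trailing .fillna(0) from the answer is not strictly needed in this sketch, though keeping it does no harm.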

Tags: python, geopy, pandas