Pandas dataframe: Applying a function to a row value and the value from the previous row
I am trying to apply the following function to a Pandas dataframe:
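import numpy as np
from geopy import distance

def eukarney(lat1, lon1, alt1, lat2, lon2, alt2):
    p1 = (lat1, lon1)
    p2 = (lat2, lon2)
    # Geodesic (Karney) ground distance in metres
    karney = distance.distance(p1, p2).m
    # Combine ground distance and altitude difference (Euclidean)
    return np.sqrt(karney**2 + (alt2 - alt1)**2)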
It works if I use discrete values, for example:
dist = eukarney(49.907611, 5.890404, 339.15734, 49.907683, 5.890373, 339.18224)
However, if I try to apply the function to a Pandas dataframe:
df['distances'] = eukarney(df['latitude'], df['longitude'], df['altitude'], df['latitude'].shift(), df['longitude'].shift(), df['altitude'].shift())
that is, taking the values from one row and from the previous row,
I get the following error message:
Traceback (most recent call last):
  File "/home/mirix/Desktop/plage/GPX_invert_sense_change_starting_point_va.py", line 78, in <module>
    df['distances'] = eukarney(df.loc[:,'latitude':], df.loc[:,'longitude':], df.loc[:,'altitude':],
    df.loc[:,'latitude':].shift(), df.loc[:,'longitude':].shift(),
    df.loc[:,'altitude':].shift())
  File "/home/mirix/Desktop/plage/GPX_invert_sense_change_starting_point_va.py", line 75, in eukarney
    karney = distance.distance(p1, p2).m
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/distance.py", line 522, in __init__
    super().__init__(*args, **kwargs)
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/distance.py", line 276, in __init__
    kilometers += self.measure(a, b)
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/distance.py", line 538, in measure
    a, b = Point(a), Point(b)
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/point.py", line 175, in __new__
    return cls.from_sequence(seq)
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/point.py", line 472, in from_sequence
    return cls(*args)
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/point.py", line 188, in __new__
    _normalize_coordinates(latitude, longitude, altitude)
  File "/home/mirix/.local/lib/python3.9/site-packages/geopy/point.py", line 57, in _normalize_coordinates
    latitude = float(latitude or 0.0)
  File "/home/mirix/.local/lib/python3.9/site-packages/pandas/core/generic.py", line 1534, in __nonzero__
    raise ValueError(
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Interestingly, the same syntax works for other functions that do not use the geopy library.
Any ideas?
Solution
GeoPy's distance function seems to have an inherent limitation: it appears to accept only scalars.
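The traceback hints at where this comes from: geopy's _normalize_coordinates evaluates `latitude or 0.0`, and `or` needs a single truth value, which a Pandas object holding several elements will not provide. The following minimal sketch reproduces the same kind of error without geopy; the variable names are illustrative assumptions and the sample values are taken from the question.

```python
import pandas as pd

# Two latitude values from the question, held in a Series (illustrative only)
latitudes = pd.Series([49.907611, 49.907683])

# With a scalar, `x or 0.0` is well defined.
scalar_result = float(49.907611 or 0.0)

# With a Series, `or` has to call bool() on it, which Pandas refuses to do.
error_message = ""
try:
    float(latitudes or 0.0)
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Any function that applies `or`, `and`, `if`, or `float()` to its arguments will likely behave the same way when handed a whole column instead of one value.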
The following workaround is based on @SeaBen's answer:
df['lat_shift'] = df['latitude'].shift().fillna(df['latitude'])
df['lon_shift'] = df['longitude'].shift().fillna(df['longitude'])
df['alt_shift'] = df['altitude'].shift().fillna(df['altitude'])
df['distances'] = df.apply(lambda x: eukarney(x['latitude'], x['longitude'], x['altitude'], x['lat_shift'], x['lon_shift'], x['alt_shift']), axis=1).fillna(0)
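You can use .apply() on each row (axis=1).
Here, .apply() passes scalar values to your custom function row by row, so you can reuse a custom function that was designed to work with scalars. Otherwise, you would probably need to modify the custom function so that it supports Pandas' vectorized array values.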
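As a rough illustration of what "modifying the custom function to support vectorized values" could look like, here is a sketch that replaces the geopy call with a NumPy haversine formula. This is a swapped-in technique, not what geopy does: haversine assumes a spherical Earth, whereas geopy's distance uses Karney's geodesic algorithm on an ellipsoid, so results will differ slightly. The function name haversine_3d and the radius constant are assumptions made for this example.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres (spherical approximation)

def haversine_3d(lat1, lon1, alt1, lat2, lon2, alt2):
    """Vectorized stand-in for eukarney: haversine ground distance plus altitude."""
    lat1, lon1, lat2, lon2 = (np.radians(v) for v in (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    ground = 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))
    return np.sqrt(ground ** 2 + (alt2 - alt1) ** 2)

# The two sample points from the question
df = pd.DataFrame({
    'latitude': [49.907611, 49.907683],
    'longitude': [5.890404, 5.890373],
    'altitude': [339.15734, 339.18224],
})

# Whole columns go in; the first row has no predecessor, so its NaN becomes 0.
df['distances'] = haversine_3d(
    df['latitude'], df['longitude'], df['altitude'],
    df['latitude'].shift(), df['longitude'].shift(), df['altitude'].shift(),
).fillna(0)
```

Because every operation here is a NumPy ufunc or plain arithmetic, whole columns can be passed directly and no row-wise loop is needed. Whether the spherical approximation is accurate enough depends on the use case.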
To cater for the .shift() entries, one workaround is to define new columns for them first, so that they can be passed to the .apply() function:
# Take the previous entry with shift() and `fillna` the first row with its own original value
# (in case the custom function cannot handle the `NaN` entry on the first row after the shift)
df['lat_shift'] = df['latitude'].shift().fillna(df['latitude'])
df['lon_shift'] = df['longitude'].shift().fillna(df['longitude'])
df['alt_shift'] = df['altitude'].shift().fillna(df['altitude'])
df['distances'] = df.apply(lambda x: eukarney(x['latitude'], x['longitude'], x['altitude'], x['lat_shift'], x['lon_shift'], x['alt_shift']), axis=1).fillna(0)
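If the helper columns are unwanted, the same scalar-by-scalar idea can be sketched with a list comprehension over the current and shifted rows instead of .apply(). This is an alternative technique, not the one from the answer. To keep the sketch runnable without geopy, scalar_distance_3d below is a hypothetical scalar-only stand-in for eukarney that uses the haversine formula (a spherical approximation, not Karney's geodesic); in real code eukarney would take its place.

```python
import math
import pandas as pd

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres (spherical approximation)

def scalar_distance_3d(lat1, lon1, alt1, lat2, lon2, alt2):
    """Hypothetical scalar-only stand-in for eukarney (haversine, not Karney)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    ground = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    return math.sqrt(ground ** 2 + (alt2 - alt1) ** 2)

# The two sample points from the question
df = pd.DataFrame({
    'latitude': [49.907611, 49.907683],
    'longitude': [5.890404, 5.890373],
    'altitude': [339.15734, 339.18224],
})

cols = ['latitude', 'longitude', 'altitude']
# Previous row for each row; the first row falls back to its own values
prev = df[cols].shift().fillna(df[cols])

# Each iteration passes six scalars, so a scalar-only function is fine here
df['distances'] = [
    scalar_distance_3d(*cur, *prv)
    for cur, prv in zip(df[cols].to_numpy(), prev.to_numpy())
]
```

The first row is compared with itself and therefore gets a distance of 0, which matches the effect of the fillna calls in the workaround above.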