How to update each row based on the values of the rows above?

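Suppose sensors are attached to 3 climbers scaling a structure, and these sensors capture specific measurements at random times. The data is collected into the dataframe below (the real dataframe is much longer):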

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Cody', 'Dustin', 'Dustin', 'Cody', 'Ryan', 'Dustin', 'Ryan', 'Cody'],
    'Timestamp': ['08:10:23', '08:12:58', '08:15:02', '08:19:43', '08:21:00', '08:30:17', '08:34:01', '08:34:59'],
    'Category': ['Body Temp', 'Altitude', 'Heart Rate', 'Body Temp', 'Heart Rate', 'Heart Rate', 'Altitude', 'Altitude'],
    'Body Temp': [35.9, np.nan, np.nan, 36.2, np.nan, np.nan, np.nan, np.nan],
    'Altitude': [np.nan, 7, np.nan, np.nan, np.nan, np.nan, 12, 6],
    'Heart Rate': [np.nan, np.nan, 75, np.nan, 71, 69, np.nan, np.nan]
})

     Name Timestamp    Category  Body Temp  Altitude  Heart Rate
0    Cody  08:10:23   Body Temp       35.9       NaN         NaN
1  Dustin  08:12:58    Altitude        NaN       7.0         NaN
2  Dustin  08:15:02  Heart Rate        NaN       NaN        75.0
3    Cody  08:19:43   Body Temp       36.2       NaN         NaN
4    Ryan  08:21:00  Heart Rate        NaN       NaN        71.0
5  Dustin  08:30:17  Heart Rate        NaN       NaN        69.0
6    Ryan  08:34:01    Altitude        NaN      12.0         NaN
7    Cody  08:34:59    Altitude        NaN       6.0         NaN

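The goal is to keep each row's measurements up to date per climber and timestamp, so that every subsequent row for a climber carries that climber's most recent known measurements.

So the result should look like this: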

     Name Timestamp    Category  Body Temp  Altitude  Heart Rate
0    Cody  08:10:23   Body Temp       35.9       NaN         NaN
1  Dustin  08:12:58    Altitude        NaN       7.0         NaN
2  Dustin  08:15:02  Heart Rate        NaN       7.0        75.0
3    Cody  08:19:43   Body Temp       36.2       NaN         NaN
4    Ryan  08:21:00  Heart Rate        NaN       NaN        71.0
5  Dustin  08:30:17  Heart Rate        NaN       7.0        69.0
6    Ryan  08:34:01    Altitude        NaN      12.0        71.0
7    Cody  08:34:59    Altitude       36.2       6.0         NaN

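So far I have considered using .sort_values() to separate the climbers and working from there, but I am struggling to figure out how to keep updating each row. Does this need a function or iterrows?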

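The job essentially seems to be filling each missing value with the previous value of that measurement for the same climber, if one exists, so groupby.ffill should do the job: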

out = df[['Name']].join(df.groupby('Name').ffill())

Output:

     Name Timestamp    Category  Body Temp  Altitude  Heart Rate
0    Cody  08:10:23   Body Temp       35.9       NaN         NaN
1  Dustin  08:12:58    Altitude        NaN       7.0         NaN
2  Dustin  08:15:02  Heart Rate        NaN       7.0        75.0
3    Cody  08:19:43   Body Temp       36.2       NaN         NaN
4    Ryan  08:21:00  Heart Rate        NaN       NaN        71.0
5  Dustin  08:30:17  Heart Rate        NaN       7.0        69.0
6    Ryan  08:34:01    Altitude        NaN      12.0        71.0
7    Cody  08:34:59    Altitude       36.2       6.0         NaN
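
The one-liner above relies on the rows already being in chronological order, and it forward-fills every non-key column. If that cannot be guaranteed, a slightly more defensive variant might look like the sketch below. It assumes the timestamps are zero-padded HH:MM:SS strings from a single day (so a plain string sort is chronological), and the name measure_cols is introduced here only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Cody', 'Dustin', 'Dustin', 'Cody', 'Ryan', 'Dustin', 'Ryan', 'Cody'],
    'Timestamp': ['08:10:23', '08:12:58', '08:15:02', '08:19:43', '08:21:00', '08:30:17', '08:34:01', '08:34:59'],
    'Category': ['Body Temp', 'Altitude', 'Heart Rate', 'Body Temp', 'Heart Rate', 'Heart Rate', 'Altitude', 'Altitude'],
    'Body Temp': [35.9, np.nan, np.nan, 36.2, np.nan, np.nan, np.nan, np.nan],
    'Altitude': [np.nan, 7, np.nan, np.nan, np.nan, np.nan, 12, 6],
    'Heart Rate': [np.nan, np.nan, 75, np.nan, 71, 69, np.nan, np.nan]
})

# Illustrative name: the columns that hold sensor readings.
measure_cols = ['Body Temp', 'Altitude', 'Heart Rate']

# Assumption: zero-padded HH:MM:SS strings sort chronologically as text.
# A stable sort keeps the original order for identical timestamps.
out = df.sort_values('Timestamp', kind='stable')

# Forward-fill only the measurement columns, separately for each climber.
out[measure_cols] = out.groupby('Name')[measure_cols].ffill()

# Restore the original row order.
out = out.sort_index()

print(out)
```

Either way, no iterrows or custom function is needed: the forward fill is carried out per group by pandas itself. Values that a climber has never reported (for example Cody's heart rate) stay NaN.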