How to iterate over each row based on values in the row above?
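Suppose sensors are attached to 3 climbers scaling a structure, and the sensors capture specific measurements at random times. The data is collected in the dataframe below (the real dataframe is much longer):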
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Cody', 'Dustin', 'Dustin', 'Cody', 'Ryan', 'Dustin', 'Ryan', 'Cody'],
    'Timestamp': ['08:10:23', '08:12:58', '08:15:02', '08:19:43', '08:21:00', '08:30:17', '08:34:01', '08:34:59'],
    'Category': ['Body Temp', 'Altitude', 'Heart Rate', 'Body Temp', 'Heart Rate', 'Heart Rate', 'Altitude', 'Altitude'],
    'Body Temp': [35.9, np.nan, np.nan, 36.2, np.nan, np.nan, np.nan, np.nan],
    'Altitude': [np.nan, 7, np.nan, np.nan, np.nan, np.nan, 12, 6],
    'Heart Rate': [np.nan, np.nan, 75, np.nan, 71, 69, np.nan, np.nan]
})
     Name  Timestamp    Category  Body Temp  Altitude  Heart Rate
0    Cody   08:10:23   Body Temp       35.9       NaN         NaN
1  Dustin   08:12:58    Altitude        NaN       7.0         NaN
2  Dustin   08:15:02  Heart Rate        NaN       NaN        75.0
3    Cody   08:19:43   Body Temp       36.2       NaN         NaN
4    Ryan   08:21:00  Heart Rate        NaN       NaN        71.0
5  Dustin   08:30:17  Heart Rate        NaN       NaN        69.0
6    Ryan   08:34:01    Altitude        NaN      12.0         NaN
7    Cody   08:34:59    Altitude        NaN       6.0         NaN
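Any row-by-row or forward-fill approach processes the rows in their current order, so on the real (much longer) frame it is worth confirming that the rows are chronological first. A minimal check, assuming every Timestamp is an HH:MM:SS string and that all readings fall on the same day:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Cody', 'Dustin', 'Dustin', 'Cody', 'Ryan', 'Dustin', 'Ryan', 'Cody'],
    'Timestamp': ['08:10:23', '08:12:58', '08:15:02', '08:19:43', '08:21:00', '08:30:17', '08:34:01', '08:34:59'],
    'Category': ['Body Temp', 'Altitude', 'Heart Rate', 'Body Temp', 'Heart Rate', 'Heart Rate', 'Altitude', 'Altitude'],
    'Body Temp': [35.9, np.nan, np.nan, 36.2, np.nan, np.nan, np.nan, np.nan],
    'Altitude': [np.nan, 7, np.nan, np.nan, np.nan, np.nan, 12, 6],
    'Heart Rate': [np.nan, np.nan, 75, np.nan, 71, 69, np.nan, np.nan],
})

# Parse the HH:MM:SS strings into timedeltas (assumes all readings fall on the same day)
times = pd.to_timedelta(df['Timestamp'])

# True when the rows are already in chronological order
print(times.is_monotonic_increasing)  # → True
```

If this prints False, sort the frame by time before filling.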
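The goal is to keep each row's measurements up to date per climber, in timestamp order, so that every subsequent row for a climber carries that climber's latest measurements.
So the result should look like this: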
     Name  Timestamp    Category  Body Temp  Altitude  Heart Rate
0    Cody   08:10:23   Body Temp       35.9       NaN         NaN
1  Dustin   08:12:58    Altitude        NaN       7.0         NaN
2  Dustin   08:15:02  Heart Rate        NaN       7.0        75.0
3    Cody   08:19:43   Body Temp       36.2       NaN         NaN
4    Ryan   08:21:00  Heart Rate        NaN       NaN        71.0
5  Dustin   08:30:17  Heart Rate        NaN       7.0        69.0
6    Ryan   08:34:01    Altitude        NaN      12.0        71.0
7    Cody   08:34:59    Altitude       36.2       6.0         NaN
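So far I've considered using .sort_values() to separate the climbers and work from there, but I'm struggling to figure out how to keep updating each row. Do I need a custom function or iterrows for this?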
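To address the iterrows question directly: a row-by-row version is possible by keeping each climber's last-seen value per measurement in a dictionary and writing it into that climber's later rows wherever the measurement is missing. This is a minimal sketch, assuming the rows are already in timestamp order; carry_forward_loop and MEASUREMENTS are hypothetical names. Python-level loops like this get slow on long frames, which is why a vectorized groupby approach is usually preferable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Cody', 'Dustin', 'Dustin', 'Cody', 'Ryan', 'Dustin', 'Ryan', 'Cody'],
    'Timestamp': ['08:10:23', '08:12:58', '08:15:02', '08:19:43', '08:21:00', '08:30:17', '08:34:01', '08:34:59'],
    'Category': ['Body Temp', 'Altitude', 'Heart Rate', 'Body Temp', 'Heart Rate', 'Heart Rate', 'Altitude', 'Altitude'],
    'Body Temp': [35.9, np.nan, np.nan, 36.2, np.nan, np.nan, np.nan, np.nan],
    'Altitude': [np.nan, 7, np.nan, np.nan, np.nan, np.nan, 12, 6],
    'Heart Rate': [np.nan, np.nan, 75, np.nan, 71, 69, np.nan, np.nan],
})

# Hypothetical list of the measurement columns to carry forward
MEASUREMENTS = ['Body Temp', 'Altitude', 'Heart Rate']


def carry_forward_loop(frame):
    """Hypothetical helper: fill each climber's gaps with their last seen value.

    Walks the rows in their current order, so `frame` must already be
    sorted chronologically. Returns a new frame; the input is not modified.
    """
    out = frame.copy()
    last_seen = {}  # climber name -> {measurement: latest value}
    for idx, row in frame.iterrows():
        memory = last_seen.setdefault(row['Name'], {})
        for col in MEASUREMENTS:
            if pd.notna(row[col]):
                memory[col] = row[col]  # new reading: remember it
            elif col in memory:
                out.at[idx, col] = memory[col]  # gap: reuse the previous reading
    return out


result = carry_forward_loop(df)
print(result)
```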
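This essentially amounts to filling each climber's missing values with that climber's previous value for the same measurement, where one exists, so groupby.ffill should do the job. Because ffill works in row order, it relies on the rows already being sorted by Timestamp, as they are here. The grouping column Name is not part of the groupby(...).ffill() result, so it is joined back on: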
out = df[['Name']].join(df.groupby('Name').ffill())
Output:
     Name  Timestamp    Category  Body Temp  Altitude  Heart Rate
0    Cody   08:10:23   Body Temp       35.9       NaN         NaN
1  Dustin   08:12:58    Altitude        NaN       7.0         NaN
2  Dustin   08:15:02  Heart Rate        NaN       7.0        75.0
3    Cody   08:19:43   Body Temp       36.2       NaN         NaN
4    Ryan   08:21:00  Heart Rate        NaN       NaN        71.0
5  Dustin   08:30:17  Heart Rate        NaN       7.0        69.0
6    Ryan   08:34:01    Altitude        NaN      12.0        71.0
7    Cody   08:34:59    Altitude       36.2       6.0         NaN
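If the real frame is not guaranteed to be in chronological order, the forward fill should run on time-sorted rows. Below is a minimal sketch of that variant, assuming Timestamp is always a zero-padded HH:MM:SS string from a single day, so plain string order equals time order. It sorts stably by time, forward-fills only the measurement columns within each climber, and then restores the original row order with sort_index. Reversing the rows is only a stand-in for unordered input, and MEASUREMENTS is a hypothetical name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Cody', 'Dustin', 'Dustin', 'Cody', 'Ryan', 'Dustin', 'Ryan', 'Cody'],
    'Timestamp': ['08:10:23', '08:12:58', '08:15:02', '08:19:43', '08:21:00', '08:30:17', '08:34:01', '08:34:59'],
    'Category': ['Body Temp', 'Altitude', 'Heart Rate', 'Body Temp', 'Heart Rate', 'Heart Rate', 'Altitude', 'Altitude'],
    'Body Temp': [35.9, np.nan, np.nan, 36.2, np.nan, np.nan, np.nan, np.nan],
    'Altitude': [np.nan, 7, np.nan, np.nan, np.nan, np.nan, 12, 6],
    'Heart Rate': [np.nan, np.nan, 75, np.nan, 71, 69, np.nan, np.nan],
})

# Hypothetical list of the measurement columns to carry forward
MEASUREMENTS = ['Body Temp', 'Altitude', 'Heart Rate']

# Stand-in for out-of-order input: the same rows in reverse order
unordered = df.iloc[::-1]

# Zero-padded HH:MM:SS strings sort correctly as text; mergesort is stable,
# so rows with identical timestamps keep their relative order
ordered = unordered.sort_values('Timestamp', kind='mergesort')

filled = ordered.copy()
# Forward-fill each measurement within each climber's own rows, in time order
filled[MEASUREMENTS] = ordered.groupby('Name')[MEASUREMENTS].ffill()

# Put the rows back in their original index order
filled = filled.sort_index()
print(filled)
```

Selecting only the measurement columns also means Name stays in place, so no join is needed.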