Spread, then subtract consecutive rows in a pandas data frame
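I have a data frame in which I need to pair consecutive events from the same day and then subtract one from the other. Each event has a timestamp and a date.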
time date event score
0 2022-03-07 06:45:00+00:00 2022-03-07 light 80.066667
1 2022-03-07 18:12:00+00:00 2022-03-07 dark 79.857667
2 2022-03-30 06:25:00+00:00 2022-03-30 light 107.060833
3 2022-03-30 13:38:00+00:00 2022-03-30 dark 105.324000
4 2022-03-30 13:40:00+00:00 2022-03-30 dark 105.239750
5 2022-03-30 15:47:00+00:00 2022-03-30 light 106.863143
6 2022-04-01 06:25:00+00:00 2022-04-01 light 101.271867
I tried spreading the data frame using:
df = df.pivot(index='time', columns='event', values='score')
event light dark
time
2022-03-07 06:45:00+00:00 80.066667 NaN
2022-03-07 18:12:00+00:00 NaN 79.857667
2022-03-30 06:25:00+00:00 107.060833 NaN
2022-03-30 13:38:00+00:00 NaN 105.324000
2022-03-30 13:40:00+00:00 NaN 105.239750
2022-03-30 15:47:00+00:00 106.863143 NaN
2022-04-01 06:25:00+00:00 101.271867 NaN
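However, because the events occur at different times, the spread data frame is full of NaNs. Ideally I would end up with the result below: I keep the time of whichever event comes first in each pair (light or dark) and align the two events on that row (note: the dark that would pair with the light on 2022-04-01 has not happened yet). When light comes first I subtract the earlier value from the later one, and when dark comes first I subtract the later value from the earlier one, so diff is always dark minus light.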
event light dark diff
time
2022-03-07 06:45:00+00:00 80.066667 79.857667 -0.208999
2022-03-30 06:25:00+00:00 107.060833 105.324000 -1.7368
2022-03-30 13:40:00+00:00 106.863143 105.239750 -1.6233
2022-04-01 06:25:00+00:00 101.271867 NaN NaN
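Here is one way to do it. Use groupby + cumcount to number the occurrences of each event, so that the n-th "light" and the n-th "dark" land in the same group. Then pass those group numbers to groupby + transform('first') to stamp both rows of a pair with the time of whichever one comes first. Then pivot. Finally, subtract "light" from "dark" and assign the result to a "diff" column: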
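# df is the original long-format frame (before the pivot attempted in the question)
out = (df.assign(time=df.groupby(df.groupby('event').cumcount())['time'].transform('first'))
         # keyword arguments: positional arguments to pivot are rejected by pandas >= 2.0
         .pivot(index='time', columns='event', values='score')
         .reset_index().rename_axis([None], axis=1)
         .assign(diff=lambda x: x['dark'] - x['light']))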
Output:
time dark light diff
0 2022-03-07 06:45:00+00:00 79.857667 80.066667 -0.209000
1 2022-03-30 06:25:00+00:00 105.324000 107.060833 -1.736833
2 2022-03-30 13:40:00+00:00 105.239750 106.863143 -1.623393
3 2022-04-01 06:25:00+00:00 NaN 101.271867 NaN
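For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The frame is rebuilt by hand from the sample in the question (the date column is left out because the pivot does not use it), and the helper name pair_events is only an illustrative choice, not anything from pandas. Note that the pairing is by occurrence number across the whole frame, not per date, which happens to coincide with the same-day pairing for this sample.

```python
import pandas as pd

# Sample data copied from the question (the 'date' column is omitted here).
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2022-03-07 06:45:00+00:00",
        "2022-03-07 18:12:00+00:00",
        "2022-03-30 06:25:00+00:00",
        "2022-03-30 13:38:00+00:00",
        "2022-03-30 13:40:00+00:00",
        "2022-03-30 15:47:00+00:00",
        "2022-04-01 06:25:00+00:00",
    ]),
    "event": ["light", "dark", "light", "dark", "dark", "light", "light"],
    "score": [80.066667, 79.857667, 107.060833, 105.324000,
              105.239750, 106.863143, 101.271867],
})


def pair_events(frame):
    """Pair the n-th 'light' with the n-th 'dark' and compute dark - light.

    Hypothetical helper name; it wraps the chained expression from the answer.
    """
    # Occurrence number of each row within its own event type (0, 1, 2, ...).
    pair_id = frame.groupby("event").cumcount()
    # Give both rows of a pair the time of whichever row appears first.
    first_time = frame.groupby(pair_id)["time"].transform("first")
    wide = (
        frame.assign(time=first_time)
        .pivot(index="time", columns="event", values="score")
        .reset_index()
        .rename_axis(None, axis=1)  # drop the leftover 'event' columns label
    )
    # A pair with a missing side yields NaN in 'diff'.
    return wide.assign(diff=wide["dark"] - wide["light"])


out = pair_events(df)
print(out)
```

Splitting the chain into named steps makes it easy to print pair_id and first_time and check that the pairs are the ones you expect before pivoting. If the real data can have unmatched events on earlier days, the global numbering would drift, and grouping by date as well would be worth considering.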