Is there a way to get multiple averages after a groupby in pandas?
I have the following pandas time-series DataFrame:
Index  Time      Centre position X  Centre position Y  Datafile  Group  Zone    Timeframe           dV
8789   1257.318  180.0              201.0              CHR1      CHR    Zone A  Before stimulation  15.625000
8790   1257.462  181.0              195.0              CHR1      CHR    Zone A  Before stimulation  42.241406
8791   1257.590  184.0              188.0              CHR1      CHR    Zone A  Before stimulation  59.498227
8792   1257.718  187.0              184.0              CHR1      CHR    Zone B  Before stimulation  39.062500
8793   1257.862  190.0              176.0              CHR1      CHR    Zone B  Before stimulation  59.333359
8794   1257.927  190.0              173.0              CHR1      CHR    Zone A  Before stimulation  46.153846
8795   1258.054  192.0              171.0              CHR1      CHR    Zone A  Before stimulation  22.271080
8796   1258.198  192.0              172.0              CHR1      CHR    Zone C  After stimulation   6.944444
8797   1258.326  192.0              171.0              CHR1      CHR    Zone C  After stimulation   7.812500
8798   1258.454  191.0              169.0              CHR1      CHR    Zone A  After stimulation   17.469281
8799   1258.598  191.0              168.0              CHR1      CHR    Zone A  After stimulation   6.944444
8800   1258.726  192.0              165.0              CHR1      CHR    Zone A  After stimulation   24.705294
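I want to extract the average velocity (dV) grouped by Timeframe and Zone. However, because the data is sequential, the same Zone can occur several times within one Timeframe, and I want a separate average for each of those consecutive runs. I can't think of an elegant way to do this, because groupby averages all values for a given Timeframe and Zone and outputs a single value.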
The expected output is:
Thank you very much!
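The first thing you need to do is create a reference column that numbers each consecutive run of rows.
A very naive way to do it is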
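# Previous row's Zone and Timeframe (NaN on the first row)
df.loc[:, 'Zone_shift'] = df.loc[:, 'Zone'].shift(1)
df.loc[:, 'Timeframe_shift'] = df.loc[:, 'Timeframe'].shift(1)
# Flag 1 where either key differs from the previous row, 0 otherwise
df.loc[:, 'Groupby'] = df.apply(lambda x: 0 if x['Zone'] == x['Zone_shift'] and x['Timeframe'] == x['Timeframe_shift'] else 1, axis=1)
# The running total of the flags gives each consecutive run its own id
df.loc[:, 'Groupby'] = df.loc[:, 'Groupby'].cumsum()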
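The row-wise apply above can be slow on long recordings. As a hedged alternative, here is a minimal vectorized sketch that builds the same run identifier without the helper `_shift` columns. The DataFrame is rebuilt from the Zone, Timeframe and dV values in the question, and the names `keys` and `starts_new_run` are only illustrative.

```python
import pandas as pd

# Zone / Timeframe / dV values copied from the question's sample rows.
df = pd.DataFrame({
    "Zone": ["Zone A"] * 3 + ["Zone B"] * 2 + ["Zone A"] * 2
            + ["Zone C"] * 2 + ["Zone A"] * 3,
    "Timeframe": ["Before stimulation"] * 7 + ["After stimulation"] * 5,
    "dV": [15.625000, 42.241406, 59.498227, 39.062500, 59.333359, 46.153846,
           22.271080, 6.944444, 7.812500, 17.469281, 6.944444, 24.705294],
})

keys = df[["Zone", "Timeframe"]]

# True wherever either key differs from the previous row. The first row is
# compared against NaN, so it is always flagged as the start of a run.
starts_new_run = keys.ne(keys.shift()).any(axis=1)

# Cumulative sum of the flags: every consecutive run gets its own integer id.
df["Groupby"] = starts_new_run.cumsum()

print(df)
```

This relies on the rows already being in time order, as they are in the question. If they were not, the frame would need to be sorted by Time first.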
After adding the reference columns, the DataFrame looks like this
Zone Timeframe dV Zone_shift Timeframe_shift Groupby
0 ZoneA Beforestimulation 15.625 nan nan 1
1 ZoneA Beforestimulation 42.241 ZoneA Beforestimulation 1
2 ZoneA Beforestimulation 59.498 ZoneA Beforestimulation 1
3 ZoneB Beforestimulation 39.062 ZoneA Beforestimulation 2
4 ZoneB Beforestimulation 59.333 ZoneB Beforestimulation 2
5 ZoneA Beforestimulation 46.153 ZoneB Beforestimulation 3
6 ZoneA Beforestimulation 22.271 ZoneA Beforestimulation 3
7 ZoneC Afterstimulation 6.9444 ZoneA Beforestimulation 4
8 ZoneC Afterstimulation 7.8125 ZoneC Afterstimulation 4
9 ZoneA Afterstimulation 17.469 ZoneC Afterstimulation 5
10 ZoneA Afterstimulation 6.9444 ZoneA Afterstimulation 5
11 ZoneA Afterstimulation 24.705 ZoneA Afterstimulation 5
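Then you just need to group by the new column together with Zone and Timeframe, selecting dV so that the text columns are not averaged
df.groupby(['Groupby','Zone','Timeframe'])['dV'].mean()
The final output will look like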
Groupby Zone Timeframe dV
1 ZoneA Beforestimulation 39.12154433333333
2 ZoneB Beforestimulation 49.1979295
3 ZoneA Beforestimulation 34.212463
4 ZoneC Afterstimulation 7.378472
5 ZoneA Afterstimulation 16.373006333333333
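For reference, the whole approach can be reproduced end to end. The sketch below rebuilds the sample from the question, derives the run identifier, and aggregates dV per run. The output column names `mean_dV` and `n_rows` are assumptions chosen for illustration, and the row count is added only to make it easier to check each mean by hand.

```python
import pandas as pd

# Zone / Timeframe / dV values copied from the question's sample rows.
df = pd.DataFrame({
    "Zone": ["Zone A"] * 3 + ["Zone B"] * 2 + ["Zone A"] * 2
            + ["Zone C"] * 2 + ["Zone A"] * 3,
    "Timeframe": ["Before stimulation"] * 7 + ["After stimulation"] * 5,
    "dV": [15.625000, 42.241406, 59.498227, 39.062500, 59.333359, 46.153846,
           22.271080, 6.944444, 7.812500, 17.469281, 6.944444, 24.705294],
})

# Run identifier: increases by one whenever Zone or Timeframe changes
# relative to the previous row.
keys = df[["Zone", "Timeframe"]]
df["Groupby"] = keys.ne(keys.shift()).any(axis=1).cumsum()

# One row per consecutive run; sort=False keeps the runs in order of appearance.
summary = (
    df.groupby(["Groupby", "Zone", "Timeframe"], sort=False)
      .agg(mean_dV=("dV", "mean"), n_rows=("dV", "count"))
      .reset_index()
)

print(summary)
```

Zone and Timeframe are constant within a run, so including them in the grouping keys does not split any group further. It only carries the labels through to the result.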