Pandas resampling with category variable

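I want to resample a DataFrame to hourly frequency while keeping the category variable. How can I do this efficiently? I usually use df = df.resample('h').sum(), but that does not work for my category variable. Any ideas?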

import pandas as pd

date  = ['2015-02-03 23:00:00','2015-02-03 23:30:00','2015-02-04 00:00:00','2015-02-04 00:30:00']
value = [33.24  , 31.71  , 34.39  , 34.49 ]
value2 = [2*x for x in value]
value3 = [3*x for x in value]
cat = ['a','a','b','b']
df = pd.DataFrame({'value':value,'value2':value2,'value3':value3,'index':date,'category':cat})

# The format string must include seconds to match the timestamps above
df.index = pd.to_datetime(df['index'],format='%Y-%m-%d %H:%M:%S')
df.drop(['index'],axis=1,inplace=True)

print(df.head())
                    value  value2  value3    category
index                                     
2015-02-03 23:00:00  33.24   66.48   99.72    a
2015-02-03 23:30:00  31.71   63.42   95.13    a
2015-02-04 00:00:00  34.39   68.78  103.17    b
2015-02-04 00:30:00  34.49   68.98  103.47    b

Expected result:

                     value  value2  value3    category
index                                     
2015-02-03 23:00:00  64.95   129.9   194.85    a
2015-02-04 00:00:00  68.88   137.76  206.64    b

Use DataFrameGroupBy.resample, which means chaining groupby and resample:

df = df.groupby('category').resample('h').sum()
print (df)
                              value  value2  value3
category index                                     
a        2015-02-03 23:00:00  64.95  129.90  194.85
b        2015-02-04 00:00:00  68.88  137.76  206.64

Alternatively, you can use Grouper:

df = df.groupby(['category', pd.Grouper(freq='h')]).sum()
print (df)
                              value  value2  value3
category index                                     
a        2015-02-03 23:00:00  64.95  129.90  194.85
b        2015-02-04 00:00:00  68.88  137.76  206.64
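
Both variants above return a MultiIndex (category, index), while the expected result in the question has category as an ordinary column. A minimal sketch of one way to get that layout, assuming the sample frame from the question (the names hourly and flat are only illustrative):

```python
import pandas as pd

# Rebuild the sample frame from the question.
dates = pd.to_datetime([
    '2015-02-03 23:00:00', '2015-02-03 23:30:00',
    '2015-02-04 00:00:00', '2015-02-04 00:30:00',
])
value = [33.24, 31.71, 34.39, 34.49]
df = pd.DataFrame(
    {'value': value,
     'value2': [2 * x for x in value],
     'value3': [3 * x for x in value],
     'category': ['a', 'a', 'b', 'b']},
    index=pd.DatetimeIndex(dates, name='index'),
)

# Group by category and by hour, then sum the numeric columns.
hourly = df.groupby(['category', pd.Grouper(freq='h')]).sum()

# Move 'category' out of the MultiIndex and make it the last column again.
flat = hourly.reset_index(level='category')
flat = flat[['value', 'value2', 'value3', 'category']]
print(flat)
```

Unlike taking the first value per hour, this keeps one row per category and hour, so an hour that contains several categories is not collapsed into a single row.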

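Your sum() aggregation is not meaningful for the category column. You have to explicitly define the aggregation you want for the categorical column.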

For example, if you want to keep the first value of the category in each hour, you can do this:

df = df.resample('h').apply({"value":"sum", "value2":"sum", "value3":"sum", "category":"first"})
print(df)

                     value  value2  value3 category
index                                              
2015-02-03 23:00:00  64.95  129.90  194.85        a
2015-02-04 00:00:00  68.88  137.76  206.64        b
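
If the frame has many columns, typing out the dictionary by hand gets tedious. A rough sketch of building it from the dtypes instead, assuming that every numeric column should be summed and every other column should keep its first value (agg_map and numeric_cols are just illustrative names):

```python
import pandas as pd

# Rebuild the sample frame from the question.
dates = pd.to_datetime([
    '2015-02-03 23:00:00', '2015-02-03 23:30:00',
    '2015-02-04 00:00:00', '2015-02-04 00:30:00',
])
value = [33.24, 31.71, 34.39, 34.49]
df = pd.DataFrame(
    {'value': value,
     'value2': [2 * x for x in value],
     'value3': [3 * x for x in value],
     'category': ['a', 'a', 'b', 'b']},
    index=pd.DatetimeIndex(dates, name='index'),
)

# Assumption: sum numeric columns, take the first value of all others.
numeric_cols = df.select_dtypes(include='number').columns
agg_map = {col: ('sum' if col in numeric_cols else 'first')
           for col in df.columns}

hourly = df.resample('h').agg(agg_map)
print(hourly)
```

Note that "first" silently drops the other categories when an hour contains more than one, so this only fits data where the category is constant within each hour.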