Adding a column to a groupby dataframe

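How can I add a 'Sum' column to a pandas groupby dataframe? I want to compute a 'Sum' across the 'Bearish' and 'Bullish' columns of the groupby dataframe below.

Then I want to add two more columns:

%Bearish = Bearish/Sum*100

%Bullish = Bullish/Sum*100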

group_df = df[['sentiment','message']].groupby([pd.Grouper(freq='h'),'sentiment']).count()
group_df = group_df.unstack()

                    message        
sentiment           Bearish Bullish
created                            
2017-08-01 23:00:00     2.0     2.0
2017-08-02 00:00:00     1.0     3.0
2017-08-02 01:00:00     NaN     4.0

You can use concat with a new DataFrame:

idx = pd.date_range('2017-08-01 23:13:00', periods=12, freq='12min')
df = pd.DataFrame({'message':[1,1,2,2,2,2,2,2,3,3,3,3],
                   'sentiment':['Bearish'] * 5 + ['Bullish'] * 7 }, index=idx)
print (df)
                     message sentiment
2017-08-01 23:13:00        1   Bearish
2017-08-01 23:25:00        1   Bearish
2017-08-01 23:37:00        2   Bearish
2017-08-01 23:49:00        2   Bearish
2017-08-02 00:01:00        2   Bearish
2017-08-02 00:13:00        2   Bullish
2017-08-02 00:25:00        2   Bullish
2017-08-02 00:37:00        2   Bullish
2017-08-02 00:49:00        3   Bullish
2017-08-02 01:01:00        3   Bullish
2017-08-02 01:13:00        3   Bullish
2017-08-02 01:25:00        3   Bullish

#pd.Grouper replaces pd.TimeGrouper, which has been removed from pandas
group_df = df[['sentiment','message']].groupby([pd.Grouper(freq='h'),'sentiment']).count()
#select ['message'] to avoid a MultiIndex in the columns
group_df = group_df['message'].unstack()

#divide each column by its column total, convert to percent
#and add the prefix '%' to the column names
df1 = group_df.div(group_df.sum()).mul(100).add_prefix('%')
print (df1)
                     %Bearish   %Bullish
2017-08-01 23:00:00      80.0        NaN
2017-08-02 00:00:00      20.0  57.142857
2017-08-02 01:00:00       NaN  42.857143

#use a new name so the original df is not overwritten
out = pd.concat([group_df, df1], axis=1)
print (out)
                     Bearish  Bullish  %Bearish   %Bullish
2017-08-01 23:00:00      4.0      NaN      80.0        NaN
2017-08-02 00:00:00      1.0      4.0      20.0  57.142857
2017-08-02 01:00:00      NaN      3.0       NaN  42.857143
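
Note that the code above divides each column by its column total. The formulas in the question read as row-wise percentages instead (Sum = Bearish + Bullish per hour). A minimal sketch of that variant follows, using the same sample data; the names counts, total, pct and result are only illustrative, and filling missing hour/sentiment combinations with 0 is an assumption rather than something the question states.

```python
import pandas as pd

# Same sample data as in the answer above.
idx = pd.date_range('2017-08-01 23:13:00', periods=12, freq='12min')
df = pd.DataFrame({'message': [1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3],
                   'sentiment': ['Bearish'] * 5 + ['Bullish'] * 7}, index=idx)

# Hourly message counts per sentiment.
# Assumption: a missing hour/sentiment combination is treated as a count of 0.
counts = (df.groupby([pd.Grouper(freq='h'), 'sentiment'])['message']
            .count()
            .unstack(fill_value=0))

# Row-wise total of the two sentiment columns.
total = counts[['Bearish', 'Bullish']].sum(axis=1)

# Divide every row by its own total (axis=0 aligns on the row index).
pct = counts[['Bearish', 'Bullish']].div(total, axis=0).mul(100).add_prefix('%')

result = pd.concat([counts.assign(Sum=total), pct], axis=1)
print(result)
```

With this layout each row's %Bearish and %Bullish add up to 100, whereas in the column-wise version each percentage column adds up to 100.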

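If you need GroupBy.size instead (it counts all rows in each group, including those where message is NaN, which count skips):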

group_df = df[['sentiment','message']].groupby([pd.Grouper(freq='h'),'sentiment']).size()
group_df = group_df.unstack()

df1 = group_df.div(group_df.sum()).mul(100).add_prefix('%')
print (df1)
                     %Bearish   %Bullish
2017-08-01 23:00:00      80.0        NaN
2017-08-02 00:00:00      20.0  57.142857
2017-08-02 01:00:00       NaN  42.857143

out = pd.concat([group_df, df1], axis=1)
print (out)
                     Bearish  Bullish  %Bearish   %Bullish
2017-08-01 23:00:00      4.0      NaN      80.0        NaN
2017-08-02 00:00:00      1.0      4.0      20.0  57.142857
2017-08-02 01:00:00      NaN      3.0       NaN  42.857143
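
On the sample data count and size give the same numbers, because message has no missing values. The small sketch below uses a hypothetical four-row frame with one NaN in message, made up purely for illustration, to show where the two differ.

```python
import numpy as np
import pandas as pd

# Hypothetical data: four messages within the same hour, one message is missing.
idx = pd.date_range('2017-08-01 23:13:00', periods=4, freq='12min')
df = pd.DataFrame({'message': [1, np.nan, 2, 2],
                   'sentiment': ['Bearish', 'Bearish', 'Bullish', 'Bullish']},
                  index=idx)

grouped = df.groupby([pd.Grouper(freq='h'), 'sentiment'])['message']

n_count = grouped.count().unstack()  # non-NaN values per group
n_size = grouped.size().unstack()    # all rows per group, NaN included

print(n_count)
print(n_size)
```

Here the Bearish group has two rows but only one non-missing message, so count and size disagree for that group and agree for Bullish.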