pandas dataframe.describe() 根据索引值获取聚合

Question

我正在尝试在 df1 上使用 .describe() 方法来获取聚合。当前index为year。我想根据索引中 3 年期间的每个 statistics 获取这些统计数据。我尝试使用 stats_df = df1.groupby('statistics').descirbe().unstack(1))，但没有得到想要的结果。

在df1 =

         statistics  s_values
year
1999  cigarette use       100
1999  cellphone use       310
1999   internet use       101
1999    alcohol use       100
1999       soda use       215
2000  cigarette use       315
2000  cellphone use       317
2000   internet use       325
2000    alcohol use       108
2000       soda use       200
2001  cigarette use       122
2001  cellphone use       311
2001   internet use       112
2001    alcohol use       144
2001       soda use       689
2002  cigarette use       813
2002  cellphone use       954
2002   internet use       548
2002    alcohol use       882
2002       soda use       121

我正在尝试实现这样的输出。请记住，这些聚合值并不准确，我只是用随机数填充它们，让您了解格式。结果 stats_df =

     statistics  count  unique  top  freq  mean  std  min  20%  40%  50%  60%  80%  max

  cigarette use    32    335    655    54   45    45   1    23   21   12   55   55  999
  cellphone use    92    131    895    49   12    33   6    13   32   55   34   12  933     
  internet use     32    111    123    44   65    31   2    42  544   15   11   54  111    
   alcohol use     32    315    611    33   41    53   3    34   22   34   11   33  555      
     soda use      32    355    655    54   45    45   1    23   21   12   55   55  999

谢谢

Answer 1

我创建了一个示例数据框，只需使用 groupby().describe() 即可获得结果。我不确定您的代码有什么问题，您是否也可以编辑 post 以显示您获得的结果？

这是我的

df = pd.DataFrame(index=[1999,1999,1999,1999,1999,2000,2000,2000,2000,2000], columns=['statistics', 's_values'], data=[['cigarette use', 100],['cellphone use', 310],['internet use',
 101],['alcohol use', 100], ['soda use', 215],['cigarette use', 315],['cellphone use', 317],['internet use', 325],['alcohol use', 108],['soda use', 200]])

df.groupby("statistics").describe()

输出：

                 s_values
                 count   mean         std    min     25%    50%     75%    max
statistics
alcohol use        2.0  104.0    5.656854  100.0  102.00  104.0  106.00  108.0
cellphone use      2.0  313.5    4.949747  310.0  311.75  313.5  315.25  317.0
cigarette use      2.0  207.5  152.027958  100.0  153.75  207.5  261.25  315.0
internet use       2.0  213.0  158.391919  101.0  157.00  213.0  269.00  325.0
soda use           2.0  207.5   10.606602  200.0  203.75  207.5  211.25  215.0

pandas dataframe.describe() 根据索引值获取聚合

pandas dataframe.describe() obtain aggregates based on index values

python

describe

aggregates

pandas