How to get a "detailed" describe() in pandas

I received some question data, and I'll copy two of its columns here:

Range       Answer
>30          maybe
>30          yes
<30          no
<30          yes
>30          maybe
<30          yes

What I need to do is group by Range and find out how many answers each option received. In this case:

Range       Answer
<30
             no: 1
             yes: 2
             maybe: 0
>30
             no: 0
             yes: 1
             maybe: 2

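In reality there are not just two columns but many. I need to group by one of them and then get this kind of statistic for every other column in the DataFrame. This is my first time working with categorical data and I'm quite lost. I used describe(), and it works for the most frequent answer, but I need the count for every answer. Is there a direct method for this, something like a "detailed describe()"?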

One way is to use crosstab:

In [685]: pd.crosstab(df.Range, df.Answer).stack()
Out[685]:
Range  Answer
<30    maybe     0
       no        1
       yes       2
>30    maybe     2
       no        0
       yes       1
dtype: int64

Alternatively, use groupby:

In [690]: df.groupby(['Range', 'Answer']).size().unstack(fill_value=0).stack()
Out[690]:
Range  Answer
<30    maybe     0
       no        1
       yes       2
>30    maybe     2
       no        0
       yes       1
dtype: int64
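
Both snippets above assume a DataFrame named df that already holds the two columns from the question. As a minimal, self-contained sketch (the DataFrame is rebuilt by hand from the sample rows, and the variable names counts_crosstab and counts_groupby are only illustrative), the two approaches can be run side by side:

```python
import pandas as pd

# Sample data copied from the question (two columns only).
df = pd.DataFrame({
    "Range": [">30", ">30", "<30", "<30", ">30", "<30"],
    "Answer": ["maybe", "yes", "no", "yes", "maybe", "yes"],
})

# crosstab builds a Range x Answer table of counts, with zeros for
# combinations that never occur; stack() turns it into a long Series.
counts_crosstab = pd.crosstab(df["Range"], df["Answer"]).stack()

# groupby + size only returns combinations that exist in the data, so
# unstack(fill_value=0) is used to put the missing ones back as zeros.
counts_groupby = (
    df.groupby(["Range", "Answer"]).size().unstack(fill_value=0).stack()
)

print(counts_crosstab)
print(counts_groupby)
```

If the wide table is easier to read, dropping the final .stack() leaves one row per Range and one column per answer.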

You can use melt to reshape the data and then aggregate with size:

print (df)
  Range Answer1 Answer2 Answer3
0   >30   maybe      no     yes
1   >30     yes     yes      no
2   <30      no     yes      no
3   <30     yes   maybe      no
4   >30   maybe      no     yes
5   <30     yes      no      no

print (df.melt('Range', var_name='Answers', value_name='Vals'))
   Range  Answers   Vals
0    >30  Answer1  maybe
1    >30  Answer1    yes
2    <30  Answer1     no
3    <30  Answer1    yes
4    >30  Answer1  maybe
5    <30  Answer1    yes
6    >30  Answer2     no
7    >30  Answer2    yes
8    <30  Answer2    yes
9    <30  Answer2  maybe
10   >30  Answer2     no
11   <30  Answer2     no
12   >30  Answer3    yes
13   >30  Answer3     no
14   <30  Answer3     no
15   <30  Answer3     no
16   >30  Answer3    yes
17   <30  Answer3     no

df1 = df.melt('Range', var_name='Answers', value_name='Vals') \
        .groupby(['Range', 'Answers', 'Vals']).size()
print (df1)
Range  Answers  Vals 
<30    Answer1  no       1
                yes      2
       Answer2  maybe    1
                no       1
                yes      1
       Answer3  no       3
>30    Answer1  maybe    2
                yes      1
       Answer2  no       2
                yes      1
       Answer3  no       1
                yes      2
dtype: int64
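
Note that the size() result above lists only the combinations that actually occur, so an option with no answers (for example maybe for Answer1 in the <30 group) is missing rather than shown as 0. If the zeros are wanted, as in the question's expected output, one possible sketch is to unstack the Vals level with fill_value=0. The DataFrame below is rebuilt by hand from the printed sample, and the names long_df and table are only illustrative:

```python
import pandas as pd

# Sample data rebuilt from the printed DataFrame above.
df = pd.DataFrame({
    "Range":   [">30", ">30", "<30", "<30", ">30", "<30"],
    "Answer1": ["maybe", "yes", "no", "yes", "maybe", "yes"],
    "Answer2": ["no", "yes", "yes", "maybe", "no", "no"],
    "Answer3": ["yes", "no", "no", "no", "yes", "no"],
})

# Long format: one row per (Range, question column, answer value).
long_df = df.melt("Range", var_name="Answers", value_name="Vals")

# Count each combination, then spread the answer values into columns,
# filling combinations that never occur with 0.
table = (
    long_df.groupby(["Range", "Answers", "Vals"])
    .size()
    .unstack("Vals", fill_value=0)
)

print(table)
```

This gives one row per Range and question column, with one column per possible answer.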

Another solution is to use stack to reshape the data and then value_counts:

df1 = df.set_index('Range').stack() \
        .groupby(level=[0,1]).value_counts()
print (df1)
Range                
<30    Answer1  yes      2
                no       1
       Answer2  maybe    1
                no       1
                yes      1
       Answer3  no       3
>30    Answer1  maybe    2
                yes      1
       Answer2  no       2
                yes      1
       Answer3  yes      2
                no       1
dtype: int64