How to use groupby on a dataframe

Question

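I have a dataframe (a survey) in which I need to group by 2 columns. One of them is a ranking (5 options: Very poor, Poor, Average, Good and Excellent); the other is a list of times. I need to group these two columns like this: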

Ranking   |   Time   |  Count (how many times this Time appears in the "Time" column for this Ranking)
-------------------------------------
Very poor |  0.0     |   6
          |  1.0     |   2    
          |  2.0     |   9             
-------------------------------------                              
Poor      |  0.0     |   3                           
          |  1.0     |   12                          
...

I need to show the results of this table in 5 charts (one per ranking), with x=Time and y=Count.

I've been stuck on this for hours. Can anyone help?

Answer 1

Set up an MRE (minimal reproducible example):

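import numpy as np
import pandas as pd

# 100 random survey answers: a ranking and a time for each row
rank = ['Very Poor', 'Poor', 'Average', 'Good', 'Excellent']
df = pd.DataFrame({'Ranking':  np.random.choice(rank, 100),
                   'Time': np.random.randint(1, 50, 100)})
print(df)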

# Output:
      Ranking  Time
0   Excellent    28
1        Poor    33
2   Excellent    28
3     Average    22
4   Very Poor    11
..        ...   ...
95  Very Poor    13
96    Average    26
97  Very Poor    23
98       Good    24
99       Good    36

[100 rows x 2 columns]

Use value_counts instead of groupby to count the (Ranking, Time) pairs:

count = df.value_counts(['Ranking', 'Time']).rename('Count').reset_index()
print(count)

# Output:
      Ranking  Time  Count
0        Poor    41      3
1   Very Poor    46      3
2   Very Poor    49      2
3   Very Poor    17      2
4   Excellent    20      2
..        ...   ...    ...
81  Excellent    34      1
82  Excellent    32      1
83  Excellent    27      1
84  Excellent    26      1
85       Good    32      1

[86 rows x 3 columns]
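Since the question asks specifically about groupby, here is a minimal sketch of the equivalent groupby version, assuming the same kind of random `df` as in the MRE (seeded here with `np.random.default_rng(0)` only so the snippet is reproducible; the names `count_gb` and `count_vc` are illustrative). `size()` counts the rows in each (Ranking, Time) group, and `reset_index(name='Count')` flattens the result into the three-column table from the question.

```python
import numpy as np
import pandas as pd

# Same MRE as above; the seed is an addition for reproducibility
rng = np.random.default_rng(0)
rank = ['Very Poor', 'Poor', 'Average', 'Good', 'Excellent']
df = pd.DataFrame({'Ranking': rng.choice(rank, 100),
                   'Time': rng.integers(1, 50, 100)})

# groupby version: number of rows in each (Ranking, Time) group
count_gb = df.groupby(['Ranking', 'Time']).size().reset_index(name='Count')

# value_counts version from this answer, for comparison
count_vc = df.value_counts(['Ranking', 'Time']).rename('Count').reset_index()

print(count_gb.head())
```

Both produce the same rows; the difference is ordering. groupby sorts by the group keys (Ranking, then Time), while value_counts sorts by Count in descending order.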

To visualize the data, the simplest way is to use seaborn's displot:

# Python env: pip install seaborn
# Anaconda env: conda install seaborn
import seaborn as sns
import matplotlib.pyplot as plt

sns.displot(df, x='Time', col='Ranking', binwidth=1)
plt.show()
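If you would rather plot the precomputed count table directly with plain matplotlib (x=Time, y=Count, one chart per ranking, as the question asks), a minimal sketch could look like the following. It assumes the same random data as the MRE (seeded for reproducibility) and draws the panels in the fixed order of the `rank` list; by contrast, seaborn orders the facets by first appearance in the data unless `col_order` is passed.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same MRE as above; the seed is an addition for reproducibility
rng = np.random.default_rng(0)
rank = ['Very Poor', 'Poor', 'Average', 'Good', 'Excellent']
df = pd.DataFrame({'Ranking': rng.choice(rank, 100),
                   'Time': rng.integers(1, 50, 100)})

# Count table: one row per (Ranking, Time) pair
count = df.groupby(['Ranking', 'Time']).size().reset_index(name='Count')

# One bar chart per ranking, in the natural order of the scale
fig, axes = plt.subplots(1, len(rank), figsize=(20, 4), sharey=True)
for ax, r in zip(axes, rank):
    sub = count[count['Ranking'] == r]
    ax.bar(sub['Time'], sub['Count'], width=0.8)
    ax.set_title(r)
    ax.set_xlabel('Time')
axes[0].set_ylabel('Count')
fig.tight_layout()
plt.show()
```

`sharey=True` keeps the Count axis identical across the five panels, so the rankings can be compared at a glance.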
