How to use groupby on a DataFrame
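I have a dataframe (a survey) in which I need to group by 2 columns.
One of the two columns is a ranking (5 options: Very Poor, Poor, Average, Good and Excellent), and the second column is a list of times.
I need to group these two columns like this: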
Ranking   | Time | Count (how many times this Time value appears in the "Time" column for the ranking)
-------------------------------------
Very Poor | 0.0  | 6
          | 1.0  | 2
          | 2.0  | 9
-------------------------------------
Poor      | 0.0  | 3
          | 1.0  | 12
...
I need to show the results of this table in 5 charts (one per ranking), where x=Time and y=Count.
I've been stuck for hours, can anyone help?
Set up an MRE (minimal reproducible example):
import numpy as np
import pandas as pd
rank = ['Very Poor', 'Poor', 'Average', 'Good', 'Excellent']
df = pd.DataFrame({'Ranking': np.random.choice(rank, 100),
                   'Time': np.random.randint(1, 50, 100)})
print(df)
# Output:
Ranking Time
0 Excellent 28
1 Poor 33
2 Excellent 28
3 Average 22
4 Very Poor 11
.. ... ...
95 Very Poor 13
96 Average 26
97 Very Poor 23
98 Good 24
99 Good 36
[100 rows x 2 columns]
Use value_counts to count each (Ranking, Time) pair instead of groupby:
count = df.value_counts(['Ranking', 'Time']).rename('Count').reset_index()
print(count)
# Output:
Ranking Time Count
0 Poor 41 3
1 Very Poor 46 3
2 Very Poor 49 2
3 Very Poor 17 2
4 Excellent 20 2
.. ... ... ...
81 Excellent 34 1
82 Excellent 32 1
83 Excellent 27 1
84 Excellent 26 1
85 Good 32 1
[86 rows x 3 columns]
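Since the question asks about groupby specifically, the same counts can probably also be obtained with groupby followed by size. This is only a minimal sketch, assuming the column names from the MRE above; the fixed seed and the names `grouped` and `table` are illustrative additions, not part of the original answer. Left as a Series with a (Ranking, Time) MultiIndex, the result prints in a nested layout close to the one shown in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the MRE data; the fixed seed is an assumption added for repeatability
rng = np.random.default_rng(0)
rank = ['Very Poor', 'Poor', 'Average', 'Good', 'Excellent']
df = pd.DataFrame({'Ranking': rng.choice(rank, 100),
                   'Time': rng.integers(1, 50, 100)})

# size() counts the rows in each (Ranking, Time) group
grouped = df.groupby(['Ranking', 'Time']).size().rename('Count')
print(grouped)  # Time values are nested under each Ranking

# Flat table with the same columns as the value_counts result
table = grouped.reset_index()
print(table.head())
```

Unlike value_counts, which sorts by count in descending order, groupby sorts by the group keys, so the rows come out ordered by Ranking and then by Time.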
To visualize the data, the simplest way is to use seaborn and displot:
# Python env: pip install seaborn
# Anaconda env: conda install seaborn
import seaborn as sns
import matplotlib.pyplot as plt
sns.displot(df, x='Time', col='Ranking', binwidth=1)
plt.show()
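If the charts should be drawn from the counted table itself (x=Time, y=Count, one chart per ranking), plain matplotlib is one option. The sketch below is an assumption-laden illustration rather than the answer's method: `plot_counts` is a hypothetical helper, the seed is added for repeatability, and the figure size is arbitrary.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rank = ['Very Poor', 'Poor', 'Average', 'Good', 'Excellent']

def plot_counts(df, order):
    """Hypothetical helper: one bar chart per ranking, x=Time, y=Count."""
    # Count the rows for each (Ranking, Time) pair
    count = df.groupby(['Ranking', 'Time']).size().reset_index(name='Count')
    fig, axes = plt.subplots(1, len(order),
                             figsize=(4 * len(order), 3), sharey=True)
    for ax, name in zip(axes, order):
        # Keep only the rows for the current ranking
        sub = count[count['Ranking'] == name]
        ax.bar(sub['Time'], sub['Count'])
        ax.set_title(name)
        ax.set_xlabel('Time')
    axes[0].set_ylabel('Count')
    fig.tight_layout()
    return fig

# Rebuild the MRE data; the fixed seed is an assumption
rng = np.random.default_rng(0)
df = pd.DataFrame({'Ranking': rng.choice(rank, 100),
                   'Time': rng.integers(1, 50, 100)})
fig = plot_counts(df, rank)
```

Call plt.show() afterwards to display the figure. Passing `rank` as `order` keeps the charts in the Very Poor to Excellent sequence instead of alphabetical order.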