Percent of total clusters per cluster per season using pandas

Question

I have a pandas DataFrame that looks like this, with 12 clusters in total. Some clusters do not appear in a given season.

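I want to create a multi-line plot showing each cluster's share of the teams in each season. So if there were 30 teams in the 97-98 season and 10 of them were in cluster 1, the value would be .33, because cluster 1 holds one third of the total possible spots.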
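The arithmetic in that example can be checked with a short sketch. The 30-team season below is made up: only the 10 teams in cluster 1 come from the question, and the other cluster labels are placeholders.

```python
import pandas as pd

# Hypothetical season: 30 teams, 10 of them in cluster 1.
# The labels 2 and 4 for the remaining 20 teams are invented for illustration.
clusters = pd.Series([1] * 10 + [2] * 12 + [4] * 8, name="Cluster")

# normalize=True divides each count by the total number of rows (30 here).
shares = clusters.value_counts(normalize=True)

share_cluster_1 = shares.loc[1]
print(round(share_cluster_1, 2))  # 0.33
```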

It looks like this

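I want the dataset to look like this, where each cluster has its own share of the season's total, expressed as a fraction. I tried using the pandas groupby method to get a bunch of lists and then calling value_counts() on them, but that did not work, because looping over df.groupby(['SEASON']) returns tuples, not a Series.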
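For context, here is a minimal sketch of why the loop yields tuples and how they can be unpacked. The small DataFrame is a hypothetical stand-in for the real data, and the sketch groups by the plain string "SEASON" rather than a one-element list so that the group key is a scalar.

```python
import pandas as pd

# Hypothetical stand-in for the question's DataFrame.
df = pd.DataFrame({
    "SEASON": ["1996-97", "1996-97", "1996-97", "1997-98", "1997-98"],
    "Cluster": [0, 1, 1, 2, 2],
})

# Iterating over a GroupBy yields (key, sub-DataFrame) tuples, not Series.
first_item = next(iter(df.groupby("SEASON")))
print(type(first_item))

# Unpacking each tuple gives access to the group's Cluster column,
# which is a Series and therefore supports value_counts.
per_season = {}
for season, group in df.groupby("SEASON"):
    per_season[season] = group["Cluster"].value_counts(normalize=True)
```

This loop works, but it leaves one Series per season to be stitched together afterwards, which is the step the answer below avoids.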

Many thanks.

Answer 1

Use .groupby combined with .value_counts and .unstack:

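# Fraction of each season's rows per cluster: one row per season, one column per cluster
temp_df = df.groupby(['SEASON'])['Cluster'].value_counts(normalize=True).unstack().fillna(0.0)
temp_df.plot()  # draws one line per cluster across seasons
print(temp_df.round(2))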

Cluster   0     1     2     4     5    6     7     10    11
SEASON                                                     
1996-97  0.1  0.21  0.17  0.21  0.07  0.1  0.03  0.07  0.03
1997-98  0.2  0.00  0.20  0.20  0.00  0.0  0.20  0.20  0.00
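As a cross-check, the same table can probably also be built with pd.crosstab, and clusters that never occur can be added as zero columns. The sketch below uses a small made-up DataFrame, and it assumes the 12 clusters are labelled 0 through 11, which the question does not state explicitly.

```python
import pandas as pd

# Hypothetical data; the real DataFrame has one row per team per season.
df = pd.DataFrame({
    "SEASON": ["1996-97"] * 4 + ["1997-98"] * 5,
    "Cluster": [0, 1, 1, 4, 0, 2, 4, 7, 10],
})

# The chain from the answer above.
temp_df = (
    df.groupby(["SEASON"])["Cluster"]
    .value_counts(normalize=True)
    .unstack()
    .fillna(0.0)
)

# Alternative: crosstab with normalize="index" makes each row (season) sum to 1.
alt_df = pd.crosstab(df["SEASON"], df["Cluster"], normalize="index")

# Assumption: clusters are labelled 0..11. Absent clusters get a 0.0 column.
full_df = temp_df.reindex(columns=range(12), fill_value=0.0)
print(full_df.round(2))
```

Reindexing the columns keeps the plot legend stable across datasets, since every cluster gets a line even when it is absent from all seasons shown.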

python

data-visualization

series

dataframe

pandas