pandas: group years by decade

I have data in CSV format. Here is my code.

import pandas as pd

# read_csv already returns a DataFrame, so no pd.DataFrame(...) wrapper is needed
data = pd.read_csv('cast.csv')
print(data)

The result looks like this.

                          title  year                        name     type  \
0                Closet Monster  2015                    Buffy #1    actor   
1               Suuri illusioni  1985                      Homo $    actor   
2           Battle of the Sexes  2017                     $hutter    actor   
3          Secret in Their Eyes  2015                     $hutter    actor   
4                    Steve Jobs  2015                     $hutter    actor   
...                         ...   ...                         ...      ...   
74996  Mia fora kai ena... moro  2011     Penelope Anastasopoulou  actress   
74997         The Magician King  2004       Tiannah Anastassiades  actress   
74998        Festival of Lights  2010             Zoe Anastassiou  actress   
74999                Toxic Tutu  2016             Zoe Anastassiou  actress   
75000           Fugitive Pieces  2007  Anastassia Anastassopoulou  actress   

                     character     n  
0                      Buffy 4  31.0  
1                       Guests  22.0  
2              Bobby Riggs Fan  10.0  
3              2002 Dodger Fan   NaN  
4      1988 Opera House Patron   NaN  
...                        ...   ...  
74996       Popi voulkanizater  11.0  
74997  Unicycle Race Attendant   NaN  
74998       Guidance Counselor  20.0  
74999        Demon of Toxicity   NaN  
75000             Laundry Girl  25.0  

[75001 rows x 6 columns]

I want to group the data by year and type, and then get the number of rows for each type in each year. Here is my code.

grouped = data.groupby(['year', 'type']).size()
print(grouped)

The result looks like this.

year  type   
1912  actor       1
      actress     2
1913  actor       9
      actress     1
1914  actor      38
                 ..
2019  actress     3
2020  actor       3
      actress     1
2023  actor       1
      actress     2
Length: 220, dtype: int64

The problem is that I want the counts from 1910 to 2020 in steps of 10 years (one row per decade), so the year index would be 1910, 1920, 1930, 1940, and so on up to 2020.

I see two simple options.

1- Round the year down to the nearest multiple of 10:

group = data['year'] // 10 * 10  # floors to the decade; data['year'].round(-1) rounds to the nearest 10 instead (1919 -> 1920)
grouped = data.groupby([group, 'type']).size()
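As a quick check of the floor-division approach, here is a minimal sketch. The six sample rows are made up for illustration and only stand in for cast.csv; the variable names `decade` and `decade_counts` are also just placeholders.

```python
import pandas as pd

# Hypothetical sample rows standing in for cast.csv (not the real data)
data = pd.DataFrame({
    "year": [1912, 1919, 1920, 1985, 2015, 2020],
    "type": ["actor", "actress", "actor", "actor", "actress", "actor"],
})

# Integer-divide by 10, then multiply back: 1919 -> 1910, 1985 -> 1980
decade = data["year"] // 10 * 10

# Group by the computed decade Series together with the 'type' column
decade_counts = data.groupby([decade, "type"]).size()
print(decade_counts)
```

Calling `decade_counts.unstack(fill_value=0)` should give one row per decade with one column per type, which may be easier to read than the stacked Series.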

2- Use pandas.cut:

years = list(range(1910,2031,10))
group = pd.cut(data['year'], bins=years, labels=years[:-1], right=False)  # right=False gives [1910, 1920), so 1920 is labelled 1920
grouped = data.groupby([group, 'type']).size()
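The binning behaviour of pandas.cut is easier to see on a tiny example. The sketch below again uses made-up sample rows in place of cast.csv, and it assumes left-closed bins (`right=False`) so that a year such as 1920 is labelled 1920 rather than 1910. Passing `observed=True` to `groupby` is my own addition; it keeps only the decade/type combinations that actually occur.

```python
import pandas as pd

# Hypothetical sample rows standing in for cast.csv (not the real data)
data = pd.DataFrame({
    "year": [1912, 1919, 1920, 1985, 2015, 2020],
    "type": ["actor", "actress", "actor", "actor", "actress", "actor"],
})

# Bin edges 1910, 1920, ..., 2030; each bin is labelled by its left edge
edges = list(range(1910, 2031, 10))
decade = pd.cut(data["year"], bins=edges, labels=edges[:-1], right=False)

# observed=True drops categorical bins that contain no rows
cut_counts = data.groupby([decade, "type"], observed=True).size()
print(cut_counts)
```

Without `right=False`, pandas.cut uses right-closed bins such as (1910, 1920], which would label 1920 as 1910 and leave 1910 itself unbinned (NaN).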