pandas: group years by decade
I have data in CSV format. Here is my code.
import pandas as pd
data = pd.read_csv('cast.csv')  # read_csv already returns a DataFrame
print(data)
The result looks like this.
title year name type \
0 Closet Monster 2015 Buffy #1 actor
1 Suuri illusioni 1985 Homo $ actor
2 Battle of the Sexes 2017 $hutter actor
3 Secret in Their Eyes 2015 $hutter actor
4 Steve Jobs 2015 $hutter actor
... ... ... ... ...
74996 Mia fora kai ena... moro 2011 Penelope Anastasopoulou actress
74997 The Magician King 2004 Tiannah Anastassiades actress
74998 Festival of Lights 2010 Zoe Anastassiou actress
74999 Toxic Tutu 2016 Zoe Anastassiou actress
75000 Fugitive Pieces 2007 Anastassia Anastassopoulou actress
character n
0 Buffy 4 31.0
1 Guests 22.0
2 Bobby Riggs Fan 10.0
3 2002 Dodger Fan NaN
4 1988 Opera House Patron NaN
... ... ...
74996 Popi voulkanizater 11.0
74997 Unicycle Race Attendant NaN
74998 Guidance Counselor 20.0
74999 Demon of Toxicity NaN
75000 Laundry Girl 25.0
[75001 rows x 6 columns]
I want to group the data by year and type, and then get the number of rows of each type for each year. Here is my code.
grouped = data.groupby(['year', 'type']).size()
print(grouped)
The result looks like this.
year type
1912 actor 1
actress 2
1913 actor 9
actress 1
1914 actor 38
..
2019 actress 3
2020 actor 3
actress 1
2023 actor 1
actress 2
Length: 220, dtype: int64
The problem is that I want the sizes from 1910 to 2020 in 10-year steps (one group per decade), so the year index would be 1910, 1920, 1930, 1940, and so on up to 2020.
I see two simple options.
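1- Round the year down to the start of its decade:
# Floor division maps 1910-1919 to 1910, 1920-1929 to 1920, and so on.
# Note: data['year'].round(-1) rounds to the NEAREST 10 (1919 becomes 1920), so it is not equivalent.
group = data['year'] // 10 * 10
grouped = data.groupby([group, 'type']).size()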
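To make the floor-division option concrete, here is a minimal sketch on a small made-up frame. The `sample` data is hypothetical and only stands in for `cast.csv`; it also shows why `round(-1)` is not a drop-in replacement.

```python
import pandas as pd

# Hypothetical sample standing in for cast.csv (only the two columns used here)
sample = pd.DataFrame({
    "year": [1912, 1915, 1919, 1920, 2015, 2017, 2023],
    "type": ["actor", "actress", "actor", "actor", "actor", "actress", "actor"],
})

# Floor each year to the first year of its decade
decade = sample["year"] // 10 * 10

# Count rows per (decade, type)
decade_counts = sample.groupby([decade, "type"]).size()
print(decade_counts)

# For comparison: rounding to the nearest 10 moves 1919 into 1920
rounded = sample["year"].round(-1)
print(rounded.tolist())
```

Only decades that actually occur in the data show up in the result, so a decade with no rows is simply missing from the index.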
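2- Use pandas.cut:
years = list(range(1910, 2031, 10))  # bin edges: 1910, 1920, ..., 2030
# right=False makes each bin [start, start + 10), so 1920 is labelled 1920 rather than 1910
group = pd.cut(data['year'], bins=years, right=False, labels=years[:-1])
grouped = data.groupby([group, 'type']).size()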
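A sketch of the `pandas.cut` option on the same kind of made-up data (again, `sample` is hypothetical and not the asker's file). Passing `observed=False` explicitly is my assumption about what is wanted here: it keeps the decades that have no rows, which gives the full 1910 to 2020 index the question asks for.

```python
import pandas as pd

# Hypothetical sample standing in for cast.csv
sample = pd.DataFrame({
    "year": [1912, 1915, 1919, 1920, 2015, 2017, 2023],
    "type": ["actor", "actress", "actor", "actor", "actor", "actress", "actor"],
})

# Bin edges 1910, 1920, ..., 2030; each bin is labelled with its first year
edges = list(range(1910, 2031, 10))
decade = pd.cut(sample["year"], bins=edges, right=False, labels=edges[:-1])

# observed=False keeps empty decades in the result with a count of 0
counts = sample.groupby([decade, "type"], observed=False).size()

# One row per decade, one column per type
table = counts.unstack(fill_value=0)
print(table)
```

Compared with floor division, this is a bit more verbose, but the bins are explicit and years outside 1910 to 2029 become NaN instead of silently forming extra groups.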