dplyr groupby percentage and renaming the column
I want to group my data frame by the promotion that was offered and calculate percentages. The data frame looks like this:
Promotion name          days rented
nan                     577
first month half off    88
nan                     22
second month free       55
nan                     60
first month half off    20
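To try the snippets in this thread, the sample table above can be rebuilt as a DataFrame. This is a minimal sketch that assumes the name `rented_df` (the name used in the question's own attempt) and assumes the `nan` entries are real missing values (`np.nan`) rather than the string 'nan':

```python
import numpy as np
import pandas as pd

# Sample data from the question; missing promotions are stored as NaN (assumption).
rented_df = pd.DataFrame({
    'Promotion name': [np.nan, 'first month half off', np.nan,
                       'second month free', np.nan, 'first month half off'],
    'days rented': [577, 88, 22, 55, 60, 20],
})
print(rented_df)
```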
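Say my data frame is called df. How would I group by promotion name, compute the percentage based on days rented, and rename the resulting column? My first column, for example, would be "# Rentals < 1 month". In R I would write: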
df %>% group_by(`Promotion Name`) %>%
  summarise("# Rentals < 1 month" = sum(`Days rented` <= 30)/length(`Days rented`))
Can someone help me do this in Python? I would like the output to look like this:
Promotion Name          # rentals < 1 month   # rentals < 2 month   # rentals < 3 months
None                    0.0023                0.005                 0.28
First month half off    0.78                  0.22                  0.76
2nd month free          0.44                  etc
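I tried

rented_df.groupby('Promotion Name').sum()

but that does not give me what I want, because I need to count the rentals of 30 days or fewer, divide by the number of rentals in the group, and finally rename the column. Thanks.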
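A quick check of what the attempt does, sketched on the sample data (missing promotions assumed to be `np.nan`): `.sum()` adds up the day values themselves, whereas summing a boolean mask counts the rows that meet the condition, which is what the R expression `sum(... <= 30)` computes.

```python
import numpy as np
import pandas as pd

rented_df = pd.DataFrame({
    'Promotion name': [np.nan, 'first month half off', np.nan,
                       'second month free', np.nan, 'first month half off'],
    'days rented': [577, 88, 22, 55, 60, 20],
})

# groupby(...).sum() adds up the number of days per promotion.
total_days = rented_df.groupby('Promotion name')['days rented'].sum()

# Summing a boolean mask counts rentals of 30 days or fewer per promotion.
short = (rented_df['days rented'] <= 30).groupby(rented_df['Promotion name']).sum()

print(total_days)
print(short)
```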
I think you need groupby with a custom function that uses boolean indexing:
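To illustrate the boolean-indexing idea on a single group (here the three rows with a missing promotion, used only as an example): `x[x <= 30]` keeps the matching rows, so their count divided by `len(x)` is the proportion, and the mean of the boolean mask is an equivalent shortcut.

```python
import pandas as pd

# Days rented for one example group (the rows with a missing promotion).
x = pd.Series([577, 22, 60])

# Boolean indexing keeps only the rows where the condition holds.
by_indexing = len(x[x <= 30]) / len(x)

# The mean of a boolean mask is the fraction of True values.
by_mean = (x <= 30).mean()

print(by_indexing, by_mean)
```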
df = (rented_df.groupby('Promotion name')['days rented']
        .apply(lambda x: len(x[x <= 30]) / len(x))
        .reset_index(name='# Rentals < 1 month'))
print(df)
         Promotion name  # Rentals < 1 month
0  first month half off                  0.5
1     second month free                  0.0
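If you prefer something closer to dplyr's `summarise()`, pandas named aggregation (available since pandas 0.25) lets you name the output column in the same call. A sketch, with the sample data rebuilt inline and `np.nan` assumed for missing promotions; as with the code above, the NaN group is dropped (unlike dplyr, which keeps NA as its own group):

```python
import numpy as np
import pandas as pd

rented_df = pd.DataFrame({
    'Promotion name': [np.nan, 'first month half off', np.nan,
                       'second month free', np.nan, 'first month half off'],
    'days rented': [577, 88, 22, 55, 60, 20],
})

# Named aggregation: output column name -> (input column, function).
# The mean of the boolean mask equals sum(x <= 30) / length(x) in the R code.
out = (
    rented_df
    .groupby('Promotion name')
    .agg(**{'# Rentals < 1 month': ('days rented', lambda s: (s <= 30).mean())})
    .reset_index()
)
print(out)
```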
But groupby drops NaN keys by default, so if you need that group, first replace the NaNs using fillna with a string that does not already appear in the column:
rented_df['Promotion name'] = rented_df['Promotion name'].fillna('NANS strings')
df = (rented_df.groupby('Promotion name')['days rented']
        .apply(lambda x: len(x[x <= 30]) / len(x))
        .reset_index(name='# Rentals < 1 month'))
print(df)
         Promotion name  # Rentals < 1 month
0          NANS strings             0.333333
1  first month half off             0.500000
2     second month free             0.000000
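Alternatively, assuming pandas 1.1 or later, `groupby(..., dropna=False)` keeps NaN as its own group, so no placeholder string is needed. A sketch using the mean of a boolean mask for the proportion, on the sample data with `np.nan` for missing promotions:

```python
import numpy as np
import pandas as pd

rented_df = pd.DataFrame({
    'Promotion name': [np.nan, 'first month half off', np.nan,
                       'second month free', np.nan, 'first month half off'],
    'days rented': [577, 88, 22, 55, 60, 20],
})

# Requires pandas >= 1.1: dropna=False keeps NaN keys as a separate group.
share = (
    (rented_df['days rented'] <= 30)
    .groupby(rented_df['Promotion name'], dropna=False)
    .mean()
    .rename('# Rentals < 1 month')
    .reset_index()
)
print(share)
```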
If you want the result as a new column aligned with the original rows, you need transform:
rented_df['Promotion name'] = rented_df['Promotion name'].fillna('NANS strings')
rented_df['# Rentals < 1 month'] = (rented_df.groupby('Promotion name')['days rented']
                                      .transform(lambda x: len(x[x <= 30]) / len(x)))
print(rented_df)
         Promotion name  days rented  # Rentals < 1 month
0          NANS strings          577             0.333333
1  first month half off           88             0.500000
2          NANS strings           22             0.333333
3     second month free           55             0.000000
4          NANS strings           60             0.333333
5  first month half off           20             0.500000
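A vectorized variant of the transform step, sketched on the same sample data: build the boolean mask once and let `transform('mean')` broadcast each group's share back to its rows, instead of calling a Python lambda per group. The mask is cast to float so the result is unambiguously a float column:

```python
import numpy as np
import pandas as pd

rented_df = pd.DataFrame({
    'Promotion name': [np.nan, 'first month half off', np.nan,
                       'second month free', np.nan, 'first month half off'],
    'days rented': [577, 88, 22, 55, 60, 20],
})
rented_df['Promotion name'] = rented_df['Promotion name'].fillna('NANS strings')

# 1.0 where the rental lasted 30 days or fewer, else 0.0.
short = (rented_df['days rented'] <= 30).astype(float)

# transform('mean') returns one value per row: the share for that row's promotion.
rented_df['# Rentals < 1 month'] = short.groupby(rented_df['Promotion name']).transform('mean')
print(rented_df)
```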
EDIT: for several cut-offs at once, compute one Series per threshold and concatenate them:
import pandas as pd

rented_df['Promotion name'] = rented_df['Promotion name'].fillna('NANS strings')
g = rented_df.groupby('Promotion name')['days rented']
# share of rentals at or below each cut-off (30, 60, 90 days) per promotion
s1 = g.apply(lambda x: len(x[x <= 30]) / len(x)).rename('# Rentals < 1 month')
s2 = g.apply(lambda x: len(x[x <= 60]) / len(x)).rename('# Rentals < 2 month')
s3 = g.apply(lambda x: len(x[x <= 90]) / len(x)).rename('# Rentals < 3 month')
df = pd.concat([s1, s2, s3], axis=1).reset_index()
print(df)
         Promotion name  # Rentals < 1 month  # Rentals < 2 month  # Rentals < 3 month
0          NANS strings             0.333333             0.666667             0.666667
1  first month half off             0.500000             0.500000             1.000000
2     second month free             0.000000             1.000000             1.000000
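With many cut-offs, the repeated s1/s2/s3 lines can be generated from a mapping. A sketch assuming a "month" means 30 days (as in the 30/60/90 cut-offs above); the `thresholds` dict and `masks` frame are illustrative names, not part of the original answer:

```python
import numpy as np
import pandas as pd

rented_df = pd.DataFrame({
    'Promotion name': [np.nan, 'first month half off', np.nan,
                       'second month free', np.nan, 'first month half off'],
    'days rented': [577, 88, 22, 55, 60, 20],
})
rented_df['Promotion name'] = rented_df['Promotion name'].fillna('NANS strings')

# Illustrative mapping: output column label -> day cut-off (30 days per month assumed).
thresholds = {
    '# Rentals < 1 month': 30,
    '# Rentals < 2 month': 60,
    '# Rentals < 3 month': 90,
}

# One boolean column per cut-off: True where the rental is at or below it.
masks = pd.DataFrame(
    {name: rented_df['days rented'] <= days for name, days in thresholds.items()}
)

# Mean of each boolean column within a promotion = share of rentals under the cut-off.
df = masks.groupby(rented_df['Promotion name']).mean().reset_index()
print(df)
```

Adding another cut-off then only means adding one entry to the mapping.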