Remove substring and merge rows in python/pandas
My df:
description total average number
0 NFL football (white) L 49693 66 1007
1 NFL football (white) XL 79682 74 1198
2 NFL football (white) XS 84943 81 3792
3 NFL football (white) S 78371 73 3974
4 NFL football (blue) L 99482 92 3978
5 NFL football (blue) M 32192 51 3135
6 NFL football (blue) XL 75343 71 2879
7 NFL football (red) XXL 84391 79 1192
8 NFL football (red) XS 34727 57 992
9 NFL football (red) L 44993 63 1562
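What I'd like to do is remove the size and, for each football color, keep the sum of total, the mean of average, and the sum of number: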
description total average number
0 NFL football (white) 292689 74 9971
1 NFL football (blue) 207017 71 9992
2 NFL football (red) 164111 66 3746
Any advice is much appreciated!
Replace works, but you can also use rsplit to drop the last word of the description and then group:
df.description = df.description.apply(lambda x: x.rsplit(' ',1)[0])
df.groupby(by='description')[['total', 'number']].sum()
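To make the rsplit step concrete, here is a minimal sketch on three of the rows from the question. The names `parts`, `key`, and `summed` are only illustrative, and using the vectorized `.str.rsplit()` as a grouping key (instead of overwriting `df.description`) is a variation on the snippet above, not something the original answer does.

```python
import pandas as pd

# A small subset of the rows from the question.
df = pd.DataFrame({
    "description": [
        "NFL football (white) L",
        "NFL football (white) XL",
        "NFL football (red) XXL",
    ],
    "total": [49693, 79682, 84391],
    "number": [1007, 1198, 1192],
})

# rsplit(" ", 1) splits once, starting from the right,
# so only the trailing size is cut off.
parts = "NFL football (white) XL".rsplit(" ", 1)
# parts is ['NFL football (white)', 'XL']

# Vectorized variant: build a grouping key and leave df.description untouched.
key = df["description"].str.rsplit(" ", n=1).str[0]
summed = df.groupby(key)[["total", "number"]].sum()
```

Grouping by a separate key keeps the sizes in `df` in case they are needed later; assigning back to `df.description`, as in the snippet above, is fine when they are not.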
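You can groupby a reformatted description field (without modifying the original contents of description), where the reformatting splits on spaces and drops the last part using .str.split() and .str.join(). Then aggregate with .agg().
To match the desired output, round to the nearest whole number with .round() and cast to integers with .astype().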
(df.groupby(
df['description'].str.split(' ').str[:-1].str.join(' ')
)
.agg({'total': 'sum', 'average': 'mean', 'number': 'sum'})
.round(0)
.astype(int)
).reset_index()
Result:
description total average number
0 NFL football (blue) 207017 71 9992
1 NFL football (red) 164111 66 3746
2 NFL football (white) 292689 74 9971
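The groups above come out in alphabetical order, while the question lists white, blue, red. A self-contained sketch of one way to keep the order of first appearance is to pass `sort=False` to `groupby`. The DataFrame below is rebuilt by hand from the question, with every description written with a closing parenthesis as the expected output implies; `key` and `result` are illustrative names.

```python
import pandas as pd

# Sample data retyped from the question.
df = pd.DataFrame({
    "description": [
        "NFL football (white) L", "NFL football (white) XL",
        "NFL football (white) XS", "NFL football (white) S",
        "NFL football (blue) L", "NFL football (blue) M",
        "NFL football (blue) XL", "NFL football (red) XXL",
        "NFL football (red) XS", "NFL football (red) L",
    ],
    "total": [49693, 79682, 84943, 78371, 99482,
              32192, 75343, 84391, 34727, 44993],
    "average": [66, 74, 81, 73, 92, 51, 71, 79, 57, 63],
    "number": [1007, 1198, 3792, 3974, 3978,
               3135, 2879, 1192, 992, 1562],
})

# Description without the trailing size; df itself is not modified.
key = df["description"].str.split(" ").str[:-1].str.join(" ")

result = (
    df.groupby(key, sort=False)  # sort=False keeps first-appearance order
      .agg({"total": "sum", "average": "mean", "number": "sum"})
      .round(0)
      .astype(int)
      .reset_index()
)
```

Note that the average column here is the plain mean of the per-size averages, as in the answer above.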