Remove substring and merge rows in Python/pandas

My df:

   description               total      average      number
0 NFL football (white) L     49693        66       1007
1 NFL football (white) XL    79682        74       1198
2 NFL football (white) XS    84943        81       3792
3 NFL football (white) S     78371        73       3974
4 NFL football (blue) L      99482        92       3978
5 NFL football (blue) M      32192        51       3135
6 NFL football (blue) XL     75343        71       2879
7 NFL football (red) XXL     84391        79       1192
8 NFL football (red) XS      34727        57       992
9 NFL football (red) L       44993        63       1562

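What I want to do is remove the sizes and keep the sum of total, the mean of average, and the sum of number for each color of football: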

   description               total      average    number
0 NFL football (white)       292689       74       9971
1 NFL football (blue)        207017       71       9992
2 NFL football (red)         164111       66       3746

Any suggestions are greatly appreciated!

Replace works, but you can also use rsplit to drop the last word of the description and then group:

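# Drop the last space-separated token (the size) from each description
df['description'] = df['description'].apply(lambda x: x.rsplit(' ', 1)[0])

# Sum the totals and counts per color (the average column is not aggregated here)
df.groupby(by='description')[['total', 'number']].sum()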
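
The replace approach mentioned above is not shown in this answer. A minimal sketch of it, assuming the size is always the last whitespace-separated token of the description, might look like the following. The regex pattern and the helper name `strip_size` are illustrative assumptions, not taken from the thread, and only a few rows of the question's data are rebuilt so the snippet runs on its own.

```python
import pandas as pd

# A few rows from the question's data, rebuilt so the snippet is self-contained.
df = pd.DataFrame({
    'description': ['NFL football (white) L', 'NFL football (white) XL',
                    'NFL football (blue) M', 'NFL football (red) XXL'],
    'total': [49693, 79682, 32192, 84391],
    'average': [66, 74, 51, 79],
    'number': [1007, 1198, 3135, 1192],
})


def strip_size(description: pd.Series) -> pd.Series:
    """Hypothetical helper: remove the trailing whitespace-separated token (the size)."""
    return description.str.replace(r'\s+\S+$', '', regex=True)


# Group on the cleaned key; df['description'] itself is left untouched.
summary = (
    df.groupby(strip_size(df['description']))
      .agg({'total': 'sum', 'average': 'mean', 'number': 'sum'})
      .reset_index()
)
print(summary)
```

Unlike the plain `.sum()` above, the `.agg()` call also carries the `average` column through, here as an unrounded mean.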

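You can group by a reformatted version of the description field without modifying the original column. The reformatting splits each description on spaces and drops the last part, using .str.split() and .str.join(). Then aggregate with .agg().

To get the output into the desired form, round the values and cast them to integers with .round().astype().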

(df.groupby(df['description'].str.split(' ').str[:-1].str.join(' '))
   .agg({'total': 'sum', 'average': 'mean', 'number': 'sum'})
   .round(0)
   .astype(int)
   .reset_index())

Result:

            description   total  average  number
0   NFL football (blue)  207017       71    9992
1    NFL football (red)  164111       66    3746
2  NFL football (white)  292689       74    9971
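
For anyone who wants to check this end to end, here is a self-contained sketch that rebuilds the question's DataFrame and combines the two ideas above: `rsplit` to build the grouping key, then `.agg()` to aggregate. The DataFrame construction is my reconstruction of the question's table, and the use of named aggregation and `sort=False` is my own choice rather than something from the answers.

```python
import pandas as pd

# Reconstruction of the question's table (description, total, average, number).
rows = [
    ('NFL football (white) L', 49693, 66, 1007),
    ('NFL football (white) XL', 79682, 74, 1198),
    ('NFL football (white) XS', 84943, 81, 3792),
    ('NFL football (white) S', 78371, 73, 3974),
    ('NFL football (blue) L', 99482, 92, 3978),
    ('NFL football (blue) M', 32192, 51, 3135),
    ('NFL football (blue) XL', 75343, 71, 2879),
    ('NFL football (red) XXL', 84391, 79, 1192),
    ('NFL football (red) XS', 34727, 57, 992),
    ('NFL football (red) L', 44993, 63, 1562),
]
df = pd.DataFrame(rows, columns=['description', 'total', 'average', 'number'])

# Grouping key: everything before the last space, i.e. the description without the size.
key = df['description'].str.rsplit(' ', n=1).str[0]

result = (
    df.groupby(key, sort=False)          # keep groups in order of first appearance
      .agg(total=('total', 'sum'),
           average=('average', 'mean'),  # plain mean of the per-size averages
           number=('number', 'sum'))
      .round(0)
      .astype(int)
      .reset_index()
)
print(result)
```

With `sort=False` the rows come out as white, blue, red, matching the layout asked for in the question; the default sorts the keys alphabetically, as in the result above. Note that `average` is the unweighted mean of the per-size averages, as in the answer, not a mean weighted by `number`.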