Pandas groupby agg: summing string prices per order ID taking into account item quantity

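How can I combine the rows that share the same order_id so that each order becomes a single row in the resulting DataFrame? (Here, quantity and item_price should be summed per order_id, while choice_description and item_name should be concatenated as strings.)

Reproducible input: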

import pandas as pd
from numpy import nan

d = {'order_id': [1, 1, 1, 1, 2], 'quantity': [1, 1, 1, 1, 2], 'item_name': ['Chips and Fresh Tomato Salsa', 'Izze', 'Nantucket Nectar', 'Chips and Tomatillo-Green Chili Salsa', 'Chicken Bowl'], 'choice_description': [nan, '[Clementine]', '[Apple]', nan, '[Tomatillo-Red Chili Salsa (Hot), [Black Beans, Rice, Cheese, Sour Cream]]'], 'item_price': ['$2.39 ', '$3.39 ', '$3.39 ', '$2.39 ', '$16.98 ']}
df = pd.DataFrame(d)

You can use:

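out = (df
      # numeric line total: strip '$', convert to float, multiply by quantity
      .assign(price=pd.to_numeric(df['item_price'].str.strip('$'), errors='coerce')
                      .mul(df['quantity']),
              # cast to str so that NaN becomes 'nan' and can be joined
              choice_description=df['choice_description'].astype(str),
              )
      .groupby('order_id')
      .agg({'item_name': ','.join,
            'choice_description':  ','.join,
            'price': 'sum',
            })
      # turn the summed price back into a '$'-prefixed string
      .assign(price=lambda d: '$'+d['price'].round(2).astype(str))
      )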

Output:

                                                                                         item_name                                                          choice_description   price
order_id                                                                                                                                                                              
1         Chips and Fresh Tomato Salsa,Izze,Nantucket Nectar,Chips and Tomatillo-Green Chili Salsa                                                nan,[Clementine],[Apple],nan  $11.56
2                                                                                     Chicken Bowl  [Tomatillo-Red Chili Salsa (Hot), [Black Beans, Rice, Cheese, Sour Cream]]  $33.96
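
One thing to keep in mind about the last step: `round(2).astype(str)` drops trailing zeros, so a total of 5.5 would be printed as `$5.5`. If a fixed two-decimal display matters, a format string is one option. The sketch below is only an illustration under assumptions: the prices are made up (they are not from the question), and the names `orders`, `line_total` and `totals` are hypothetical.

```python
import pandas as pd

# Illustrative data only (not from the question): totals that end in a zero.
orders = pd.DataFrame({
    'order_id': [1, 1, 2],
    'quantity': [1, 1, 2],
    'item_price': ['$2.50 ', '$3.00 ', '$8.25 '],
})

# Strip '$' and spaces from both ends, convert to float, multiply by quantity.
line_total = (pd.to_numeric(orders['item_price'].str.strip('$ '), errors='coerce')
              * orders['quantity'])

totals = (orders.assign(price=line_total)
          .groupby('order_id')['price'].sum()
          .map('${:.2f}'.format))  # always two decimals

print(totals)
```

Here `totals` is a Series indexed by order_id; the same `.map('${:.2f}'.format)` call could replace the final `assign` step in the answer above if that formatting is preferred.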

I'm new to Pandas too, and I'm learning as I answer.

The test data is:

You can also do it this way:

import pandas as pd

df = pd.read_csv('./test-csv.csv')
# Remove the literal '$' (regex=False: in a regex, '$' is the end-of-string anchor)
df['item_price'] = df['item_price'].str.replace('$', '', regex=False).str.strip()
# Convert to numbers and multiply by quantity to get each row's total
df['item_price'] = pd.to_numeric(df['item_price']) * df['quantity']
# Missing descriptions are NaN (floats), which str.join cannot handle
df['choice_description'] = df['choice_description'].fillna('')
res_df = df.groupby('order_id').agg({'item_name': ', '.join,
                                     'choice_description': ', '.join,
                                     'item_price': 'sum'})
res_df['item_price'] = '$' + res_df['item_price'].round(2).astype(str)
print(res_df)
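
A detail worth checking when removing the dollar sign: in a regular expression, `$` is the end-of-string anchor, and how `str.replace` treats a bare `'$'` with `regex=True` has, as far as I know, differed between pandas versions. Two spellings that avoid the ambiguity are a literal replacement (`regex=False`) or an escaped pattern (`r'\$'`). A minimal sketch with made-up values (the names `prices`, `literal` and `escaped` are illustrative only):

```python
import pandas as pd

prices = pd.Series(['$2.39 ', '$16.98 '])  # illustrative values

# Literal replacement: '$' is treated as a plain character.
literal = prices.str.replace('$', '', regex=False).str.strip()

# Escaped regex: r'\$' matches the dollar sign itself, not the end of string.
escaped = prices.str.replace(r'\$', '', regex=True).str.strip()

print(literal.tolist())
print(pd.to_numeric(literal).tolist())
```

Both spellings give the same cleaned strings, which `pd.to_numeric` can then convert.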

The solution's output is as follows: