Pandas groupby agg: summing string prices per order ID taking into account item quantity
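How can I combine the rows that share the same order_id so that all the corresponding rows add up into a single row of the resulting DataFrame? (In this case, quantity and item price should be summed for their respective order_id, and choice_description and item_name should also be combined, as strings.)
Reproducible input: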
import pandas as pd
from numpy import nan

d = {'order_id': [1, 1, 1, 1, 2],
     'quantity': [1, 1, 1, 1, 2],
     'item_name': ['Chips and Fresh Tomato Salsa', 'Izze', 'Nantucket Nectar', 'Chips and Tomatillo-Green Chili Salsa', 'Chicken Bowl'],
     'choice_description': [nan, '[Clementine]', '[Apple]', nan, '[Tomatillo-Red Chili Salsa (Hot), [Black Beans, Rice, Cheese, Sour Cream]]'],
     'item_price': ['$2.39 ', '$3.39 ', '$3.39 ', '$2.39 ', '$16.98 ']}
df = pd.DataFrame(d)
You can use:
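out = (df
   # strip the '$' and surrounding spaces, convert to float, multiply by quantity
   .assign(price=pd.to_numeric(df['item_price'].str.strip('$ '), errors='coerce')
                   .mul(df['quantity']),
           # NaN is a float and cannot be joined, so cast to str first
           choice_description=df['choice_description'].astype(str),
          )
   .groupby('order_id')
   .agg({'item_name': ','.join,
         'choice_description': ','.join,
         'price': 'sum',
        })
   # put the '$' back on the summed price
   .assign(price=lambda d: '$'+d['price'].round(2).astype(str))
)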
Output:
          item_name  choice_description  price
order_id
1         Chips and Fresh Tomato Salsa,Izze,Nantucket Nectar,Chips and Tomatillo-Green Chili Salsa  nan,[Clementine],[Apple],nan  $11.56
2         Chicken Bowl  [Tomatillo-Red Chili Salsa (Hot), [Black Beans, Rice, Cheese, Sour Cream]]  $33.96
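If the literal 'nan' entries in the joined descriptions are unwanted, one possible variation is to drop missing values before joining and to format the total with a fixed two decimals. This is only a sketch: the sample data and the helper name join_non_null are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with the same columns as the question's data.
df = pd.DataFrame({
    'order_id': [1, 1, 2],
    'quantity': [1, 1, 2],
    'item_name': ['Izze', 'Chips', 'Chicken Bowl'],
    'choice_description': ['[Clementine]', np.nan, '[Rice]'],
    'item_price': ['$3.39 ', '$2.11 ', '$16.98 '],
})

def join_non_null(s):
    """Join the non-missing values of a Series with commas (illustrative helper)."""
    return ','.join(s.dropna().astype(str))

out = (df
   # unit price as float, multiplied by quantity to get the line total
   .assign(price=pd.to_numeric(df['item_price'].str.strip('$ '), errors='coerce')
                   .mul(df['quantity']))
   .groupby('order_id')
   .agg({'item_name': ','.join,
         'choice_description': join_non_null,
         'price': 'sum',
        })
   # '{:.2f}' always keeps two decimals, so 5.5 becomes '$5.50'
   .assign(price=lambda d: d['price'].map('${:.2f}'.format))
)
print(out)
```

Compared with round(2).astype(str), the format string keeps trailing zeros, which tends to read better for currency.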
I'm also new to pandas and am learning as I answer. With the test data saved as a CSV file, you could also do it this way:
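import pandas as pd

df = pd.read_csv('./test-csv.csv')
# Remove the literal dollar sign; regex=False keeps '$' from being read as a regex anchor.
df['item_price'] = df['item_price'].str.replace('$', '', regex=False)
df['item_price'] = pd.to_numeric(df['item_price'].str.strip())
# NaN is a float and cannot be passed to str.join, so cast to str first.
df['choice_description'] = df['choice_description'].astype(str)
# Join the text columns and sum the price per order.
# Note: item_price is summed as-is; multiply by quantity first if prices are per unit.
res_df = df.groupby('order_id').agg({
    'item_name': ', '.join,
    'choice_description': ', '.join,
    'item_price': 'sum',
})
# Put the dollar sign back on the summed price.
res_df['item_price'] = '$' + res_df['item_price'].round(2).astype(str)
df = res_df
print(df)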
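A note on the dollar-sign step: in a regular expression, '$' is an end-of-string anchor rather than a dollar character, and how Series.str.replace treats a single-character pattern with regex=True has differed between pandas versions. Passing regex=False sidesteps the question. The small check below is a sketch with made-up prices:

```python
import re
import pandas as pd

# As a regex, '$' matches the end of the string, so nothing visible is removed.
unchanged = re.sub('$', '', '$2.39')

# With regex=False the pattern is the literal dollar character.
prices = pd.Series(['$2.39 ', '$16.98 '])
cleaned = pd.to_numeric(prices.str.replace('$', '', regex=False).str.strip())

print(unchanged)
print(cleaned.tolist())
```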