Merge dataframes and add price data for each instance of an item ID

Question

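I am trying to merge two dataframes so that every instance of an item ID in DF3 shows the pricing data associated with the matching ID in DF1.

DF3 (what I am trying to achieve)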

recipeID	itemID_out	qty_out	buy_price	sell_price	buy_quantity	sell_quantity	id_1_in	qty_id1	buy_price	sell_price	buy_quantity	sell_quantity	id_2_in	qty_id2	buy_price	sell_price	buy_quantity	sell_quantity	id_3_in	qty_id3	buy_price	sell_price	buy_quantity	sell_quantity	id_4_in	qty_id4	buy_price	sell_price	buy_quantity	sell_quantity	id_5_in	qty_id5	buy_price	sell_price	buy_quantity	sell_quantity
1	1986	1	129	167	67267	21637	123	1	10	15	1500	3000	124	1	12	14	550	800	125	1	8	12	124	254	126	1	22	25	1251	890	127	1	64	72	12783	1251515
2	1987	1	1521	1675	654	1245	123	2	10	15	1500	3000
3	1988	1	128376	131429	47	23	123	10	10	15	1500	3000	124	3	12	14	550	800

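These are the two dataframes I am trying to merge.

DF1: contains 26,863 rows; the master list of item names, IDs, and price data. It is pulled from an API; new items can be added and show up as new rows after a user-requested update.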

itemID	name	buy_price	sell_price	buy_quantity	sell_quantity
1986	XYZ	129	167	67267	21637
123	ABC	10	15	1500	3000
124	DEF	12	14	550	800

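DF2: contains 12,784 rows; recipes that combine items from the master list. It is pulled from an API; new recipes can be added and show up as new rows after a user-requested update.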

recipeID	itemID_out	qty_out	id_1_in	qty_id1	id_2_in	qty_id2	id_3_in	qty_id3	id_4_in	qty_id4	id_5_in	qty_id5
1	1986	1	123	1	124	1	125	1	126	1	127	1
2	1987	1	123	2
3	1988	1	123	10	124	3

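A recipe can contain a combination of 1 to 5 input items (so null values occur), and those inputs are IDs from DF1 and/or from the itemID_out column of DF2.

The "id_#_in" columns in DF2 can contain item IDs from the "itemID_out" column, because a recipe may use an item that is the output of another recipe.

I tried merging them with: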

pd.merge(itemlist_modified, recipelist_modified, left_on='itemID', right_on='itemID_out')

But this results in only a single column of IDs receiving pricing data as expected.
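The behaviour can be reproduced with a cut-down copy of the sample rows. This is only a sketch: the frames below keep a few of the columns from DF1 and DF2, and the variable names are taken from the call above. Because the join key is itemID_out alone, only the output item is matched against the price list, and the default inner join drops recipes whose output item has no price row.

```python
import pandas as pd

# Cut-down copies of the sample data (assumed subset of the real columns).
itemlist_modified = pd.DataFrame({
    'itemID': [1986, 123, 124],
    'buy_price': [129, 10, 12],
    'sell_price': [167, 15, 14],
})
recipelist_modified = pd.DataFrame({
    'recipeID': [1, 2, 3],
    'itemID_out': [1986, 1987, 1988],
    'id_1_in': [123, 123, 123],
    'qty_id1': [1, 2, 10],
})

# Same call as in the question: the only key is the output item ID.
merged = pd.merge(itemlist_modified, recipelist_modified,
                  left_on='itemID', right_on='itemID_out')

# One set of price columns (for itemID_out) and nothing for id_1_in.
print(merged.columns.tolist())
print(len(merged))
```

The input column id_1_in is carried along unchanged, so its prices never get looked up.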

I feel like I am using the wrong function for this; any help would be greatly appreciated!

Thanks in advance!

Answer 1

Not a pretty approach, but it first melts the ingredients table into long format and then merges it with the item list table:

import pandas as pd
import numpy as np

itemlist_modified = pd.DataFrame({
    'itemID': [1986, 123, 124],
    'name': ['XYZ', 'ABC', 'DEF'],
    'buy_price': [129, 10, 12],
    'sell_price': [167, 15, 14],
    'buy_quantity': [67267, 1500, 550],
    'sell_quantity': [21637, 3000, 800],
})

recipelist_modified = pd.DataFrame({
    'RecipeID': [1, 2, 3],
    'itemID_out': [1986, 1987, 1988],
    'qty_out': [1, 1, 1],
    'id_1_in': [123, 123, 123],
    'qty_id1': [1, 2, 10],
    'id_2_in': [124.0, np.nan, 124.0],
    'qty_id2': [1.0, np.nan, 3.0],
    'id_3_in': [125.0, np.nan, np.nan],
    'qty_id3': [1.0, np.nan, np.nan],
    'id_4_in': [126.0, np.nan, np.nan],
    'qty_id4': [1.0, np.nan, np.nan],
    'id_5_in': [127.0, np.nan, np.nan],
    'qty_id5': [1.0, np.nan, np.nan],
})
    
#columns that identify a recipe (everything that is not an input id or input qty column)
id_vars = ['RecipeID','itemID_out','qty_out']

#prepare dict to map column name to ingredient number
col_renames = {}
col_renames.update({'id_{}_in'.format(i+1):'ingr_{}'.format(i+1) for i in range(5)})
col_renames.update({'qty_id{}'.format(i+1):'ingr_{}'.format(i+1) for i in range(5)})

#melt recipelist into long form (one row per id/qty cell), dropping empty ingredient slots
long_recipelist = recipelist_modified.melt(
    id_vars=id_vars,
    var_name='ingredient',
).dropna()

#add a new column to specify whether each row is a qty or an id
long_recipelist['kind'] = np.where(long_recipelist['ingredient'].str.contains('qty'),'qty_in','id_in')

#convert ingredient names
long_recipelist['ingredient'] = long_recipelist['ingredient'].map(col_renames)

#pivot on the new ingredient column
reshape_recipe_list = long_recipelist.pivot(
    index=['RecipeID','itemID_out','qty_out','ingredient'],
    columns='kind',
    values='value',
).reset_index()

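#merge with the itemlist (default inner join: ingredients whose ID is not in
#itemlist_modified are dropped; pass how='left' to keep them with NaN prices)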
priced_ingredients = pd.merge(reshape_recipe_list, itemlist_modified, left_on='id_in', right_on='itemID')

#pivot on the priced ingredients
priced_ingredients = priced_ingredients.pivot(
    index = ['RecipeID','itemID_out','qty_out'],
    columns = 'ingredient',
)

#flatten the hierarchical columns
priced_ingredients.columns = ["_".join(a[::-1]) for a in priced_ingredients.columns.to_flat_index()]
priced_ingredients.columns.name = ''

priced_ingredients = priced_ingredients.reset_index()

Partial output of priced_ingredients: [output not included]
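If the wide layout of DF3 is the goal, the same lookup can also be written as one left merge per ID column, without reshaping. The following is a minimal sketch, not part of the original answer. It assumes itemID is unique in the item list (duplicate IDs would duplicate recipe rows); the helper `add_prices`, the `slots` list, the `_lookup_id` column, and the suffixes `out`, `in1`, `in2` are all hypothetical names chosen for illustration, and only the first two input slots from the sample are shown.

```python
import numpy as np
import pandas as pd

# Sample rows from the question (only three items have prices here).
items = pd.DataFrame({
    'itemID': [1986, 123, 124],
    'name': ['XYZ', 'ABC', 'DEF'],
    'buy_price': [129, 10, 12],
    'sell_price': [167, 15, 14],
    'buy_quantity': [67267, 1500, 550],
    'sell_quantity': [21637, 3000, 800],
})
recipes = pd.DataFrame({
    'recipeID': [1, 2, 3],
    'itemID_out': [1986, 1987, 1988],
    'qty_out': [1, 1, 1],
    'id_1_in': [123, 123, 123],
    'qty_id1': [1, 2, 10],
    'id_2_in': [124.0, np.nan, 124.0],
    'qty_id2': [1.0, np.nan, 3.0],
})

price_cols = ['buy_price', 'sell_price', 'buy_quantity', 'sell_quantity']

def add_prices(df, id_col, label):
    """Left-merge the price columns for the IDs in id_col, suffixed with label."""
    lookup = items[price_cols].add_suffix('_' + label)
    lookup['_lookup_id'] = items['itemID']  # temporary join key (hypothetical name)
    merged = df.merge(lookup, how='left', left_on=id_col, right_on='_lookup_id')
    return merged.drop(columns='_lookup_id')

# (ID column, suffix) pairs; extend with id_3_in .. id_5_in for the full data.
slots = [('itemID_out', 'out'), ('id_1_in', 'in1'), ('id_2_in', 'in2')]

priced = recipes
for id_col, label in slots:
    priced = add_prices(priced, id_col, label)

print(priced.filter(like='buy_price'))
```

A left merge keeps every recipe row, so an empty slot or an ID without a price row simply ends up with NaN in that slot's price columns. Unlike DF3 as drawn in the question, the price columns here carry a suffix per slot, which avoids duplicate column names.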
