Stack per two columns and two measures

I have data like this:

order_id     Product_A    Product_B    Price_Product_A    Price_Product_B
100          Pen          Notebook     1.5                3
101          Bag          Watch        10                 12

I need it to look like this:

order_id    product    price
100         Pen        1.5
100         Notebook   3
101         Bag        10
101         Watch      12

How can I use stack() and unstack() for this? I have only used them with a single measure.
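Since the question asks about stack() specifically, here is a minimal sketch of a stack-based route. It assumes every column other than order_id ends in an underscore followed by a suffix (A, B, ...). It rebuilds the sample data, splits each column name at its last underscore into a two-level column index, and stacks the suffix level into the rows; unstack() would reverse that step. On pandas 2.1 or later, stack() may emit a FutureWarning about its new implementation (see the future_stack option in the stack() documentation).

```python
import pandas as pd

# The sample data from the question
df = pd.DataFrame({
    "order_id": [100, 101],
    "Product_A": ["Pen", "Bag"],
    "Product_B": ["Notebook", "Watch"],
    "Price_Product_A": [1.5, 10],
    "Price_Product_B": [3, 12],
})

wide = df.set_index("order_id")
# Assumption: the suffix is whatever follows the last underscore, e.g.
# 'Product_A' -> ('Product', 'A'), 'Price_Product_A' -> ('Price_Product', 'A')
wide.columns = pd.MultiIndex.from_tuples(
    [tuple(col.rsplit("_", 1)) for col in wide.columns],
    names=["measure", "subtype"],
)

# Move the suffix level into the rows; both measures stay as columns
long = (
    wide.stack(level="subtype")
    .reset_index()
    .rename(columns={"Product": "product", "Price_Product": "price"})
)
result = long[["order_id", "product", "price"]]
print(result)
```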

I would simply create two dataframes, one for product A and one for product B, give both the same column names, and append them like this:

import pandas as pd

df1 = df[['order_id', 'Product_A', 'Price_Product_A']]
df2 = df[['order_id', 'Product_B', 'Price_Product_B']]

df1.columns = ['order_id', 'product', 'price']
df2.columns = ['order_id', 'product', 'price']

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([df1, df2])
df

Output:

   order_id   product  price
0       100       Pen    1.5
1       101       Bag   10.0
0       100  Notebook    3.0
1       101     Watch   12.0
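The concatenated result keeps all A rows before all B rows. If you want the row order from the desired output, one option (sketched here on the question's sample data) is a stable sort on order_id, which keeps A before B within each order, followed by a fresh index. set_axis is used instead of assigning .columns; either works.

```python
import pandas as pd

# The sample data from the question
df = pd.DataFrame({
    "order_id": [100, 101],
    "Product_A": ["Pen", "Bag"],
    "Product_B": ["Notebook", "Watch"],
    "Price_Product_A": [1.5, 10],
    "Price_Product_B": [3, 12],
})

cols = ["order_id", "product", "price"]
df1 = df[["order_id", "Product_A", "Price_Product_A"]].set_axis(cols, axis=1)
df2 = df[["order_id", "Product_B", "Price_Product_B"]].set_axis(cols, axis=1)

# A stable sort keeps the A row ahead of the B row within each order
result = (
    pd.concat([df1, df2])
    .sort_values("order_id", kind="stable")
    .reset_index(drop=True)
)
print(result)
```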

Perhaps the best way to represent this data is with a multi-indexed dataframe.

Here is a clunky but working way to build one for any number of orders and products:

import pandas as pd

# array with one row of products per order
prod_array = df[[column for column in df.columns if column[:-1] == 'Product_']].values

# array with one row of prices per order
price_array = df[[column for column in df.columns if column[:-1] == 'Price_Product_']].values

# array of order ids (positional, so it works with any index)
order_id_array = df['order_id'].values

# create empty dataframe
df_mi = pd.DataFrame(columns=["order_id", "order_item_id", "Product", "Price_Product"])

# add rows one at a time (simple, but slow for large data)
for i in range(len(order_id_array)):
    for j in range(len(prod_array[i])):
        df_mi.loc[df_mi.shape[0]] = [order_id_array[i], j, prod_array[i][j], price_array[i][j]]

# create multiindex dataframe
df_mi = df_mi.sort_values(['order_id', 'order_item_id']).set_index(['order_id', 'order_item_id'])

This results in a dataframe indexed by order_id and order_item_id.
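The row-by-row .loc insertion above gets slow on larger frames. Below is a vectorized sketch that builds the same kind of MultiIndex dataframe from NumPy arrays. It assumes the Product_* and Price_Product_* columns appear in the same suffix order and that every order has the same number of items.

```python
import pandas as pd

# The sample data from the question
df = pd.DataFrame({
    "order_id": [100, 101],
    "Product_A": ["Pen", "Bag"],
    "Product_B": ["Notebook", "Watch"],
    "Price_Product_A": [1.5, 10],
    "Price_Product_B": [3, 12],
})

# Assumption: both column groups are listed in the same suffix order (A, B, ...)
prod = df.filter(regex=r"^Product_").to_numpy()
price = df.filter(regex=r"^Price_Product_").to_numpy()
n_items = prod.shape[1]

# One index entry per (order, item position) pair, orders in their original order
index = pd.MultiIndex.from_product(
    [df["order_id"], range(n_items)], names=["order_id", "order_item_id"]
)
# ravel() reads row by row, so each order's items stay together
df_mi = pd.DataFrame(
    {"Product": prod.ravel(), "Price_Product": price.ravel()}, index=index
)
print(df_mi)
```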

Or, combining my solution with JANO's:

import pandas as pd

# suffixes of the product columns, e.g. ['A', 'B']
order_prod_ids = [col[-1] for col in df.columns if col[:-1] == 'Product_']

frames = []
for opid in order_prod_ids:
    # .copy() avoids a SettingWithCopyWarning when adding a column below
    df_opid = df[['order_id', 'Product_' + opid, 'Price_Product_' + opid]].copy()
    df_opid.columns = ['order_id', 'product', 'price']
    df_opid['order_prod_id'] = opid
    frames.append(df_opid)

# DataFrame.append was removed in pandas 2.0; concatenate once at the end instead
df_mi = pd.concat(frames)
df_mi = df_mi.sort_values(['order_id', 'order_prod_id']).set_index(['order_id', 'order_prod_id'])
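Once the data has a MultiIndex, selecting by either level is straightforward. In this sketch the combined-approach result is rebuilt by hand from the sample data so the block runs on its own. It shows picking one order with .loc, one product slot across all orders with .xs, and flattening back to the layout the question asked for.

```python
import pandas as pd

# Hand-built stand-in for the df_mi produced by the combined approach
df_mi = pd.DataFrame(
    {"product": ["Pen", "Notebook", "Bag", "Watch"], "price": [1.5, 3.0, 10.0, 12.0]},
    index=pd.MultiIndex.from_tuples(
        [(100, "A"), (100, "B"), (101, "A"), (101, "B")],
        names=["order_id", "order_prod_id"],
    ),
)

# All items of a single order
order_100 = df_mi.loc[100]
# The B item of every order
b_items = df_mi.xs("B", level="order_prod_id")
# Drop the helper level and turn order_id back into a column
flat = df_mi.reset_index(level="order_prod_id", drop=True).reset_index()
print(flat)
```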

There is a handy function for this, wide_to_long:

pd.wide_to_long(df, ['Product', 'Price_Product'], i='order_id', j='subtype', sep='_', suffix=r'\D+')

Output:

                     Product        Price_Product
order_id    subtype     
100         A        Pen            1.5
101         A        Bag            10.0
100         B        Notebook       3.0
101         B        Watch          12.0
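To get from the wide_to_long result to the exact three-column layout in the question, a sketch along these lines should work: sort so each order's items are adjacent, drop the subtype level, and rename the stub columns. The lowercase output names are taken from the question's desired output.

```python
import pandas as pd

# The sample data from the question
df = pd.DataFrame({
    "order_id": [100, 101],
    "Product_A": ["Pen", "Bag"],
    "Product_B": ["Notebook", "Watch"],
    "Price_Product_A": [1.5, 10],
    "Price_Product_B": [3, 12],
})

long = pd.wide_to_long(
    df, ["Product", "Price_Product"], i="order_id", j="subtype", sep="_", suffix=r"\D+"
)
result = (
    long.sort_index()                            # (100, A), (100, B), (101, A), ...
    .reset_index(level="subtype", drop=True)     # the suffix is no longer needed
    .reset_index()                               # order_id back to a column
    .rename(columns={"Product": "product", "Price_Product": "price"})
)
print(result)
```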

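melt + unstack can also achieve the same result, and it is somewhat instructive. The slightly tricky part is splitting 'variable' into two parts, the root and the suffix, which wide_to_long does for you. For your example, this could look like this: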

df1 = df.melt(id_vars = 'order_id')
df1['cat'] = df1['variable'].str[:-2]     # you may have to tweak this for your actual data
df1['subtype'] = df1['variable'].str[-1:] # you may have to tweak this for your actual data
(df1.drop(columns = 'variable')
    .set_index(['order_id','subtype','cat'])
    .unstack()
    .droplevel(level=0, axis=1)
    .reset_index()
)
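The fixed-width slicing above assumes one-character suffixes. A variant that splits each name at its last underscore with str.rsplit is sketched below; it assumes the suffix itself never contains an underscore.

```python
import pandas as pd

# The sample data from the question
df = pd.DataFrame({
    "order_id": [100, 101],
    "Product_A": ["Pen", "Bag"],
    "Product_B": ["Notebook", "Watch"],
    "Price_Product_A": [1.5, 10],
    "Price_Product_B": [3, 12],
})

long = df.melt(id_vars="order_id")
# Assumption: root and suffix are separated by the last underscore
parts = long["variable"].str.rsplit("_", n=1, expand=True)
long["cat"] = parts[0]      # e.g. 'Price_Product'
long["subtype"] = parts[1]  # e.g. 'A'

result = (
    long.drop(columns="variable")
    .set_index(["order_id", "subtype", "cat"])["value"]
    .unstack("cat")          # one column per root name
    .reset_index()
)
print(result)
```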