How to vectorize this pandas apply function that uses other column values as new column names

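I have a DataFrame to which I want to add new columns, with names taken from one column ("purchase") and values taken from another column ("amount"). I know how to do this with DataFrame.apply(), but how can I vectorize it to make the code faster on the much larger DataFrame I actually work with? Thanks!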

Edit: the "obs" column is unique.

Example input:

obs purchase  amount
  1   Coffee       1
  2    Juice       1
  3   Coffee       2

Example output:

obs purchase  amount  Coffee  Juice
  1   Coffee       1     1.0    NaN
  2    Juice       1     NaN    1.0
  3   Coffee       2     2.0    NaN

Code:

import pandas as pd

obs = [1, 2, 3]
purchase = ["Coffee", "Juice", "Coffee"]
amount = [1, 1, 2]

df = pd.DataFrame(
    {"obs": obs, "purchase": purchase, "amount": amount})

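def get_amount(row):
    # Use the row's purchase value as a new column name and store its amount there.
    row[row["purchase"]] = row["amount"]
    return row

# Row-wise apply: runs get_amount once per row, which is slow on large frames.
df = df.apply(get_amount, axis=1)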

Assuming obs is unique, you can pivot and then merge:

df2 = df.merge(df.pivot(index='obs', columns='purchase', values='amount'), on='obs')

Output:

   obs purchase  amount  Coffee  Juice
0    1   Coffee       1     1.0    NaN
1    2    Juice       1     NaN    1.0
2    3   Coffee       2     2.0    NaN
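
For reference, here is a self-contained sketch of the same pivot approach, starting from the original df (before the apply call). It passes index, columns and values to pivot as keywords, since recent pandas versions no longer accept them positionally. As a small variation on the merge above, it attaches the pivoted columns with DataFrame.join(on="obs"), which matches the obs column against the pivot's index. The names wide and df2 are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "obs": [1, 2, 3],
    "purchase": ["Coffee", "Juice", "Coffee"],
    "amount": [1, 1, 2],
})

# One column per distinct purchase value, indexed by obs.
# Cells with no matching (obs, purchase) pair are filled with NaN.
# This step assumes obs is unique; duplicate (obs, purchase) pairs make pivot raise.
wide = df.pivot(index="obs", columns="purchase", values="amount")

# Attach the wide columns to the original rows by looking up
# each row's obs value in the index of the pivoted frame.
df2 = df.join(wide, on="obs")

print(df2)
```

The original columns keep their dtypes, while the new Coffee and Juice columns become floats because they contain NaN.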