How to vectorize this pandas apply function that uses other column values as new column names
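I have a DataFrame, and I want to add new columns whose names come from one column ("purchase") and whose values come from another ("amount"). I know how to do this with DataFrame.apply(), but how can I vectorize it so the code runs faster on the much larger DataFrame I am actually using? Thanks!
Edit: the "obs" column is unique.
Sample input: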
obs | purchase | amount |
---|---|---|
1 | Coffee | 1 |
2 | Juice | 1 |
3 | Coffee | 2 |
Sample output:
obs | purchase | amount | Coffee | Juice |
---|---|---|---|---|
1 | Coffee | 1 | 1.0 | NaN |
2 | Juice | 1 | NaN | 1.0 |
3 | Coffee | 2 | 2.0 | NaN |
Code:
import pandas as pd

obs = [1, 2, 3]
purchase = ["Coffee", "Juice", "Coffee"]
amount = [1, 1, 2]
df = pd.DataFrame(
    {"obs": obs, "purchase": purchase, "amount": amount})

def get_amount(row):
    # Add a field named after this row's purchase, holding its amount.
    row[row["purchase"]] = row["amount"]
    return row

# Row-wise apply: this is the slow part I want to vectorize.
df = df.apply(get_amount, axis=1)
Assuming "obs" is unique, you can pivot and merge:
df2 = df.merge(df.pivot(index='obs', columns='purchase', values='amount'), on='obs')
Output:
   obs purchase  amount  Coffee  Juice
0    1   Coffee       1     1.0    NaN
1    2    Juice       1     NaN    1.0
2    3   Coffee       2     2.0    NaN
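For anyone who wants to try this end to end, here is a minimal sketch of the pivot-and-merge approach on the sample data from the question. The names `wide`, `dup`, and `pivot_failed` are only illustrative and not part of the original answer. It also shows why the uniqueness assumption matters: as far as I know, `pivot` refuses to reshape when the same ("obs", "purchase") pair appears more than once.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "obs": [1, 2, 3],
        "purchase": ["Coffee", "Juice", "Coffee"],
        "amount": [1, 1, 2],
    }
)

# Reshape long -> wide: one column per distinct purchase, indexed by obs.
# Cells with no matching purchase become NaN, so the new columns are floats.
wide = df.pivot(index="obs", columns="purchase", values="amount")

# Join the wide columns back onto the original rows, matching on obs.
df2 = df.merge(wide, on="obs")
print(df2)

# Illustration of the uniqueness assumption: repeat the first row so the
# same (obs, purchase) pair occurs twice, then try to pivot again.
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)
pivot_failed = False
try:
    dup.pivot(index="obs", columns="purchase", values="amount")
except ValueError as exc:
    pivot_failed = True
    print("pivot failed:", exc)
```

If your real data does contain repeated pairs, you would need to decide how to combine them first (for example by aggregating), which is a different problem from the one asked here.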