Transforming a data frame so that new columns are populated based on conditionals

Question

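I'm working with a large data frame with many columns, some of which are used to calculate differences. The first two columns (Planning Cycle and Beneficiary) are identifier columns that need to be kept. I'd like to transform the table below so that, instead of having the individual columns 'Total New_x', 'Applied_x', 'Planned_x' (and the corresponding _y columns), it has new columns that show only the amounts selected by the value of the 'variable' column. For example (as shown below), in the first row 'variable' = "dif_Total New", so the transformed data frame should have new 'Amount' columns containing only the amounts from 'Total New_x' and 'Total New_y'.

I'm not sure whether this requires conditional statements or writing some kind of function.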

Turning this:

Planning Cycle	Beneficiary	Total New_x	Planned_x	Total New_y	Applied_y	Planned_y	variable	Difference Magnitude
2019	Sprite	0	0	2627094	0	2627094	dif_Total New	2627094
2019	Coke	0	0	2627094	0	2627094	dif_Planned	2627094
2019	Coke	0	0	1406904	0	1406904	dif_Total New	1406904
2020	Pepsi	1222383	1222383	1222383	42148	1264531	dif_Applied	42148

Into this:

Planning Cycle	Beneficiary	Amount 2	variable	Difference Magnitude
2019	Sprite	2627094	dif_Total New	2627094
2019	Coke	2627094	dif_Planned	2627094
2019	Coke	1406904	dif_Total New	1406904
2020	Pepsi	42148	dif_Applied	42148

Relevant code:

compvars = ['Total New', 'Applied', 'Planned']
# For each compared column, store the difference between its _y and _x versions
for var in compvars:
    difvar = 'dif_' + var
    varx = var + '_x'
    vary = var + '_y'
    df[difvar] = df[vary] - df[varx]

difvars = ['dif_' + var for var in compvars]
idvars1 = ['Planning Cycle', 'Beneficiary']
compvarsx = [var + '_x' for var in compvars]
compvarsy = [var + '_y' for var in compvars]
df = df[idvars1 + difvars + compvarsx + compvarsy]
# Reshape so that each dif_ column becomes a row labelled in 'variable'
df = df.melt(id_vars=idvars1 + compvarsx + compvarsy, value_vars=difvars, value_name="Difference Magnitude")

Answer 1

Hopefully I've understood your question correctly:

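import pandas as pd


def fn(x):
    # "dif_Total New" -> "Total New", then pick the matching _x and _y columns
    v = x["variable"].split("_")[-1]
    return pd.Series({"Amount 1": x[v + "_x"], "Amount 2": x[v + "_y"]})


# Apply fn row by row, then keep only the identifier, amount and difference columns
df = pd.concat([df, df.apply(fn, axis=1)], axis=1)[
    [
        "Planning Cycle",
        "Beneficiary",
        "Amount 1",
        "Amount 2",
        "variable",
        "Difference Magnitude",
    ]
]
print(df)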

Prints:

   Planning Cycle Beneficiary  Amount 1  Amount 2       variable  Difference Magnitude
0            2019      Sprite         0   2627094  dif_Total New               2627094
1            2019        Coke         0   2627094    dif_Planned               2627094
2            2019        Coke         0   1406904  dif_Total New               1406904
3            2020       Pepsi         0     42148    dif_Applied                 42148
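
Since the question mentions a large data frame, a row-wise apply may be slow there. Below is a minimal sketch of the same lookup done with NumPy indexing instead. It is only one possible alternative, not part of the original answer. The sample frame is rebuilt from the tables above, and the 'Applied_x' values are an assumption (the question's table omits that column; zeros are consistent with the printed output).

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; 'Applied_x' is assumed to be all zeros.
df = pd.DataFrame({
    "Planning Cycle": [2019, 2019, 2019, 2020],
    "Beneficiary": ["Sprite", "Coke", "Coke", "Pepsi"],
    "Total New_x": [0, 0, 0, 1222383],
    "Applied_x": [0, 0, 0, 0],
    "Planned_x": [0, 0, 0, 1222383],
    "Total New_y": [2627094, 2627094, 1406904, 1222383],
    "Applied_y": [0, 0, 0, 42148],
    "Planned_y": [2627094, 2627094, 1406904, 1264531],
    "variable": ["dif_Total New", "dif_Planned", "dif_Total New", "dif_Applied"],
    "Difference Magnitude": [2627094, 2627094, 1406904, 42148],
})

# "dif_Total New" -> "Total New"
base = df["variable"].str.split("_").str[-1]
rows = np.arange(len(df))

for suffix, new_col in [("_x", "Amount 1"), ("_y", "Amount 2")]:
    # Only the numeric columns carrying this suffix
    vals = df[[c for c in df.columns if c.endswith(suffix)]]
    # Position of the wanted column for every row (-1 would mean "not found")
    col_idx = vals.columns.get_indexer(base + suffix)
    assert (col_idx >= 0).all(), "a 'variable' value has no matching column"
    # Pick one value per row: row i, column col_idx[i]
    df[new_col] = vals.to_numpy()[rows, col_idx]

result = df[["Planning Cycle", "Beneficiary", "Amount 1", "Amount 2",
             "variable", "Difference Magnitude"]]
print(result)
```

The explicit check on `col_idx` is there because `get_indexer` returns -1 for labels it cannot find, which would otherwise silently select the last column.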

transform

function

dataframe

python-3.x

pandas