How to create a new column in a pandas DataFrame from a calculation over every row except the one the result goes into

Question

For example, suppose I have a DataFrame df with columns A.1 and A.2, as shown below:

For each row, I want to compute the difference between the column means taken over all the other rows, like this:

A.1    A.2    B
2      8      (3+5)/2 - (2+1)/2
3      2      (2+5)/2 - (8+1)/2
5      1      (2+3)/2 - (8+2)/2

My code looks like this. How should I write it correctly?

df['B'] = mean(df['A.1'].drop(df['B'].index)))-mean(df['A.2'].drop(df['B'].index)))

I have to avoid loops entirely and do this the pandas way, because I am working with huge datasets.

Answer 1

Try:

df.apply(lambda r : df.loc[df.index!=r.name,'A.1'].mean() - df.loc[df.index!=r.name,'A.2'].mean(), axis = 1)

The result is:

0    2.5
1   -1.0
2   -2.5
dtype: float64

Note that r.name inside the lambda is simply the index label of the current row.
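
To see what r.name holds, here is a small sketch that rebuilds the example frame with a string index and runs the same apply call. The index labels 'x', 'y', 'z' are my own assumption for illustration; the question does not show an index.

```python
import pandas as pd

# Example values from the question; the string index labels are made up for illustration.
df = pd.DataFrame({"A.1": [2, 3, 5], "A.2": [8, 2, 1]}, index=["x", "y", "z"])

# With axis=1, each row is passed as a Series whose .name is that row's index label.
labels = df.apply(lambda r: r.name, axis=1)

# Same expression as above: mask out the current row by label, then take the means.
b = df.apply(
    lambda r: df.loc[df.index != r.name, "A.1"].mean()
    - df.loc[df.index != r.name, "A.2"].mean(),
    axis=1,
)

print(labels.tolist())
print(b.tolist())
```

Because the mask compares index labels, this relies on the index being unique. If two rows share a label, both are excluded from each other's mean.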

Another approach, with no lambda at all:

(df['A.1'].sum()-df['A.1'])/(len(df)-1) - (df['A.2'].sum()-df['A.2'])/(len(df)-1)

The result is the same as above.
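
The lambda-free version works because the mean of all rows except row i equals (column total - value in row i) / (n - 1). A minimal sketch of that idea, assuming the column has no NaN values and more than one row; the helper name leave_one_out_mean is hypothetical, and the frame is rebuilt from the question's example:

```python
import pandas as pd


def leave_one_out_mean(s: pd.Series) -> pd.Series:
    """For each element, the mean of all the other elements.

    Hypothetical helper; assumes no NaN values and len(s) > 1.
    """
    return (s.sum() - s) / (len(s) - 1)


# Example values from the question.
df = pd.DataFrame({"A.1": [2, 3, 5], "A.2": [8, 2, 1]})

# Difference of the two leave-one-out means, assigned as the new column.
df["B"] = leave_one_out_mean(df["A.1"]) - leave_one_out_mean(df["A.2"])

print(df)
```

This does one pass over each column, whereas the apply version builds a fresh boolean mask for every row, so the vectorized form should scale much better on large frames.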
