SUMPRODUCT specific columns in a Pandas DataFrame

Question

Given the DataFrame below, I'm trying to compute an Excel-style SUMPRODUCT of columns V1, V2 and V3 against columns S1, S2 and S3, row by row.

import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C'],
                   'Qty': [100, 150, 200],
                   'Remarks': ['Bad', 'Avg', 'Good'],
                   'V1': [0, 1, 1],
                   'V2': [1, 1, 0],
                   'V3': [0, 0, 1],
                   'S1': [1, 0, 1],
                   'S2': [0, 1, 0],
                   'S3': [1, 0, 1]
                   })
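For reference, the row-wise SUMPRODUCT being asked for multiplies V1·S1, V2·S2 and V3·S3 and adds the three products. A plain-Python sketch of that definition (the row values are copied from the DataFrame above), handy for checking any vectorized answer against:

```python
# V and S values per row, copied from the example DataFrame
v_rows = {'A': (0, 1, 0), 'B': (1, 1, 0), 'C': (1, 0, 1)}
s_rows = {'A': (1, 0, 1), 'B': (0, 1, 0), 'C': (1, 0, 1)}

# Row-wise SUMPRODUCT: the sum of the pairwise products Vi * Si
expected = {name: sum(v * s for v, s in zip(v_rows[name], s_rows[name]))
            for name in v_rows}
print(expected)  # {'A': 0, 'B': 1, 'C': 2}
```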

I'm looking for a way to do this without spelling out every column name, as in:

df['SP'] = df[['V1', 'S1']].prod(axis=1) + df[['V2', 'S2']].prod(axis=1) + df[['V3', 'S3']].prod(axis=1)

My real DataFrame has more than 50 columns in each of the 'V' and 'S' groups, so writing it out like that isn't practical.

Any suggestions?

Thanks!

Answer 1

You could try something like this:

# need to edit these two lines to work with your larger DataFrame
v_cols = df.columns[3:6]  # ['V1', 'V2', 'V3']
s_cols = df.columns[6:]  # ['S1', 'S2', 'S3']

df['SP'] = (df[v_cols].to_numpy() * df[s_cols].to_numpy()).sum(axis=1)
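Instead of hard-coding positional slices, which break as soon as a column is added or moved, the two lists could be generated from the naming pattern. A minimal sketch, assuming the pairs are named exactly V1..Vn and S1..Sn with the same n (here n is taken from the number of V columns, which is an assumption about the real data):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C'],
                   'Qty': [100, 150, 200],
                   'Remarks': ['Bad', 'Avg', 'Good'],
                   'V1': [0, 1, 1], 'V2': [1, 1, 0], 'V3': [0, 0, 1],
                   'S1': [1, 0, 1], 'S2': [0, 1, 0], 'S3': [1, 0, 1]})

# Count the V columns (assumes names like 'V1', 'V2', ... and a matching S set)
n = int(df.columns.str.fullmatch(r'V\d+').sum())
v_cols = [f'V{i}' for i in range(1, n + 1)]
s_cols = [f'S{i}' for i in range(1, n + 1)]

# Element-wise product of the two blocks, summed across each row
df['SP'] = (df[v_cols].to_numpy() * df[s_cols].to_numpy()).sum(axis=1)
print(df['SP'].tolist())  # [0, 1, 2]
```

Because the lists are built in matching numeric order, the pairing no longer depends on where the columns sit in the DataFrame.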

Edited to add an alternative after seeing @ALollz's comment that a MultiIndex makes the alignment simpler:

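# Move the non-V/S columns into the index so only the V and S columns remain
# (start from the original df: an existing 'SP' column would break the halving below)
df.set_index(['Name', 'Qty', 'Remarks'], inplace=True)
# Assumes equal numbers of V and S columns, with all V columns before all S columns
n_cols = df.shape[1] // 2
v_cols = df.columns[:n_cols]
s_cols = df.columns[n_cols:]
df['SP'] = (df[v_cols].to_numpy() * df[s_cols].to_numpy()).sum(axis=1)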

You can reset the index afterwards if you like:

df.reset_index(inplace=True)

Result:

  Name  Qty Remarks  V1  V2  V3  S1  S2  S3  SP
0    A  100     Bad   0   1   0   1   0   1   0
1    B  150     Avg   1   1   0   0   1   0   1
2    C  200    Good   1   0   1   1   0   1   2
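One way a column MultiIndex can make the alignment explicit: split each name such as 'V1' into ('V', '1'), so the V and S sub-frames share column labels and pandas pairs them by label rather than by position. This is a sketch of the idea, not necessarily what the comment above had in mind, and it assumes every name is a single letter followed by a number:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C'],
                   'Qty': [100, 150, 200],
                   'Remarks': ['Bad', 'Avg', 'Good'],
                   'V1': [0, 1, 1], 'V2': [1, 1, 0], 'V3': [0, 0, 1],
                   'S1': [1, 0, 1], 'S2': [0, 1, 0], 'S3': [1, 0, 1]})

# Take only the V/S block and turn 'V1' -> ('V', '1') into a two-level column index
block = df.filter(regex=r'^[VS]\d+$').copy()
block.columns = pd.MultiIndex.from_arrays(
    [block.columns.str[0], block.columns.str[1:]], names=['kind', 'pair'])
block = block.sort_index(axis=1)

# block['V'] and block['S'] both have columns '1', '2', '3',
# so .mul aligns them by label, independent of their original order
df['SP'] = block['V'].mul(block['S']).sum(axis=1)
print(df['SP'].tolist())  # [0, 1, 2]
```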

Answer 2

Filter the S-like and V-like columns, multiply each S column by its corresponding V column, then sum the result along the column axis:

s = df.filter(regex=r'S\d+')  # raw strings avoid the invalid-escape warning for \d
p = df.filter(regex=r'V\d+')

df['SP'] = s.mul(p.values).sum(1)

  Name  Qty Remarks  V1  V2  V3  S1  S2  S3  SP
0    A  100     Bad   0   1   0   1   0   1   0
1    B  150     Avg   1   1   0   0   1   0   1
2    C  200    Good   1   0   1   1   0   1   2

PS: This solution assumes the S and V columns appear in the same relative order in the original DataFrame.
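The `.values` in `s.mul(p.values)` is what makes the multiplication positional. With a DataFrame on both sides, pandas aligns on column labels, and since S1..S3 and V1..V3 share no labels, every product becomes NaN. A small sketch on the question's data showing both cases:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C'],
                   'Qty': [100, 150, 200],
                   'Remarks': ['Bad', 'Avg', 'Good'],
                   'V1': [0, 1, 1], 'V2': [1, 1, 0], 'V3': [0, 0, 1],
                   'S1': [1, 0, 1], 'S2': [0, 1, 0], 'S3': [1, 0, 1]})

s = df.filter(regex=r'S\d+')
p = df.filter(regex=r'V\d+')

# Label-aligned: the result has columns S1..S3 and V1..V3, all NaN,
# and summing all-NaN rows gives 0.0
aligned = s.mul(p)
print(aligned.isna().all().all())    # True
print(aligned.sum(axis=1).tolist())  # [0.0, 0.0, 0.0]

# .values drops the labels, so S1 pairs with V1, S2 with V2, and so on by position
print(s.mul(p.values).sum(axis=1).tolist())  # [0, 1, 2]
```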

Answer 3

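If your Vn and Sn columns are in matching order:

# Anchored regexes, so that e.g. an existing 'SP' column is not picked up as an S column
v_cols = df.filter(regex=r'^V\d+$').columns
s_cols = df.filter(regex=r'^S\d+$').columns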

df['SP2'] = sum([df[[v, s]].prod(axis=1) for v, s in zip(v_cols, s_cols)])

print(df)

  Name  Qty Remarks  V1  V2  V3  S1  S2  S3  SP  SP2
0    A  100     Bad   0   1   0   1   0   1   0    0
1    B  150     Avg   1   1   0   0   1   0   1    1
2    C  200    Good   1   0   1   1   0   1   2    2
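Answers 2 and 3 both rely on the V and S columns being in matching order. A sketch that pairs columns by their numeric suffix instead, assuming every name is a single letter followed by an integer. The frame below is hypothetical (V3 renamed to V10 and the S columns shuffled), chosen so that a positional zip would mispair the columns:

```python
import pandas as pd

# Hypothetical frame: same values as the question, but V3 -> V10 and S columns reordered
df = pd.DataFrame({'V1': [0, 1, 1], 'V2': [1, 1, 0], 'V10': [0, 0, 1],
                   'S10': [1, 0, 1], 'S1': [1, 0, 1], 'S2': [0, 1, 0]})

# Map numeric suffix -> column name for each family
v_by_n = {int(c[1:]): c for c in df.filter(regex=r'^V\d+$').columns}
s_by_n = {int(c[1:]): c for c in df.filter(regex=r'^S\d+$').columns}

# Fail loudly if any V column lacks an S partner or vice versa
if v_by_n.keys() != s_by_n.keys():
    raise ValueError(f'unpaired suffixes: {sorted(v_by_n.keys() ^ s_by_n.keys())}')

# Pair by suffix in numeric order, so 'V2' comes before 'V10'
pairs = [(v_by_n[n], s_by_n[n]) for n in sorted(v_by_n)]
df['SP2'] = sum(df[v] * df[s] for v, s in pairs)
print(pairs)               # [('V1', 'S1'), ('V2', 'S2'), ('V10', 'S10')]
print(df['SP2'].tolist())  # [0, 1, 2]
```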

