Vectorized operations in Pandas with fixed columns/rows/values

Question

I want to perform operations on a Pandas DataFrame using fixed columns, rows, or values.

For example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a':(1,2,3), 'b':(4,5,6), 'c':(7,8,9), 'd':(10,11,12),
                  'e':(13,14,15)})

df
Out[57]: 
   a  b  c   d   e
0  1  4  7  10  13
1  2  5  8  11  14
2  3  6  9  12  15

I would like to use the values in columns 'a' and 'b' as fixed values.


# It's easy enough to perform the operation I want on one column at a time:
df.loc[:,'f'] = df.loc[:,'c'] + df.loc[:,'a'] + df.loc[:,'b']

# It gets cumbersome if there are many columns to perform the operation on though:
df.loc[:,'g'] = df.loc[:,'d'] / df.loc[:,'a'] * df.loc[:,'b']
df.loc[:,'h'] = df.loc[:,'e'] / df.loc[:,'a'] * df.loc[:,'b']
# etc.

# This returns columns with all NaN values.
df.loc[:,('f','g','h')] = df.loc[:,'c':'e'] / df.loc[:'a']

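Is there a best way to do what I want in Pandas? I couldn't find a workable solution in the Pandas documentation or elsewhere. I don't think I can use .map() or .applymap(), because my impression is that they only work for simple equations (one input value). Thanks for reading.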

Answer 1

Use div and mul instead of / and *, together with axis=0:

df[['g', 'h']] = df[['d', 'e']].div(df['a'], axis=0).mul(df['b'], axis=0)
print(df)

# Output
   a  b  c   d   e     g     h
0  1  4  7  10  13  40.0  52.0
1  2  5  8  11  14  27.5  35.0
2  3  6  9  12  15  24.0  30.0
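
A short sketch of why axis=0 matters here, using the frame from the question. By default, arithmetic between a DataFrame and a Series matches the Series index against the DataFrame's column labels. In this example the Series index is 0, 1, 2 and the columns are 'd' and 'e', so no labels line up and every cell comes out as NaN. Passing axis=0 matches on the row index instead. The names by_columns and by_rows are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': (1, 2, 3), 'b': (4, 5, 6), 'c': (7, 8, 9),
                   'd': (10, 11, 12), 'e': (13, 14, 15)})

# Default alignment: the Series index (0, 1, 2) is matched against the
# column labels ('d', 'e'). Nothing matches, so the result is all NaN.
by_columns = df[['d', 'e']] / df['a']

# axis=0 matches the Series index against the row index instead,
# so each row of 'd' and 'e' is divided by that row's value of 'a'.
by_rows = df[['d', 'e']].div(df['a'], axis=0)

print(by_columns.isna().all().all())  # True
print(by_rows)
```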

With NumPy:

arr = df.to_numpy()
arr[:, [3, 4]] / arr[:, [0]] * arr[:, [1]]

# Output
array([[40. , 52. ],
       [27.5, 35. ],
       [24. , 30. ]])
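
The NumPy version relies on broadcasting. A minimal sketch of the shapes involved, assuming the same frame as in the question: indexing with a list, as in arr[:, [0]], keeps a 2-D column of shape (3, 1), which broadcasts across the (3, 2) block of 'd' and 'e'. The variable names and the use of assign to write the result back are illustrative choices, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': (1, 2, 3), 'b': (4, 5, 6), 'c': (7, 8, 9),
                   'd': (10, 11, 12), 'e': (13, 14, 15)})
arr = df.to_numpy()

# List indexing keeps the column dimension: shape (3, 1), not (3,).
fixed_a = arr[:, [0]]
fixed_b = arr[:, [1]]

# (3, 2) / (3, 1) * (3, 1) broadcasts the fixed columns across 'd' and 'e'.
result = arr[:, [3, 4]] / fixed_a * fixed_b
print(fixed_a.shape, result.shape)  # (3, 1) (3, 2)

# Write the computed columns back into a new DataFrame.
out = df.assign(g=result[:, 0], h=result[:, 1])
print(out)
```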

Answer 2

As @Corralien pointed out, it is best to use Pandas DataFrame methods such as .div(), but I also found that the way .loc[] is used matters.

# Doesn't work:
df.loc[:,['f','g','h']] = df.loc[:,'c':'e'].div(df.loc[:'a'], axis=0)

# Doesn't work:
df[['f','g','h']] = df.loc[:,'c':'e'].div(df.loc[:'a'], axis=0)

# Now works.
df[['f','g','h']] = df.loc[:,'c':'e'].div(df['a'], axis=0)

At the moment I'm not sure why this is. Any insight would be appreciated, thanks.
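
One likely explanation, offered as a sketch rather than a definitive diagnosis: .loc takes the row indexer first, so df.loc[:'a'] is a slice over rows up to the label 'a', not a selection of column 'a'. Selecting column 'a' as a Series is written df.loc[:, 'a'], which is equivalent to df['a'] here. With that spelling the .loc form behaves like the working line above.

```python
import pandas as pd

df = pd.DataFrame({'a': (1, 2, 3), 'b': (4, 5, 6), 'c': (7, 8, 9),
                   'd': (10, 11, 12), 'e': (13, 14, 15)})

# Rows come first in .loc, so the column needs an explicit ':' for rows.
col_a = df.loc[:, 'a']
print(col_a.equals(df['a']))  # True

# Same operation as the working line, written with .loc for the divisor.
df[['f', 'g', 'h']] = df.loc[:, 'c':'e'].div(df.loc[:, 'a'], axis=0)
print(df)
```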

numpy

vectorization

pandas