Apply a function to all combinations of rows from one df and columns from another df
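Structurally, this problem resembles multiplying a row vector by a column vector to produce a matrix and then summing the rows of the resulting matrix.
The differences are that each element of the row vector carries two values, A and B, each element of the column vector carries two values, X and Y, and the operation is not multiplication but a function of A, B, X, and Y.
The code below achieves the goal. But is there a way to do this without a loop and without resorting to iterrows()? In the real problem the row vector has several thousand elements and the column vector has several million.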
import pandas as pd
from numpy import sin, cos, exp, nan
from numpy.random import random

# Sample function that can operate on ndarrays
def myfun(a, b, x, y):
    return sin(a+x), exp(b+y)

# sort of a "row vector"
df_ab = pd.DataFrame(random([2,6]),
                     index=['A','B'],
                     columns=['AB%d'%i for i in range(6)])

# sort of a "column vector"
df_xy = pd.DataFrame(random([8,2]),
                     columns=['X','Y'],
                     index=['XY%d'%i for i in range(8)])

# pre-add columns for the summarized results
df_xy['SUM_FUN0'] = nan
df_xy['SUM_FUN1'] = nan

# for each pair of values X,Y
for idx, xy in df_xy.iterrows():
    # calculate myfun with each pair of values A,B
    funout0, funout1 = myfun(df_ab.loc['A'], df_ab.loc['B'], xy.X, xy.Y)
    # summarize and store the result; write through df_xy.loc because the
    # row yielded by iterrows() may be a copy, so assigning to xy can be lost
    df_xy.loc[idx, 'SUM_FUN0'] = funout0.sum()
    df_xy.loc[idx, 'SUM_FUN1'] = funout1.sum()
How about something like this? I haven't tested the performance, but apply is usually slightly better than iterrows.
import pandas as pd
from numpy import sin, cos, exp, nan, sum
from numpy.random import random
from numba import jit

# Sample function that can operate on ndarrays
@jit(nopython=True)
def myfun(a, b, x, y):
    return sum(sin(a+x)), sum(exp(b+y))

# sort of a "row vector"
df_ab = pd.DataFrame(random([2,6]),
                     index=['A','B'],
                     columns=['AB%d'%i for i in range(6)])

# sort of a "column vector"
df_xy = pd.DataFrame(random([8,2]),
                     columns=['X','Y'],
                     index=['XY%d'%i for i in range(8)])

# plain ndarrays, so the jitted function does not receive pandas objects
A = df_ab.loc['A'].values
B = df_ab.loc['B'].values

# apply() yields one (sum0, sum1) tuple per row; zip(*...) splits them into two columns
df_xy['SUM_FUN0'], df_xy['SUM_FUN1'] = list(zip(*df_xy.apply(lambda x: myfun(A, B, x['X'], x['Y']), axis=1)))
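Another option, since myfun already works on ndarrays, is to drop the per-row call entirely and let NumPy broadcasting build the "matrix" from the question. The sketch below is not from the thread and is untested at the real problem size. The helper name summed_over_ab and the chunk_size parameter are made up for illustration. It processes the X/Y rows in chunks because a full millions-by-thousands intermediate array would probably not fit in memory.

```python
import numpy as np
import pandas as pd


def myfun(a, b, x, y):
    # Same sample function as in the question; operates elementwise on ndarrays.
    return np.sin(a + x), np.exp(b + y)


def summed_over_ab(df_ab, df_xy, chunk_size=10_000):
    """Hypothetical helper: for each row of df_xy, sum both myfun outputs over all (A, B) pairs."""
    a = df_ab.loc['A'].to_numpy()[np.newaxis, :]   # shape (1, n_ab)
    b = df_ab.loc['B'].to_numpy()[np.newaxis, :]   # shape (1, n_ab)
    x = df_xy['X'].to_numpy()[:, np.newaxis]       # shape (n_xy, 1)
    y = df_xy['Y'].to_numpy()[:, np.newaxis]       # shape (n_xy, 1)

    out0 = np.empty(len(df_xy))
    out1 = np.empty(len(df_xy))
    # Work through the X/Y rows in blocks so the (chunk, n_ab) intermediates stay small.
    for start in range(0, len(df_xy), chunk_size):
        stop = start + chunk_size
        # Broadcasting (chunk, 1) against (1, n_ab) gives every combination in the block.
        f0, f1 = myfun(a, b, x[start:stop], y[start:stop])
        out0[start:stop] = f0.sum(axis=1)
        out1[start:stop] = f1.sum(axis=1)
    return out0, out1


rng = np.random.default_rng(0)
df_ab = pd.DataFrame(rng.random((2, 6)),
                     index=['A', 'B'],
                     columns=['AB%d' % i for i in range(6)])
df_xy = pd.DataFrame(rng.random((8, 2)),
                     columns=['X', 'Y'],
                     index=['XY%d' % i for i in range(8)])

# chunk_size=3 is deliberately tiny here so that the chunking path is exercised.
df_xy['SUM_FUN0'], df_xy['SUM_FUN1'] = summed_over_ab(df_ab, df_xy, chunk_size=3)
```

The default chunk_size of 10,000 is only a guess: with a few thousand A/B pairs, each float64 intermediate of that shape takes a few hundred megabytes, so the value would need tuning to the available memory.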