Multiplication of DataFrames with different lengths

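I have two dataframes. Both have 5 columns, but the first has 100 rows and the second has only one row. I need to multiply each row of the first dataframe by the single row of the second dataframe, then sum the column values in each row and store that value in a new 6th column, "sum of multiplications". I have seen the "np.dot" operation, but I'm not sure whether I can apply it to dataframes. Also, I'm looking for a pythonic/pandas operation or method: is there one that could replace some clunky numpy code written from scratch? Thanks in advance for your suggestions.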
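In matrix terms, if the first frame is a 100×5 matrix A = (a_ij) and the single row of the second frame is a vector b = (b_1, ..., b_5), the requested sixth column is a matrix-vector product, which is why np.dot is relevant:

```latex
s_i = \sum_{j=1}^{5} a_{ij}\, b_j \quad (i = 1, \dots, 100), \qquad \text{i.e.} \quad \mathbf{s} = A\,\mathbf{b}
```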

I think you can convert the DataFrames to numpy arrays with values, multiply them, and finally sum:

import pandas as pd
import numpy as np

np.random.seed(1)
df1 = pd.DataFrame(np.random.randint(10, size=(1,5)))
df1.columns = list('ABCDE')
print(df1)
   A  B  C  D  E
0  5  8  9  5  0

np.random.seed(0)
df2 = pd.DataFrame(np.random.randint(10,size=(10,5)))
df2.columns = list('ABCDE')
print(df2)
   A  B  C  D  E
0  5  0  3  3  7
1  9  3  5  2  4
2  7  6  8  8  1
3  6  7  7  8  1
4  5  9  8  9  4
5  3  0  3  5  0
6  2  3  8  1  3
7  3  3  7  0  1
8  9  9  0  4  7
9  3  2  7  2  0
print(df2.values * df1.values)
[[25  0 27 15  0]
 [45 24 45 10  0]
 [35 48 72 40  0]
 [30 56 63 40  0]
 [25 72 72 45  0]
 [15  0 27 25  0]
 [10 24 72  5  0]
 [15 24 63  0  0]
 [45 72  0 20  0]
 [15 16 63 10  0]]

df = pd.DataFrame(df2.values * df1.values)
df['sum'] = df.sum(axis=1)
print(df)
    0   1   2   3  4  sum
0  25   0  27  15  0   67
1  45  24  45  10  0  124
2  35  48  72  40  0  195
3  30  56  63  40  0  189
4  25  72  72  45  0  214
5  15   0  27  25  0   67
6  10  24  72   5  0  111
7  15  24  63   0  0  102
8  45  72   0  20  0  137
9  15  16  63  10  0  104
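The same result can be computed while keeping the column labels A to E, and the question's np.dot idea also works: DataFrame.dot with a Series returns each row's sum of products. A minimal sketch reusing the answer's seeded data; the column name 'sum' follows the answer, and dot_sum is just an illustrative variable name:

```python
import numpy as np
import pandas as pd

# Same seeded data as in the answer above
np.random.seed(1)
df1 = pd.DataFrame(np.random.randint(10, size=(1, 5)), columns=list('ABCDE'))
np.random.seed(0)
df2 = pd.DataFrame(np.random.randint(10, size=(10, 5)), columns=list('ABCDE'))

# NumPy broadcasting: shape (10, 5) * (1, 5) -> (10, 5); the single row is
# reused for every row. Passing columns/index keeps the original labels.
df = pd.DataFrame(df2.values * df1.values, columns=df2.columns, index=df2.index)
df['sum'] = df.sum(axis=1)

# DataFrame.dot with a Series gives the row-wise sum of products directly
# (the Series index must match df2's column labels).
dot_sum = df2.dot(df1.iloc[0])

print(df)
print((dot_sum == df['sum']).all())
```

Because the labels are kept, the result can also be joined back onto df2 without renaming columns.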

Timings:

In [1185]: %timeit df2.mul(df1.ix[0], axis=1)
The slowest run took 5.07 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 287 µs per loop

In [1186]: %timeit pd.DataFrame(df2.values * df1.values)
The slowest run took 6.31 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 98 µs per loop
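The two timed expressions compute the same numbers; only the column labels differ (A to E versus 0 to 4). Since .ix was removed in pandas 1.0, a minimal sketch for re-running the comparison uses .iloc[0] and the standard timeit module; the measured times will depend on your machine and library versions, and via_mul/via_values are illustrative names:

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(1)
df1 = pd.DataFrame(np.random.randint(10, size=(1, 5)), columns=list('ABCDE'))
np.random.seed(0)
df2 = pd.DataFrame(np.random.randint(10, size=(10, 5)), columns=list('ABCDE'))

# The two expressions timed above; .iloc[0] stands in for the removed .ix[0]
via_mul = df2.mul(df1.iloc[0], axis=1)
via_values = pd.DataFrame(df2.values * df1.values)

# Same numbers, different column labels
same = np.array_equal(via_mul.to_numpy(), via_values.to_numpy())
print(same)

# Best-of-3 timing, averaged over 1000 runs per repeat
for stmt in ("df2.mul(df1.iloc[0], axis=1)",
             "pd.DataFrame(df2.values * df1.values)"):
    best = min(timeit.repeat(stmt, globals=globals(), number=1000, repeat=3))
    print(f"{stmt}: {best / 1000 * 1e6:.1f} µs per loop")
```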

You might be looking for something like this:

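import pandas as pd

df1 = pd.DataFrame({'A': [1.1, 2.7, 3.4],
                    'B': [-1.0, -2.5, -3.9]})

# Row-wise sum of the original (not yet multiplied) values
df1['sum of multiplications'] = df1.sum(axis=1)

# One-row DataFrame of multipliers; 1.0 leaves the sum column unchanged
df2 = pd.DataFrame({'A': [2.0],
                    'B': [3.0],
                    'sum of multiplications': [1.0]})

print(df1)
print(df2)

# .iloc replaces .ix, which was removed in pandas 1.0
row = df2.iloc[0]
# Multiply each row of df1 by row, matching on column labels
df5 = df1.mul(row, axis=1)
# Append a 'Total' row holding each column's sum
df5.loc['Total'] = df5.sum()
print(df5)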
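In this answer the 'sum of multiplications' column is computed before the multiplication, and the 'Total' row sums down the columns. If the goal is the per-row sum of products that the question describes, a minimal sketch multiplies first and then sums with axis=1 (same toy data; here df2 holds only the columns A and B, and products is an illustrative name):

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1.1, 2.7, 3.4],
                    'B': [-1.0, -2.5, -3.9]})
df2 = pd.DataFrame({'A': [2.0], 'B': [3.0]})

# Multiply every row of df1 by the single row of df2 (aligned on column labels)
products = df1.mul(df2.iloc[0], axis=1)

# Add across each row (axis=1) to get one value per row
df1['sum of multiplications'] = products.sum(axis=1)
print(df1)
```

For the question's 100-row frame, the same two lines apply unchanged, since mul broadcasts the single row over however many rows df1 has.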