How to compute the matrix product of two DataFrames in pandas?

I'm new to Python, having recently migrated from Matlab. Is there a command in Python (pandas or NumPy) that performs Matlab-style matrix multiplication on two DataFrames created with pandas?

Use dot:

import numpy as np
import pandas as pd

np.random.seed(0)

# Numpy
m1 = np.random.randn(5, 5)
m2 = np.random.randn(5, 5)

>>> m1.dot(m2)
array([[ -5.51837355,  -4.08559942,  -1.88020209,   2.88961281,
          0.61755013],
       [  1.4732264 ,  -0.2394676 ,  -0.34717755,  -4.18527913,
         -1.75550855],
       [ -0.1871964 ,   0.76399007,  -0.26550057,  -3.43359244,
         -0.68081106],
       [ -0.23996774,   0.95331428,  -2.833788  ,  -0.37940614,
          0.05464387],
       [  3.73328914,  -0.59578959,   3.96803224, -10.65362381,
         -4.34460348]])

# Pandas
df1 = pd.DataFrame(m1)
df2 = pd.DataFrame(m2)

>>> df1.dot(df2)
          0         1         2          3         4
0 -5.518374 -4.085599 -1.880202   2.889613  0.617550
1  1.473226 -0.239468 -0.347178  -4.185279 -1.755509
2 -0.187196  0.763990 -0.265501  -3.433592 -0.680811
3 -0.239968  0.953314 -2.833788  -0.379406  0.054644
4  3.733289 -0.595790  3.968032 -10.653624 -4.344603

df3 = pd.DataFrame(np.random.randn(5, 3))
df4 = pd.DataFrame(np.random.randn(3, 5))

>>> df3.dot(df4)
          0         1         2         3         4
0  0.991673  1.954500  0.322110  0.493841  0.080462
1  0.160482  1.548039 -0.826426  0.972538 -0.048610
2  0.628194  0.482943  0.742597 -0.236226  0.089525
3 -0.098316  0.817702 -0.725945  1.271506 -0.309596
4 -1.053413  0.948427 -2.445940  2.814147 -0.726829
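
As a small supplement to the examples above, here is a sketch of two details that tend to surprise people coming from Matlab. First, on Python 3.5+ with a reasonably recent pandas, the `@` operator should give the same result as `DataFrame.dot`. Second, `DataFrame.dot` aligns the column labels of the left frame with the index labels of the right frame, so mismatched labels raise an error even when the shapes are compatible. The frames `a`, `b`, `b_bad` and their labels are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Illustrative frames: the columns of `a` carry the same labels as the index of `b`.
a = pd.DataFrame(rng.standard_normal((4, 3)), columns=["x", "y", "z"])
b = pd.DataFrame(rng.standard_normal((3, 2)), index=["x", "y", "z"])

via_dot = a.dot(b)                        # method form
via_operator = a @ b                      # operator form
via_numpy = a.to_numpy() @ b.to_numpy()   # plain NumPy, labels ignored

# All three routes should agree numerically.
same = (
    np.allclose(via_dot.to_numpy(), via_operator.to_numpy())
    and np.allclose(via_dot.to_numpy(), via_numpy)
)

# Same shape as `b`, but the index labels no longer match the columns of `a`.
b_bad = b.copy()
b_bad.index = ["p", "q", "r"]
try:
    a.dot(b_bad)
    raised = False
except ValueError:
    # pandas refuses to multiply frames whose labels are not aligned
    raised = True

print(same, raised)
```

The frames in the answer above use the default integer labels 0..n-1 on both axes, which is why they multiply without any alignment problem.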

If your NumPy version is >= 1.10.0, you can use numpy.matmul as an alternative to the well-known dot function:
import numpy as np
import pandas as pd

np.random.seed(632)
df1 = pd.DataFrame(np.random.randn(7, 7))
df2 = pd.DataFrame(np.random.randn(7, 7))

In [68]: np.matmul(df1, df2)
Out[68]: 
array([[ 0.08535756, -3.05102895,  3.26148284, -6.27736384, -1.52042691,
         2.40667207, -0.6385153 ],
       [ 5.29731049, -0.94033606, -0.12675555,  1.10453597, -1.70722837,
         2.57797682,  2.37629556],
       [ 0.31841755, -1.46897738, -0.22734008, -4.37852181, -0.98948844,
         3.49939092, -1.36656608],
       [ 0.90757446, -4.6364365 ,  1.86254589, -4.89078986,  0.31928714,
         2.3442364 , -2.29896007],
       [-1.14428758,  6.69735827, -3.8776982 ,  6.87574565,  1.38854952,
        -2.88767356,  1.46302112],
       [ 0.8771236 , -2.01941938,  1.03461007,  0.30331467,  2.39161032,
         0.07345672, -1.30557339],
       [ 0.94310211, -0.54294898,  2.46147932, -3.21588748, -2.98369364,
         3.73941015,  1.31782966]])

np.dot and np.matmul perform almost identically:

In [71]: %timeit np.dot(df1, df2)
10000 loops, best of 3: 63.7 µs per loop

In [73]: %timeit np.matmul(df1, df2)
10000 loops, best of 3: 64.2 µs per loop

But both are faster than df1.dot(df2):

In [82]: %timeit df1.dot(df2)
1000 loops, best of 3: 217 µs per loop
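
If you want the speed of the NumPy route but still need a DataFrame back, one possible approach is to multiply the underlying arrays and re-attach the labels yourself. The helper name `matmul_frames` below is hypothetical, not a pandas or NumPy function, and the sketch assumes the two frames are already in the right order, because it skips the label alignment that `df1.dot(df2)` performs.

```python
import numpy as np
import pandas as pd


def matmul_frames(left, right):
    """Multiply two DataFrames via NumPy and wrap the result in a DataFrame.

    Hypothetical helper: it ignores label alignment, so the caller must make
    sure the columns of `left` correspond positionally to the rows of `right`.
    """
    product = np.matmul(left.to_numpy(), right.to_numpy())
    # Rows take their labels from `left`, columns from `right`.
    return pd.DataFrame(product, index=left.index, columns=right.columns)


np.random.seed(632)
df1 = pd.DataFrame(np.random.randn(7, 7))
df2 = pd.DataFrame(np.random.randn(7, 7))

result = matmul_frames(df1, df2)
print(result.shape)
```

Whether this actually beats `df1.dot(df2)` on your machine is worth checking with `%timeit`, since constructing the DataFrame adds some overhead of its own.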