How to do a matrix product of two DataFrames in pandas?
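I'm very new to Python and recently moved over from Matlab. Is there a command in Python (pandas or NumPy) that performs matrix multiplication of two DataFrames created with pandas, the way Matlab does?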
Use dot (available on both NumPy arrays and pandas DataFrames):
import numpy as np
import pandas as pd
np.random.seed(0)
# Numpy
m1 = np.random.randn(5, 5)
m2 = np.random.randn(5, 5)
>>> m1.dot(m2)
array([[ -5.51837355, -4.08559942, -1.88020209, 2.88961281,
0.61755013],
[ 1.4732264 , -0.2394676 , -0.34717755, -4.18527913,
-1.75550855],
[ -0.1871964 , 0.76399007, -0.26550057, -3.43359244,
-0.68081106],
[ -0.23996774, 0.95331428, -2.833788 , -0.37940614,
0.05464387],
[ 3.73328914, -0.59578959, 3.96803224, -10.65362381,
-4.34460348]])
# Pandas
df1 = pd.DataFrame(m1)
df2 = pd.DataFrame(m2)
>>> df1.dot(df2)
0 1 2 3 4
0 -5.518374 -4.085599 -1.880202 2.889613 0.617550
1 1.473226 -0.239468 -0.347178 -4.185279 -1.755509
2 -0.187196 0.763990 -0.265501 -3.433592 -0.680811
3 -0.239968 0.953314 -2.833788 -0.379406 0.054644
4 3.733289 -0.595790 3.968032 -10.653624 -4.344603
df3 = pd.DataFrame(np.random.randn(5, 3))
df4 = pd.DataFrame(np.random.randn(3, 5))
>>> df3.dot(df4)
0 1 2 3 4
0 0.991673 1.954500 0.322110 0.493841 0.080462
1 0.160482 1.548039 -0.826426 0.972538 -0.048610
2 0.628194 0.482943 0.742597 -0.236226 0.089525
3 -0.098316 0.817702 -0.725945 1.271506 -0.309596
4 -1.053413 0.948427 -2.445940 2.814147 -0.726829
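Coming from Matlab, the closest equivalent to A*B is Python's @ operator (Python 3.5+), which recent pandas versions implement for DataFrames by delegating to DataFrame.dot. Unlike a raw NumPy product, DataFrame.dot matches the column labels of the left frame against the index labels of the right frame. A minimal sketch, using made-up labels (x, y, a, b, p, q are purely illustrative):

```python
import pandas as pd

# Hypothetical labeled frames: the columns of `left` name the rows of `right`.
left = pd.DataFrame([[1, 2], [3, 4]], index=["x", "y"], columns=["a", "b"])
right = pd.DataFrame([[5, 6], [7, 8]], index=["a", "b"], columns=["p", "q"])

# `@` calls DataFrame.__matmul__, which computes the same product as left.dot(right).
product = left @ right
print(product)  # values [[19, 22], [43, 50]], index x/y, columns p/q

# Same labels in a different order: pandas aligns them before multiplying.
reordered = left.dot(right.loc[["b", "a"]])
print(reordered.equals(product))  # True

# Labels that do not match are rejected instead of being multiplied by position.
aligned_error = None
try:
    left.dot(right.rename(index={"a": "c"}))
except ValueError as err:
    aligned_error = str(err)
print("ValueError:", aligned_error)
```

The result keeps the left frame's index and the right frame's columns, which is the main practical difference from multiplying plain arrays.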
If your NumPy version is 1.10.0 or later, you can use numpy.matmul as an alternative to the well-known dot function:
import numpy as np
import pandas as pd
np.random.seed(632)
df1 = pd.DataFrame(np.random.randn(7, 7))
df2 = pd.DataFrame(np.random.randn(7, 7))
In [68]: np.matmul(df1, df2)
Out[68]:
array([[ 0.08535756, -3.05102895, 3.26148284, -6.27736384, -1.52042691,
2.40667207, -0.6385153 ],
[ 5.29731049, -0.94033606, -0.12675555, 1.10453597, -1.70722837,
2.57797682, 2.37629556],
[ 0.31841755, -1.46897738, -0.22734008, -4.37852181, -0.98948844,
3.49939092, -1.36656608],
[ 0.90757446, -4.6364365 , 1.86254589, -4.89078986, 0.31928714,
2.3442364 , -2.29896007],
[-1.14428758, 6.69735827, -3.8776982 , 6.87574565, 1.38854952,
-2.88767356, 1.46302112],
[ 0.8771236 , -2.01941938, 1.03461007, 0.30331467, 2.39161032,
0.07345672, -1.30557339],
[ 0.94310211, -0.54294898, 2.46147932, -3.21588748, -2.98369364,
3.73941015, 1.31782966]])
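As the output above shows, in the versions used here np.matmul (like np.dot) hands back a plain NumPy array, so the row and column labels are lost; newer pandas versions may behave differently. One way to get NumPy's speed and still end up with a labeled result is to multiply the underlying arrays and rebuild the frame yourself. This is a sketch that assumes df1's columns and df2's index are already in matching order, since no label alignment is done:

```python
import numpy as np
import pandas as pd

np.random.seed(632)
df1 = pd.DataFrame(np.random.randn(7, 7))
df2 = pd.DataFrame(np.random.randn(7, 7))

# Multiply the raw arrays (for 2-D input, `@` is equivalent to np.matmul)...
values = df1.to_numpy() @ df2.to_numpy()

# ...then restore labels: rows come from df1, columns from df2.
# Note: positions are trusted as-is; unlike DataFrame.dot, nothing is aligned.
result = pd.DataFrame(values, index=df1.index, columns=df2.columns)
print(result.round(3))
```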
The performance is almost identical:
In [71]: %timeit np.dot(df1, df2)
10000 loops, best of 3: 63.7 µs per loop
In [73]: %timeit np.matmul(df1, df2)
10000 loops, best of 3: 64.2 µs per loop
but both are noticeably faster than df1.dot(df2), which took roughly 3x as long in this run:
In [82]: %timeit df1.dot(df2)
1000 loops, best of 3: 217 µs per loop
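The %timeit lines are IPython magics. Outside IPython, a rough equivalent with the standard timeit module is sketched below; absolute numbers depend on hardware and library versions, so treat the figures above as one sample rather than a rule. The extra cost of df1.dot(df2) is plausibly its label alignment plus building a new DataFrame, though that is an inference, not something measured here.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(632)
df1 = pd.DataFrame(np.random.randn(7, 7))
df2 = pd.DataFrame(np.random.randn(7, 7))

candidates = {
    "np.dot(df1, df2)": lambda: np.dot(df1, df2),
    "np.matmul(df1, df2)": lambda: np.matmul(df1, df2),
    "df1.dot(df2)": lambda: df1.dot(df2),
}

timings = {}
for name, fn in candidates.items():
    # Best of 3 repeats of 1000 calls each, converted to microseconds per call.
    best = min(timeit.repeat(fn, repeat=3, number=1000))
    timings[name] = best / 1000 * 1e6
    print(f"{name:<22} {timings[name]:8.1f} µs per loop")
```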