Multiplication of DataFrames with different lengths
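I have two DataFrames: both have 5 columns, but the first has 100 rows and the second has only one row. I need to multiply every row of the first DataFrame by that single row of the second, then sum the column values in each row and store the result in a new sixth column, "sum of multiplications". I have seen the np.dot operation, but I am not sure whether I can apply it to DataFrames. I am also looking for a pythonic/pandas operation or method that could replace some clunky from-scratch numpy code. Thanks in advance for your suggestions.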
I think you can convert the DataFrames to numpy arrays with values, multiply them, and finally sum:
import pandas as pd
import numpy as np
np.random.seed(1)
df1 = pd.DataFrame(np.random.randint(10, size=(1,5)))
df1.columns = list('ABCDE')
print(df1)
A B C D E
0 5 8 9 5 0
np.random.seed(0)
df2 = pd.DataFrame(np.random.randint(10,size=(10,5)))
df2.columns = list('ABCDE')
print(df2)
A B C D E
0 5 0 3 3 7
1 9 3 5 2 4
2 7 6 8 8 1
3 6 7 7 8 1
4 5 9 8 9 4
5 3 0 3 5 0
6 2 3 8 1 3
7 3 3 7 0 1
8 9 9 0 4 7
9 3 2 7 2 0
print(df2.values * df1.values)
[[25 0 27 15 0]
[45 24 45 10 0]
[35 48 72 40 0]
[30 56 63 40 0]
[25 72 72 45 0]
[15 0 27 25 0]
[10 24 72 5 0]
[15 24 63 0 0]
[45 72 0 20 0]
[15 16 63 10 0]]
df = pd.DataFrame(df2.values * df1.values)
df['sum'] = df.sum(axis=1)
print(df)
0 1 2 3 4 sum
0 25 0 27 15 0 67
1 45 24 45 10 0 124
2 35 48 72 40 0 195
3 30 56 63 40 0 189
4 25 72 72 45 0 214
5 15 0 27 25 0 67
6 10 24 72 5 0 111
7 15 24 63 0 0 102
8 45 72 0 20 0 137
9 15 16 63 10 0 104
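Since the question mentions np.dot: the row-wise sum of products is a matrix-vector product, so the sum column can also be computed in one step while keeping the column labels. The following is a minimal sketch, not the answer's own method. It assumes the single-row values printed above and, for brevity, only the first three rows of df2; the names `weights` and `result` are illustrative.

```python
import pandas as pd

# Single-row frame (values taken from the df1 printed above)
df1 = pd.DataFrame([[5, 8, 9, 5, 0]], columns=list('ABCDE'))
# First three rows of the df2 printed above
df2 = pd.DataFrame([[5, 0, 3, 3, 7],
                    [9, 3, 5, 2, 4],
                    [7, 6, 8, 8, 1]], columns=list('ABCDE'))

# The single row as a Series indexed by the column labels A..E
weights = df1.iloc[0]

# Element-wise products, aligned on column labels (labels are preserved)
result = df2.mul(weights, axis=1)

# Row-wise sum of products, computed as a matrix-vector product
result['sum'] = df2.dot(weights)

print(result)
```

Here `df2.dot(weights)` gives the same values as `result.sum(axis=1)` taken before the new column is added; which one reads better is a matter of taste.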
Timings:
In [1185]: %timeit df2.mul(df1.ix[0], axis=1)
The slowest run took 5.07 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 287 µs per loop
In [1186]: %timeit pd.DataFrame(df2.values * df1.values)
The slowest run took 6.31 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 98 µs per loop
You may be looking for something like this:
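import pandas as pd
import numpy as np
df1 = pd.DataFrame({'A': [1.1, 2.7, 3.4],
                    'B': [-1., -2.5, -3.9]})
# Row-wise sum of the original values, stored as a third column
df1['sum of multiplications'] = df1.sum(axis=1)
df2 = pd.DataFrame({'A': [2.],
                    'B': [3.],
                    'sum of multiplications': [1.]})
print(df1)
print(df2)
# Take the single row of df2 as a Series (.ix was removed from pandas; .iloc replaces it)
row = df2.iloc[0]
# Multiply every row of df1 by that Series, aligned on column labels
df5 = df1.mul(row, axis=1)
# Append a 'Total' row holding the column sums
df5.loc['Total'] = df5.sum()
print(df5)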
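Note that the snippet above fills the sum column before multiplying, so that column holds A + B rather than the sum of the products. If the row-wise sum of products from the question is what is wanted, one possible variant is to multiply first and sum afterwards. This is a sketch under that assumption; the name `products` is illustrative and df2 here carries only the two multiplier columns.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1.1, 2.7, 3.4],
                    'B': [-1.0, -2.5, -3.9]})
df2 = pd.DataFrame({'A': [2.0],
                    'B': [3.0]})

# Multiply each row of df1 by the single row of df2, aligned on column labels
products = df1.mul(df2.iloc[0], axis=1)

# Only now add the row-wise sum, so it reflects the multiplied values
products['sum of multiplications'] = products.sum(axis=1)

print(products)
```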