How to "multiply" python pandas dataframes (as if they were vectors)?

I'm learning pandas. I have two dataframes:

df1 = 
quality1  value
A         1
B         2
C         3

df2 = 
quality2  value
D         1
E         10
F         100

I want to multiply them (the way I might multiply two vectors to get a matrix, i.e. an outer product). The answer should be:

df3 = 
quality1    quality2  value
A           D         1
            E         10
            F         100
B           D         2
            E         20
            F         200
C           D         3
            E         30
            F         300

How can I do this?

It's not the prettiest, but this works:

>>> df1["dummy"] = 1
>>> df2["dummy"] = 1
>>> dfm = df1.merge(df2, on="dummy")
>>> dfm["value"] = dfm.pop("value_x") * dfm.pop("value_y")
>>> del dfm["dummy"]
>>> dfm
  quality1 quality2  value
0        A        D      1
1        A        E     10
2        A        F    100
3        B        D      2
4        B        E     20
5        B        F    200
6        C        D      3
7        C        E     30
8        C        F    300

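At the time of writing, pandas had no native support for a Cartesian join (whistles and looks away..), so merging on a dummy column is an easy way to get the same effect. The intermediate frame, right after the merge and before the two value columns are multiplied, looks like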

>>> dfm
  quality1  value_x  dummy quality2  value_y
0        A        1      1        D        1
1        A        1      1        E       10
2        A        1      1        F      100
3        B        2      1        D        1
4        B        2      1        E       10
5        B        2      1        F      100
6        C        3      1        D        1
7        C        3      1        E       10
8        C        3      1        F      100
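
Newer pandas releases (1.2 and later, to my knowledge) accept how="cross" in merge, which makes the dummy column unnecessary. A minimal sketch, assuming such a version is installed and using the same df1 and df2 as in the question:

```python
import pandas as pd

df1 = pd.DataFrame({"quality1": list("ABC"), "value": [1, 2, 3]})
df2 = pd.DataFrame({"quality2": list("DEF"), "value": [1, 10, 100]})

# how="cross" pairs every row of df1 with every row of df2
# (assumption: pandas >= 1.2; older versions reject this value of `how`).
dfm = df1.merge(df2, how="cross", suffixes=("_x", "_y"))

# Collapse the two value columns into their product, as in the dummy-column version.
dfm["value"] = dfm.pop("value_x") * dfm.pop("value_y")

print(dfm)
```

Unlike the dummy-column trick, this leaves df1 and df2 unmodified.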

You can also use the cartesian function from scikit-learn:

import numpy as np
import pandas as pd
from sklearn.utils.extmath import cartesian

# Your data:
df1 = pd.DataFrame({'quality1':list('ABC'), 'value':[1,2,3]})
df2 = pd.DataFrame({'quality2':list('DEF'), 'value':[1,10,100]})

# Make the matrix of labels:
dfm = pd.DataFrame(cartesian((df1.quality1.values, df2.quality2.values)),
                   columns=['quality1', 'quality2'])

# Multiply values: repeat each df1 value once per df2 row,
# and tile the df2 values once per df1 row, so the two arrays line up with the labels.
dfm['value'] = df1.value.values.repeat(df2.value.size) * np.tile(df2.value.values, df1.value.size)

print(dfm.set_index(['quality1', 'quality2']))

Which produces:

                   value
quality1 quality2       
A        D             1
         E            10
         F           100
B        D             2
         E            20
         F           200
C        D             3
         E            30
         F           300
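
Since the question frames this as multiplying vectors, the same result can probably be reached without scikit-learn by taking NumPy's outer product of the two value columns and labelling it with a MultiIndex. This is only a sketch under the assumption that both frames are small enough for the full product to fit in memory; the name df3 is taken from the question:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"quality1": list("ABC"), "value": [1, 2, 3]})
df2 = pd.DataFrame({"quality2": list("DEF"), "value": [1, 10, 100]})

# All (quality1, quality2) label pairs; the first level varies slowest.
index = pd.MultiIndex.from_product(
    [df1["quality1"], df2["quality2"]], names=["quality1", "quality2"]
)

# np.outer gives a 3x3 matrix; ravel() flattens it row by row,
# which matches the ordering of the label pairs above.
values = np.outer(df1["value"], df2["value"]).ravel()

df3 = pd.DataFrame({"value": values}, index=index)
print(df3)
```

Individual products can then be looked up by label pair, for example df3.loc[("B", "E"), "value"].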