Get combinations of multiple columns (Cartesian product) of a pandas dataframe?

So I have a dataframe representing various model estimates for the likelihood of each of a group of candidates winning an election.

             Steve     John      
    Model1   0.327586  0.289474 
    Model2   0.322581  0.285714 
    Model3   0.303030  0.294118

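I want a dataframe that represents every combination of model values across the columns, i.e. the Cartesian product of all columns. So the dataframe above would become the following.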

             model Steve     value Steve    model John     value John     
    0        Model1          0.327586       Model1         0.289474
    1        Model1          0.327586       Model2         0.285714
    2        Model1          0.327586       Model3         0.294118
    3        Model2          0.322581       Model1         0.289474
    4        Model2          0.322581       Model2         0.285714
    5        Model2          0.322581       Model3         0.294118
    6        Model3          0.303030       Model1         0.289474
    7        Model3          0.303030       Model2         0.285714
    8        Model3          0.303030       Model3         0.294118

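The above is the simple case, but in theory I'd like to be able to do this for N models and M candidates, giving a dataframe with N^M rows and 2M columns (in practice N < 20 and M < 6).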

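While searching for an answer I saw many suggestions to use the itertools module, but I couldn't figure out how to get all combinations across multiple lists (itertools.combinations seems to apply only to finding all combinations within a single list).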

Use itertools.product:

from itertools import product

import pandas as pd

# get all combinations of (model, value) pairs across all columns
a = product(*[zip(df.index, x) for x in df.T.values])
# create the new column names
cols = [c for x in df.columns for c in ('model_' + x, 'value_' + x)]
# flatten the nested tuples and pass the rows to the DataFrame constructor
df1 = pd.DataFrame([[y for x in z for y in x] for z in a], columns=cols)
print(df1)
  model_Steve  value_Steve model_John  value_John
0      Model1     0.327586     Model1    0.289474
1      Model1     0.327586     Model2    0.285714
2      Model1     0.327586     Model3    0.294118
3      Model2     0.322581     Model1    0.289474
4      Model2     0.322581     Model2    0.285714
5      Model2     0.322581     Model3    0.294118
6      Model3     0.303030     Model1    0.289474
7      Model3     0.303030     Model2    0.285714
8      Model3     0.303030     Model3    0.294118
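
As a quick sanity check, the same steps can be wrapped in a function and the N^M rows by 2M columns shape from the question verified. This is only a sketch: the function name `cartesian_models` and the third candidate `Anna` (with made-up values) are not from the question. It iterates over `df[col]` rather than `df.T.values`, which should give the same pairs for an all-numeric frame.

```python
from itertools import product

import pandas as pd


def cartesian_models(df):
    """Return every combination of (model, value) pairs across the columns of df."""
    # one list of (model, value) pairs per candidate column
    pairs_per_column = [list(zip(df.index, df[col])) for col in df.columns]
    cols = [c for name in df.columns for c in ('model_' + name, 'value_' + name)]
    # each combo is a tuple of (model, value) pairs; flatten it into one row
    rows = [[item for pair in combo for item in pair]
            for combo in product(*pairs_per_column)]
    return pd.DataFrame(rows, columns=cols)


df = pd.DataFrame({'Steve': [0.327586, 0.322581, 0.303030],
                   'John': [0.289474, 0.285714, 0.294118],
                   'Anna': [0.10, 0.20, 0.30]},   # hypothetical third candidate
                  index=['Model1', 'Model2', 'Model3'])

out = cartesian_models(df)
print(out.shape)  # (27, 6): N**M = 3**3 rows, 2*M = 6 columns
```

With N < 20 and M < 6 the result stays below 20^5 = 3.2 million rows, so building the rows as a plain list is workable, if not especially fast.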

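It would help if you provided code so that we can quickly create the frame, rather than just a table. You can create a common key in any way you like and then do a database-style cross join on it to get the final result. It can be done in one line, but I did it step by step.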

import pandas as pd


df = pd.DataFrame({'model': ['model1', 'model2'],
                   'steve': ['a', 'b'],
                   'jhon': ['c', 'd']
                  })

# create a common key

df['key'] = 'xyz'

# create two separate dataframes for the self join
# (the column selections could also be passed directly
# to the merge function)

df_steve = df[['model', 'steve', 'key']]
df_jhon = df[['model', 'jhon', 'key']]

# self join on the common key, then drop the key
pd.merge(df_steve, df_jhon, on='key', suffixes=('_steve', '_jhon')).drop('key', axis=1)

Output:

  model_steve steve model_jhon jhon
0      model1     a     model1    c
1      model1     a     model2    d
2      model2     b     model1    c
3      model2     b     model2    d

As a one-liner:

cross_df = pd.merge(df[['model', 'steve', 'key']], 
                    df[['model', 'jhon', 'key']], 
                    on='key', 
                    suffixes=('_steve', '_jhon')
                    ).drop('key', axis=1)

Just change the column names as needed.
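
On pandas 1.2 or later, `pd.merge` also accepts `how='cross'`, which makes the dummy key unnecessary. Below is a minimal sketch under that version assumption, reusing the same 'model'/'steve'/'jhon' frame. The helper name `cross_all` is made up for illustration, and `functools.reduce` is used so that it extends to any number of candidate columns.

```python
from functools import reduce

import pandas as pd

df = pd.DataFrame({'model': ['model1', 'model2'],
                   'steve': ['a', 'b'],
                   'jhon': ['c', 'd']})


def cross_all(df, model_col='model'):
    """Cross join one (model, value) frame per candidate column."""
    candidates = [c for c in df.columns if c != model_col]
    # rename the model column up front so no merge suffixes are needed
    frames = [df[[model_col, c]].rename(columns={model_col: model_col + '_' + c})
              for c in candidates]
    # how='cross' requires pandas >= 1.2
    return reduce(lambda left, right: pd.merge(left, right, how='cross'), frames)


cross_df = cross_all(df)
print(cross_df)
```

For the two-candidate frame this should give the same four rows and column names as the key-based merge above.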