Using DataFrame cross join throws "No common columns to perform merge on"

I want to create a third column that holds the result of a cross join between my columns A and B:

import pandas as pd
import numpy as np
df = pd.read_csv("data.csv", sep=",")
df
#    A    B    
# 0  0  Yes 
# 1  8   No 
# 2  2  Yes 
# 3  4  Maybe
# 4  6  NA

They have the following unique values:

>>> df['A'].drop_duplicates()
0       0
2       8
41      4
119     2
1246    3
1808    1
Name: A, dtype: int64

>>> df['B'].drop_duplicates()
                
0              NA
2           Maybe
320            No
5575          Yes
Name: B, dtype: object

I want a df['C'] that encodes every cross-join combination, so it should contain 6 * 4 = 24 unique values:

#Column C should have 6 * 4 classes:

(1,Yes)=1  (1,No)=6  (1, Maybe)=12  (1, NA)=18
(2,Yes)=2  (2,No)=7  (2, maybe)=13    ...
(3,Yes)=3  (3,No)=8  ...
(4,Yes)=4  (4,No)=9
(8,Yes)=5   ...
(0,Yes)=0

So we should end up with the following:

Newdf
#    A    B    C  
# 0  0  Yes    0
# 1  8   No    9
# 2  2  Yes    2
# 3  4  Maybe  15
# 4  8  NA     22

With the following approach, I get an error:

out = df.merge(df[['B']].drop_duplicates().merge(df['A'].drop_duplicates(),how='cross').assign(C=lambda x : x.index+1))

It throws:

"No common columns to perform merge on. "
pandas.errors.MergeError: No common columns to perform merge on. Merge options: left_on=None, right_on=None, left_index=False, right_index=False

Any help would be appreciated.
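A possible explanation for the error, offered as an assumption about the environment rather than a confirmed diagnosis: as far as I know, how='cross' was only added to merge in pandas 1.2, and on older versions the call ends up looking for shared column names, which B and A do not have. A minimal sketch of a cross join that avoids how='cross' by merging on a constant helper column is shown below. The inline sample frame and the helper column name `_key` are hypothetical, not part of the question.

```python
import pandas as pd

# Hypothetical stand-in for the question's data.csv.
df = pd.DataFrame({
    "A": [0, 8, 2, 4, 6],
    "B": ["Yes", "No", "Yes", "Maybe", "NA"],
})

# Unique values of each column, kept as one-column DataFrames.
a = df[["A"]].drop_duplicates()
b = df[["B"]].drop_duplicates()

# Emulate a cross join: give both sides the same constant key,
# merge on it, then drop the helper column again.
pairs = (
    b.assign(_key=1)
    .merge(a.assign(_key=1), on="_key")
    .drop(columns="_key")
    .assign(C=lambda x: range(len(x)))  # one integer code per (B, A) pair
)

# Attach the code for each row's (A, B) pair to the original frame.
out = df.merge(pairs, on=["A", "B"], how="left")
print(out)
```

On pandas 1.2 or later, the original how='cross' call should work as written, so upgrading may be the simpler fix if that is an option.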

Why not use good old itertools:

from itertools import product   
cats = list(product(df['A'].unique(), df['B'].unique()))

# merge with this
pd.DataFrame(cats, columns=['A','B']).assign(C=range(len(cats)))

Output for the sample data:

    A      B   C
0   0    Yes   0
1   0     No   1
2   0  Maybe   2
3   0    NaN   3
4   8    Yes   4
5   8     No   5
6   8  Maybe   6
7   8    NaN   7
8   2    Yes   8
9   2     No   9
10  2  Maybe  10
11  2    NaN  11
12  4    Yes  12
13  4     No  13
14  4  Maybe  14
15  4    NaN  15
16  6    Yes  16
17  6     No  17
18  6  Maybe  18
19  6    NaN  19
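
The table above is only the lookup. To get column C onto the original rows, it still has to be merged back, as the "merge with this" comment suggests. A minimal sketch of that step follows, assuming a small inline frame in place of data.csv and the hypothetical names `lookup` and `out`. It relies on pandas matching NaN keys against each other during a merge, so the missing B value still receives a code.

```python
from itertools import product

import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data.csv.
df = pd.DataFrame({
    "A": [0, 8, 2, 4, 6],
    "B": ["Yes", "No", "Yes", "Maybe", np.nan],
})

# Lookup table: one row per (A, B) combination, numbered in product order.
cats = list(product(df["A"].unique(), df["B"].unique()))
lookup = pd.DataFrame(cats, columns=["A", "B"]).assign(C=range(len(cats)))

# A left merge keeps the original row order and attaches the matching code.
out = df.merge(lookup, on=["A", "B"], how="left")
print(out)
```

The codes follow the order in which values first appear in each column, so they will not necessarily match the numbering sketched in the question; if a specific numbering matters, the inputs to product can be sorted or listed explicitly.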