DataFrame cross join throws "No common columns to perform merge on"
I want to create a third column holding the result of a cross join between my columns A and B:
import pandas as pd
import numpy as np
df = pd.read_csv("data.csv", sep=",")
df
# A B
# 0 0 Yes
# 1 8 No
# 2 2 Yes
# 3 4 Maybe
# 4 6 NA
They have the following unique values:
>>> df['A'].drop_duplicates()
0 0
2 8
41 4
119 2
1246 3
1808 1
Name: A, dtype: int64
>>> df['B'].drop_duplicates()
0 NA
2 Maybe
320 No
5575 Yes
Name: B, dtype: object
I want a df['C'] that encodes every cross-join combination, so it should contain 6 * 4 = 24 unique values:
#Column C should have 6 * 4 classes:
(1,Yes)=1 (1,No)=6 (1, Maybe)=12 (1, NA)=18
(2,Yes)=2 (2,No)=7 (2, maybe)=13 ...
(3,Yes)=3 (3,No)=8 ...
(4,Yes)=4 (4,No)=9
(8,Yes)=5 ...
(0,Yes)=0
So we should end up with the following:
Newdf
# A B C
# 0 0 Yes 0
# 1 8 No 9
# 2 2 Yes 2
# 3 4 Maybe 15
# 4 8 NA 22
With this approach, I get the following error:
out = df.merge(df[['B']].drop_duplicates().merge(df['A'].drop_duplicates(),how='cross').assign(C=lambda x : x.index+1))
It throws:
"No common columns to perform merge on. "
pandas.errors.MergeError: No common columns to perform merge on. Merge options: left_on=None, right_on=None, left_index=False, right_index=False
Any help would be appreciated.
Why not use good old itertools:
from itertools import product
cats = list(product(df['A'].unique(), df['B'].unique()))
# merge with this
pd.DataFrame(cats, columns=['A','B']).assign(C=range(len(cats)))
Output for the sample data:
A B C
0 0 Yes 0
1 0 No 1
2 0 Maybe 2
3 0 NaN 3
4 8 Yes 4
5 8 No 5
6 8 Maybe 6
7 8 NaN 7
8 2 Yes 8
9 2 No 9
10 2 Maybe 10
11 2 NaN 11
12 4 Yes 12
13 4 No 13
14 4 Maybe 14
15 4 NaN 15
16 6 Yes 16
17 6 No 17
18 6 Maybe 18
19 6 NaN 19
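The "merge with this" comment above leaves the last step implicit. Here is a minimal sketch of how the lookup table might be joined back onto the original rows. The inline frame is a hypothetical stand-in for data.csv, and the names `lookup` and `out` are illustrative assumptions, not part of the original answer.

```python
from itertools import product

import pandas as pd

# Hypothetical stand-in for the question's data.csv (assumed values).
df = pd.DataFrame({
    "A": [0, 8, 2, 4, 6],
    "B": ["Yes", "No", "Yes", "Maybe", None],
})

# Every (A, B) combination, numbered in order of first appearance.
cats = list(product(df["A"].unique(), df["B"].unique()))
lookup = pd.DataFrame(cats, columns=["A", "B"]).assign(C=range(len(cats)))

# A left merge on both key columns keeps df's row order and attaches
# the code of each row's (A, B) pair.
out = df.merge(lookup, on=["A", "B"], how="left")
print(out)
```

The codes follow the order in which values first appear in each column, so they will not necessarily match the numbering sketched in the question; only the one-code-per-combination property is preserved.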