Create a pandas DataFrame from a Cartesian product of two large lists
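I'm looking for the simplest way to create a DataFrame from two others so that it contains every combination of their elements.
For example, given these two DataFrames: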
import pandas as pd

list1 = ["A", "B", "C", "D", "E"]
list2 = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8"]
df1 = pd.DataFrame(list1)
df2 = pd.DataFrame(list2)
The result should be:
0 1
0 A x1
1 A x2
2 A x3
3 A x4
4 A x5
5 A x6
6 A x7
7 A x8
8 B x1
9 B x2
I tried building the combinations from the lists; that works for small lists but not for large ones.
Thanks
import pandas as pd

list1 = ["A", "B", "C", "D", "E"]
list2 = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8"]
df1 = pd.DataFrame(list1)
df2 = pd.DataFrame(list2)
# Give both frames the same constant key so every row of df1 matches every row of df2
df1['key'] = 0
df2['key'] = 0
print(df1.merge(df2, on='key', how='outer').drop(columns='key'))
Prints:
0_x 0_y
0 A x1
1 A x2
2 A x3
3 A x4
4 A x5
5 A x6
6 A x7
7 A x8
8 B x1
9 B x2
...
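Since pandas 1.2, merge also accepts how="cross", which produces the same Cartesian product without a helper key column. A minimal sketch, assuming pandas >= 1.2; the column names "left" and "right" are hypothetical, chosen only so the merged columns don't get the 0_x / 0_y suffixes:

```python
import pandas as pd

list1 = ["A", "B", "C", "D", "E"]
list2 = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8"]

# Hypothetical column names, used so the result needs no renaming
df1 = pd.DataFrame({"left": list1})
df2 = pd.DataFrame({"right": list2})

# how="cross" (pandas >= 1.2) pairs every row of df1 with every row of df2,
# so no dummy key has to be added and dropped afterwards
result = df1.merge(df2, how="cross")
print(result.head(10))
```

The row order matches the key-based merge: all of df2 for "A", then all of df2 for "B", and so on.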
You want to join each element of df1 with every element of df2.
You can do this with df.merge on a dummy key:
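import pandas as pd

df1 = pd.DataFrame(["A", "B", "C", "D", "E"])
df2 = pd.DataFrame(["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8"])
df1['tmp'] = 1  # Create a dummy key in df1
df2['tmp'] = 1  # Create a dummy key in df2
# Merge both frames on `tmp`, drop the key, and rename the suffixed columns
print(df1.merge(df2, on='tmp').drop(columns='tmp').rename(columns={'0_x': '0', '0_y': '1'}))
Prints: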
0 1
0 A x1
1 A x2
2 A x3
3 A x4
4 A x5
5 A x6
6 A x7
7 A x8
8 B x1
9 B x2
10 B x3
11 B x4
12 B x5
13 B x6
14 B x7
15 B x8
16 C x1
17 C x2
18 C x3
...
...
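The output pattern above (each df1 value repeated once per row of df2, with df2 cycling underneath it) can also be built column by column with NumPy, which avoids the join entirely. A sketch, assuming NumPy is available; this swaps in np.repeat / np.tile rather than merge:

```python
import numpy as np
import pandas as pd

list1 = ["A", "B", "C", "D", "E"]
list2 = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8"]

# Each element of list1 appears len(list2) times in a row: A, A, ..., B, B, ...
col0 = np.repeat(list1, len(list2))
# list2 is cycled once per element of list1: x1..x8, x1..x8, ...
col1 = np.tile(list2, len(list1))

result = pd.DataFrame({0: col0, 1: col1})
print(result.head(10))
```

Both columns have length len(list1) * len(list2), so the row order is the same as in the merge result.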
Alternatively, you can use itertools.product:
import itertools
import pandas as pd
list1 = ["A", "B", "C", "D", "E"]
list2 = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8"]
result = pd.DataFrame(list(itertools.product(list1, list2)))
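Note that list(itertools.product(...)) builds one Python tuple per combination before pandas sees any data. A pandas-only alternative is pd.MultiIndex.from_product, which stores each input once plus integer codes and then expands them into columns. A minimal sketch, assuming unnamed levels (to_frame then labels the columns by level position):

```python
import pandas as pd

list1 = ["A", "B", "C", "D", "E"]
list2 = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8"]

# Build the Cartesian product as a MultiIndex (levels + integer codes)
index = pd.MultiIndex.from_product([list1, list2])
# index=False turns the levels into ordinary columns with a default RangeIndex
result = index.to_frame(index=False)
print(result.head(10))
```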