How to merge two dataframes with non-identical columns like this? Python
I have two dataframes that look like this (the real ones are larger):
DF1:

Alliances_names | Value1 |
---|---|
cgc inc/nshow ltd/noracle inc | 500 |
steam/nsoap jv | NaN |
saints bd | 8 |
watrloo jv/ncgc inc/nflow inc | 19 |

DF2:

Company | Number1 | Number2 |
---|---|---|
steam | 15 | y |
soap jv | 2000 | n |
cgc inc | 4565 | n |
show ltd | 1 | n |
flow inc | 1111 | y |
watrloo jv | 6756 | n |
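I need to merge these two dataframes on the Alliances_names and Company columns: if a company belongs to an alliance, its information should be added to that alliance's row. (In DF1, the companies are separated by a /n delimiter.)

The result should look like this: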
Alliances_names | Value1 | Company | Number1 | Number2 |
---|---|---|---|---|
cgc inc/nshow ltd/noracle inc | 500 | cgc inc | 4565 | n |
cgc inc/nshow ltd/noracle inc | 500 | show ltd | 1 | n |
steam/nsoap jv | NaN | steam | 15 | y |
steam/nsoap jv | NaN | soap jv | 2000 | n |
saints bd | 8 | NaN | NaN | NaN |
watrloo jv/ncgc inc/nflow inc | 19 | watrloo jv | 6756 | n |
watrloo jv/ncgc inc/nflow inc | 19 | cgc inc | 4565 | n |
watrloo jv/ncgc inc/nflow inc | 19 | flow inc | 1111 | y |
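I need to duplicate the alliance name for every company it contains. I tried splitting the companies in Alliances_names into a new column that holds a list of companies in each cell, but `isin` doesn't handle that well, and I couldn't get it to work with a duplicated dataframe.

Thanks in advance for your help!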
```python
# Split each alliance into a list of member companies ("/n" is the separator)
df1["Company"] = df1["Alliances_names"].str.split("/n")
# One row per (alliance, company) pair; the alliance columns are repeated
df1 = df1.explode("Company")
# Left merge keeps every alliance member, even ones missing from df2
output = df1.merge(df2, on="Company", how="left")
```

```
>>> output
Alliances_names Value1 Company Number1 Number2
0 cgc inc/nshow ltd/noracle inc 500.0 cgc inc 4565.0 n
1 cgc inc/nshow ltd/noracle inc 500.0 show ltd 1.0 n
2 cgc inc/nshow ltd/noracle inc 500.0 oracle inc NaN NaN
3 steam/nsoap jv NaN steam 15.0 y
4 steam/nsoap jv NaN soap jv 2000.0 n
5 saints bd 8.0 saints bd NaN NaN
6 watrloo jv/ncgc inc/nflow inc 19.0 watrloo jv 6756.0 n
7 watrloo jv/ncgc inc/nflow inc 19.0 cgc inc 4565.0 n
8 watrloo jv/ncgc inc/nflow inc 19.0 flow inc 1111.0 y
```
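The output above differs from the expected result in the question in two places: oracle inc (absent from df2) gets its own row of NaN, and saints bd is copied into Company. If you need exactly the expected table, one possible approach (a sketch, not the only way) is to merge with `indicator=True`, drop unmatched companies from alliances that matched at least one company, and blank out Company for alliances with no match at all. It assumes, as in the question, that `/n` is a literal separator; the sample frames are typed in by hand, and the helper names `exploded`, `merged`, `found`, `any_found` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question; assumes "/n" is the literal separator
df1 = pd.DataFrame({
    "Alliances_names": ["cgc inc/nshow ltd/noracle inc", "steam/nsoap jv",
                        "saints bd", "watrloo jv/ncgc inc/nflow inc"],
    "Value1": [500, np.nan, 8, 19],
})
df2 = pd.DataFrame({
    "Company": ["steam", "soap jv", "cgc inc", "show ltd", "flow inc", "watrloo jv"],
    "Number1": [15, 2000, 4565, 1, 1111, 6756],
    "Number2": ["y", "n", "n", "n", "y", "n"],
})

# Keep the original df1 row number (column "index") to identify each alliance
exploded = (
    df1.reset_index()
       .assign(Company=lambda d: d["Alliances_names"].str.split("/n"))
       .explode("Company")
)
# indicator=True adds a "_merge" column: "both" if the company exists in df2
merged = exploded.merge(df2, on="Company", how="left", indicator=True)
found = merged["_merge"] == "both"
# True on every row of an alliance that matched at least one company
any_found = found.groupby(merged["index"]).transform("any")

# Keep matched companies, plus the rows of alliances with no match at all
result = merged[found | ~any_found].copy()
result.loc[result["_merge"] != "both", "Company"] = np.nan
result = (
    result.drop_duplicates(subset=["index", "Company"])  # collapse repeats, e.g. several NaN rows of one unmatched alliance
          .drop(columns=["index", "_merge"])
          .reset_index(drop=True)
)
print(result)
```

The left merge (rather than an inner one) keeps the rows in df1's order, so the result lines up with the expected table without re-sorting.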
Edit:

To keep only the rows whose alliance members all appear in df2, you can do:
```python
# Keep a row only if every company in its alliance is present in df2["Company"]
output = output[output["Alliances_names"].str.split("/n").map(set(df2["Company"]).issuperset)]
```

```
>>> output
Alliances_names Value1 Company Number1 Number2
3 steam/nsoap jv NaN steam 15.0 y
4 steam/nsoap jv NaN soap jv 2000.0 n
6 watrloo jv/ncgc inc/nflow inc 19.0 watrloo jv 6756.0 n
7 watrloo jv/ncgc inc/nflow inc 19.0 cgc inc 4565.0 n
8 watrloo jv/ncgc inc/nflow inc 19.0 flow inc 1111.0 y
```
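For a quick check, here is a self-contained version of the answer plus the filter from the edit, with the question's sample data typed in by hand. It assumes, as the question says, that `/n` is a literal two-character separator; if the real column contains newline characters, split on `"\n"` instead. `known` and `members` are illustrative names, and `assign` is used so that `df1` is not modified in place.

```python
import numpy as np
import pandas as pd

# Sample data from the question; assumes "/n" is the literal separator
df1 = pd.DataFrame({
    "Alliances_names": ["cgc inc/nshow ltd/noracle inc", "steam/nsoap jv",
                        "saints bd", "watrloo jv/ncgc inc/nflow inc"],
    "Value1": [500, np.nan, 8, 19],
})
df2 = pd.DataFrame({
    "Company": ["steam", "soap jv", "cgc inc", "show ltd", "flow inc", "watrloo jv"],
    "Number1": [15, 2000, 4565, 1, 1111, 6756],
    "Number2": ["y", "n", "n", "n", "y", "n"],
})

# Step 1: one row per (alliance, member company), then a left merge
exploded = df1.assign(Company=df1["Alliances_names"].str.split("/n")).explode("Company")
output = exploded.merge(df2, on="Company", how="left")

# Step 2: keep an alliance only if every member is a known company in df2
known = set(df2["Company"])
members = output["Alliances_names"].str.split("/n")
output = output[members.map(known.issuperset)]
print(output)
```

`set.issuperset` accepts any iterable, so each split list can be passed to it directly; the alliances containing oracle inc and saints bd are dropped because those names are not in df2.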