Merge two pandas dataframes with multiple columns per row
I have a dataframe "df1" that looks like this:

company id | company name | dealid_1 | dealyear_1 | dealid_2 | dealyear_2 |
---|---|---|---|---|---|
C1 | ABC | | | | |
C2 | DEF | | | | |
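I want to fill the empty cells with data from another dataframe "df2", which looks like this:

deal id | deal year | company id | company name |
---|---|---|---|
D1 | 2010 | C1 | ABC |
D2 | 2015 | C1 | ABC |
D3 | 2012 | C2 | DEF |
D4 | 2017 | C2 | DEF |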
So the final "df1" should look like this:

company id | company name | dealid_1 | dealyear_1 | dealid_2 | dealyear_2 |
---|---|---|---|---|---|
C1 | ABC | D1 | 2010 | D2 | 2015 |
C2 | DEF | D3 | 2012 | D4 | 2017 |
Can anyone help me with this?
Thanks!
Use GroupBy.cumcount as a per-company counter, pivot with DataFrame.pivot, sort the second level of the resulting column MultiIndex with DataFrame.sort_index, and finally flatten the MultiIndex:

    # Number each company's deals 0, 1, ... with cumcount, then pivot to wide form
    df3 = (df2.assign(g=df2.groupby(['company id', 'company name']).cumcount())
              .pivot(index=['company id', 'company name'], columns='g')
              .sort_index(axis=1, level=1))
    # Flatten the MultiIndex columns: ('deal id', 0) -> 'deal id_1'
    df3.columns = df3.columns.map(lambda x: f'{x[0]}_{x[1] + 1}')
    print(df3.reset_index())

      company id company name deal id_1  deal year_1 deal id_2  deal year_2
    0         C1          ABC        D1         2010        D2         2015
    1         C2          DEF        D3         2012        D4         2017
To merge the result with the first dataframe, use:

    df = df1[['company id', 'company name']].join(df3, on=['company id', 'company name'])
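
For reference, here is a self-contained sketch of the steps above, run against the sample data from the question. The way `df1` and `df2` are constructed, the `keys` and `wide` names, and the final renaming step (which drops the space so that `deal id_1` becomes the asker's `dealid_1`) are assumptions added for illustration, not part of the original answer. Passing a list to `pivot(index=...)` needs pandas 1.1 or later.

```python
import pandas as pd

# Sample frames rebuilt from the question (assumed construction).
df1 = pd.DataFrame({"company id": ["C1", "C2"],
                    "company name": ["ABC", "DEF"]})
df2 = pd.DataFrame({
    "deal id": ["D1", "D2", "D3", "D4"],
    "deal year": [2010, 2015, 2012, 2017],
    "company id": ["C1", "C1", "C2", "C2"],
    "company name": ["ABC", "ABC", "DEF", "DEF"],
})

keys = ["company id", "company name"]

# Number each company's deals 0, 1, ... in row order, then go wide.
wide = (df2.assign(g=df2.groupby(keys).cumcount())
           .pivot(index=keys, columns="g")
           .sort_index(axis=1, level=1))

# Flatten ('deal id', 0) -> 'dealid_1', dropping the space to match the question.
wide.columns = [f"{name.replace(' ', '')}_{n + 1}" for name, n in wide.columns]

# Attach the wide deal columns to the company rows of df1.
result = df1[keys].join(wide, on=keys)
print(result)
```

Because the counter comes from row order within each company, sorting `df2` first (for example by `deal year`) would change which deal lands in slot 1 and which in slot 2.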
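You can use:

    # Number each company's deals '1', '2', ... (grouped on the same key used for the pivot)
    df3 = (df2.drop(columns='company name')
              .assign(col=df2.groupby('company id').cumcount().add(1).astype(str))
              .pivot(index='company id', columns='col')
          )
    # Flatten the MultiIndex columns: ('deal id', '1') -> 'deal id_1'
    df3.columns = df3.columns.map('_'.join)
    out = df1[['company id', 'company name']].merge(df3, on='company id')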
Output:

      company id company name deal id_1 deal id_2  deal year_1  deal year_2
    0         C1          ABC        D1        D2         2010         2015
    1         C2          DEF        D3        D4         2012         2017
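
One caveat worth checking: `merge` defaults to an inner join, so a company in `df1` with no rows in `df2` would be dropped from the result. The sketch below is an illustration under assumed data, not part of the original answer: it adds a hypothetical third company `C3` with no deals, gives `C2` only one deal, and passes `how='left'` so every company row is kept, with missing deal slots filled with `NaN`.

```python
import pandas as pd

# Assumed data: C3 is a hypothetical company with no deals; C2 has one deal.
df1 = pd.DataFrame({"company id": ["C1", "C2", "C3"],
                    "company name": ["ABC", "DEF", "GHI"]})
df2 = pd.DataFrame({
    "deal id": ["D1", "D2", "D3"],
    "deal year": [2010, 2015, 2012],
    "company id": ["C1", "C1", "C2"],
    "company name": ["ABC", "ABC", "DEF"],
})

# Same pivot as above, with the counter grouped on the pivot key.
df3 = (df2.drop(columns="company name")
          .assign(col=df2.groupby("company id").cumcount().add(1).astype(str))
          .pivot(index="company id", columns="col"))
df3.columns = df3.columns.map("_".join)

# how='left' keeps every row of df1, even companies absent from df3.
out = df1.merge(df3, on="company id", how="left")
print(out)
```

Note that once `NaN` values appear, the `deal year` columns are upcast to float; converting them with `astype('Int64')` is one way to get nullable integers back.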