Run a second merge only where the first merge returns no match
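I have three DataFrames (df1, df2, and df3): a main DataFrame (df1) and two others, each containing one column that I want to bring into the main DataFrame.
Sample DataFrames: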
df1 = {'Col_f': ['Georgia', 'Nevada', 'New York', 'Texas', 'Arizona'],
'Col_g': ['SUV', 'Coupe', 'Wagon', 'Crossover', 'Sedan']}
df1 = pd.DataFrame(df1)
df2 = {'Col_g': ['SUV', '4x4', 'Wagon', 'Truck', 'Sedan'],
'Objective': ['15%', '13%', '55%', '40.4%', '2.48%']}
df2 = pd.DataFrame(df2)
df3 = {'Col_f': ['Georgia', 'California', 'Pennsylvania', 'Texas', 'Arizona'],
'Objective': ['15%', '13%', '55%', '40.4%', '2.48%']}
df3 = pd.DataFrame(df3)
I am using the following code:
df1_new = pd.merge(df1, df2, on = 'Col_g', how = 'left')
Which returns the following df:
df_new = {'Col_f': ['Georgia', 'Nevada', 'New York', 'Texas', 'Arizona'],
'Col_g': ['SUV', 'Coupe', 'Wagon', 'Crossover', 'Sedan'],
'Objective': ['15%', ' ', '55%', ' ', '2.48%']}
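Then, for the two empty strings in the 'Objective' column, I want to continue with a second merge (or an Excel VLOOKUP, which I guess is the same thing) to fill in those blanks.
The code I think I would use for the second merge: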
df1_newer = pd.merge(df_new, df3, on = 'Col_f', how = 'left')
Desired final output:
df_newer = {'Col_f': ['Georgia', 'California', 'Pennsylvania', 'Texas', 'Arizona'],
'Col_g': ['SUV', '4x4', 'Wagon', 'Truck', 'Sedan'],
'Objective': ['15%', '13%', '55%', '40.4%', '2.48%']}
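Any suggestions would be appreciated!
After merging df1 against each of the two lookup DataFrames, you can use mask or np.where to assign the value conditionally: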
df1_new = pd.merge(df1, df2, on='Col_g', how='left')['Objective']
df1_newer = pd.merge(df1, df3, on='Col_f', how='left')['Objective']
print(df1_new)
0 15%
1 NaN
2 55%
3 NaN
4 2.48%
Name: Objective, dtype: object
print(df1_newer)
0 15%
1 NaN
2 NaN
3 40.4%
4 2.48%
Name: Objective, dtype: object
df1['Objective'] = df1_new.mask(df1_new.isna(), df1_newer)
# or, equivalently, with NumPy
import numpy as np
df1['Objective'] = np.where(df1_new.isna(), df1_newer, df1_new)
print(df1)
Col_f Col_g Objective
0 Georgia SUV 15%
1 Nevada Coupe NaN
2 New York Wagon 55%
3 Texas Crossover 40.4%
4 Arizona Sedan 2.48%
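The same "look up by Col_g, then fall back to Col_f" idea can also be sketched without merge, using Series.map and fillna. This is only an illustrative alternative, not the approach from the answer above. It assumes the keys in df2['Col_g'] and df3['Col_f'] are unique, and the names by_col_g, by_col_f, and result are made up for the example.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Col_f': ['Georgia', 'Nevada', 'New York', 'Texas', 'Arizona'],
    'Col_g': ['SUV', 'Coupe', 'Wagon', 'Crossover', 'Sedan'],
})
df2 = pd.DataFrame({
    'Col_g': ['SUV', '4x4', 'Wagon', 'Truck', 'Sedan'],
    'Objective': ['15%', '13%', '55%', '40.4%', '2.48%'],
})
df3 = pd.DataFrame({
    'Col_f': ['Georgia', 'California', 'Pennsylvania', 'Texas', 'Arizona'],
    'Objective': ['15%', '13%', '55%', '40.4%', '2.48%'],
})

# Build key -> Objective lookup Series (assumption: keys are unique).
by_col_g = df2.set_index('Col_g')['Objective']
by_col_f = df3.set_index('Col_f')['Objective']

# Work on a copy so df1 itself is left untouched.
result = df1.copy()

# First lookup on Col_g; where that gives NaN, fall back to the Col_f lookup.
first = result['Col_g'].map(by_col_g)
second = result['Col_f'].map(by_col_f)
result['Objective'] = first.fillna(second)

print(result)
```

Because map keeps df1's index, the two lookups line up row by row without relying on merge preserving row order. A row that matches neither lookup (Nevada / Coupe here) stays NaN.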