Filling NA values in one dataframe by another based on two columns

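What is the most efficient way to fill NA values in one dataframe from another when there are multiple columns to match on (here, city and rooms)?

Example dataframes to combine, and the expected result: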

import pandas as pd
import numpy as np

d1 = {
    'city': ['New York', 'Shanghai', 'Boston', 'Shanghai', 'Shanghai'],
    'rooms': ["1", "2", "3", "2", "2"],
    'floor': ["4", "5", "6", "10", "8"],
    'rent': [500, np.nan, 1500, 2000, np.nan],
}

d2 = {'city': ['Shanghai'], 'rooms': ["2"], 'rent': [1000]}

df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)

# Expected result: missing rents filled from df2 where city and rooms match
result = {
    'city': ['New York', 'Shanghai', 'Boston', 'Shanghai', 'Shanghai'],
    'rooms': ["1", "2", "3", "2", "2"],
    'floor': ["4", "5", "6", "10", "8"],
    'rent': [500, 1000, 1500, 2000, 1000],
}

result_df = pd.DataFrame(data=result)

Set the two columns as the index so that the frames align, then fill the column you need. In this case the common columns are city and rooms:

cols = ['city', 'rooms']

Set the index on df1:

df1 = df1.set_index(cols)

Set the index on df2:

df2 = df2.set_index(cols).rent # make it a Series

Fill df1 from df2 and reset the index (the index itself is good/useful, so you may prefer to keep it):

df1.fillna({"rent": df2}).reset_index()

       city rooms floor    rent
0  New York     1     4   500.0
1  Shanghai     2     5  1000.0
2    Boston     3     6  1500.0
3  Shanghai     2    10  2000.0
4  Shanghai     2     8  1000.0
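
The steps above can be collected into one function. This is only a minimal sketch: `fill_na_from` is a hypothetical helper name, not a pandas function, and it assumes the key combinations in the lookup frame are unique.

```python
import numpy as np
import pandas as pd


def fill_na_from(df, other, keys, col):
    """Fill NaNs in df[col] from other[col], matching rows on the key columns.

    Hypothetical helper (not part of pandas). Assumes each key combination
    appears at most once in `other`.
    """
    lookup = other.set_index(keys)[col]  # Series indexed by the key columns
    filled = df.set_index(keys).fillna({col: lookup})
    # reset_index replaces the original row index with a fresh 0..n-1 range;
    # selecting df.columns restores the original column order.
    return filled.reset_index()[list(df.columns)]


df1 = pd.DataFrame({
    'city': ['New York', 'Shanghai', 'Boston', 'Shanghai', 'Shanghai'],
    'rooms': ["1", "2", "3", "2", "2"],
    'floor': ["4", "5", "6", "10", "8"],
    'rent': [500, np.nan, 1500, 2000, np.nan],
})
df2 = pd.DataFrame({'city': ['Shanghai'], 'rooms': ["2"], 'rent': [1000]})

out = fill_na_from(df1, df2, ['city', 'rooms'], 'rent')
print(out)
```

Existing values such as the 2000 rent are left alone, because fillna only touches missing entries.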

Note that this only works if the data from df2 is unique, that is, each (city, rooms) combination appears only once in df2.
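
If df2 might contain repeated key combinations, one option is to check for them and collapse them before filling. The sketch below uses made-up data, and averaging the duplicate rents is only one assumed policy; keeping the first or last row with drop_duplicates would be another.

```python
import numpy as np
import pandas as pd

cols = ['city', 'rooms']

df1 = pd.DataFrame({
    'city': ['Shanghai', 'Shanghai', 'Boston'],
    'rooms': ["2", "2", "3"],
    'rent': [np.nan, 2000, np.nan],
})

# Made-up lookup data in which the ('Shanghai', '2') pair appears twice
df2 = pd.DataFrame({
    'city': ['Shanghai', 'Shanghai'],
    'rooms': ["2", "2"],
    'rent': [1000, 1200],
})

# True when at least one key combination occurs more than once
has_dupes = df2.duplicated(subset=cols).any()

# Assumed policy: average the rents per key so that the lookup is unique
lookup = df2.groupby(cols)['rent'].mean()

filled = df1.set_index(cols).fillna({'rent': lookup}).reset_index()
print(filled)
```

Rows with no match in df2 (Boston here) keep their NaN.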