Fill NaNs within one column of a df via lookup to another df with pandas
I've seen various versions of this question, but none of them seem to fit what I'm trying to do. Here's my data:
This is the df with NaNs:
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["10023", "10040", np.nan, "12345", np.nan, np.nan, "10033", np.nan, np.nan],
                   "B": [",", "17,-6", "19,-2", "17,-5", "37,-5", ",", "9,-10", "19,-2", "2,-5"],
                   "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"]})
A B C
0 10023 , small
1 10040 17,-6 large
2 NaN 19,-2 large
3 12345 17,-5 small
4 NaN 37,-5 small
5 NaN , large
6 10033 9,-10 small
7 NaN 19,-2 small
8 NaN 2,-5 large
Next I have a lookup df called df2:
df2 = pd.DataFrame({"B": ['17,-5', '19,-2', '37,-5', '9,-10'],
"A": ["10040", "54321", "12345", "10033"]})
B A
0 17,-5 10040
1 19,-2 54321
2 37,-5 12345
3 9,-10 10033
I want to fill in the NaNs in column A of df by looking up column df2.B and returning df2.A, so that the result dfr looks like this:
A B C
0 10023 , small
1 10040 17,-6 large
2 54321 19,-2 large
3 10040 17,-5 small
4 12345 37,-5 small
5 NaN , large
6 10033 9,-10 small
7 54321 19,-2 small
8 NaN 2,-5 large
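Important notes:
- df and df2 do not have matching indexes.
- The contents of df.A and df2.A are not unique.
- The rows of df2 make up unique pairs.
- Assume there are more columns, not shown, that also contain NaNs.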
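Using pandas, the rows of interest in df would (I think) be found via: df.loc[df['A'].isnull(),]. One answer I found seemed promising, but it is not clear to me where df1 in that example comes from. My actual dataset is much larger than this, and I will have to replace several columns this way.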
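For reference, a minimal sketch of the row selection mentioned above, using the sample df from this question. The names mask and rows_to_fill are introduced here only for illustration; isna() is the same check as isnull().

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["10023", "10040", np.nan, "12345", np.nan, np.nan, "10033", np.nan, np.nan],
    "B": [",", "17,-6", "19,-2", "17,-5", "37,-5", ",", "9,-10", "19,-2", "2,-5"],
    "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
})

# Boolean mask marking the rows where column A is missing.
mask = df["A"].isna()

# Rows that still need a value in A (same rows as df.loc[df['A'].isnull(),]).
rows_to_fill = df.loc[mask]
print(rows_to_fill.index.tolist())
```

Building the mask once makes it easy to reuse when several columns have to be filled the same way.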
Just use np.where:
df.A = np.where(df.A.isnull(), df.B.map(df2.set_index('B').A), df.A)
df
Out[149]:
A B C
0 10023 , small
1 10040 17,-6 large
2 54321 19,-2 large
3 12345 17,-5 small
4 12345 37,-5 small
5 NaN , large
6 10033 9,-10 small
7 54321 19,-2 small
8 NaN 2,-5 large
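Since the question mentions having to do this for several columns, here is a rough sketch of the same map idea wrapped in a loop. It uses fillna instead of np.where, which should give the same result here because fillna only touches rows that are currently NaN. The helper name fill_from_lookup is hypothetical, and the sketch assumes the key column in the lookup frame (df2.B) is unique; map on a Series with a duplicated index raises an error.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["10023", "10040", np.nan, "12345", np.nan, np.nan, "10033", np.nan, np.nan],
    "B": [",", "17,-6", "19,-2", "17,-5", "37,-5", ",", "9,-10", "19,-2", "2,-5"],
    "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
})
df2 = pd.DataFrame({
    "B": ["17,-5", "19,-2", "37,-5", "9,-10"],
    "A": ["10040", "54321", "12345", "10033"],
})


def fill_from_lookup(target, lookup, key, columns):
    """Fill NaNs in `columns` of `target` from `lookup`, matched on `key`.

    Hypothetical helper for illustration; assumes lookup[key] is unique.
    """
    out = target.copy()
    indexed = lookup.set_index(key)
    for col in columns:
        # Map each key to its lookup value; keys with no match become NaN.
        candidates = out[key].map(indexed[col])
        # fillna leaves existing values alone and fills only the NaN rows.
        out[col] = out[col].fillna(candidates)
    return out


result = fill_from_lookup(df, df2, key="B", columns=["A"])
print(result)
```

To fill more columns, pass them in the columns list, provided each one also exists in the lookup frame.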
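The map approach from the other answer will be faster, but here is another way to solve this, just for convenience and for the sake of knowing it.
You can use pd.merge, since this is basically a join problem.
After merging, we fill the NaNs and drop the columns we no longer need.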
df_final = pd.merge(df, df2, on='B', how='left', suffixes=['_1','_2'])
df_final['A'] = df_final.A_1.fillna(df_final.A_2)
df_final.drop(['A_1', 'A_2'], axis=1, inplace=True)
print(df_final)
B C A
0 , small 10023
1 17,-6 large 10040
2 19,-2 large 54321
3 17,-5 small 12345
4 37,-5 small 12345
5 , large NaN
6 9,-10 small 10033
7 19,-2 small 54321
8 2,-5 large NaN
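Note that the merge above moves A to the last column, and a left join would add rows if df2.B ever contained duplicate keys. A sketch that guards against both, assuming the lookup is meant to be many-to-one: validate is an existing pd.merge parameter, while the "_lookup" suffix and the variable name merged are choices made here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["10023", "10040", np.nan, "12345", np.nan, np.nan, "10033", np.nan, np.nan],
    "B": [",", "17,-6", "19,-2", "17,-5", "37,-5", ",", "9,-10", "19,-2", "2,-5"],
    "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
})
df2 = pd.DataFrame({
    "B": ["17,-5", "19,-2", "37,-5", "9,-10"],
    "A": ["10040", "54321", "12345", "10033"],
})

# validate="m:1" raises MergeError if df2.B has duplicate keys,
# so the left join cannot change the number of rows in df.
merged = df.merge(df2, on="B", how="left", suffixes=("", "_lookup"), validate="m:1")

# Keep existing values of A and take the lookup value only where A is NaN.
merged["A"] = merged["A"].fillna(merged["A_lookup"])

# Drop the helper column; the original column order A, B, C is kept.
df_final = merged.drop(columns="A_lookup")
print(df_final)
```

Using an empty suffix for the left frame keeps the name A in place, so no renaming is needed afterwards.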