Best way to fill NaN values in multiple columns at the same time, using specific columns of the same dataframe as reference
Example:
import numpy as np
import pandas as pd

# np.nan is used because the np.NaN alias was removed in NumPy 2.0
DF = pd.DataFrame({'A': [0, 0, np.nan, 0, np.nan, 0, 0, 0],
                   'B': [1, 1, np.nan, 1, np.nan, 1, 1, 1],
                   'C': [8, 8, np.nan, 8, np.nan, np.nan, 8, 8],
                   'D': [2, 2, 2, np.nan, np.nan, 2, np.nan, np.nan],
                   'E': [3, 3, 3, np.nan, np.nan, 3, np.nan, np.nan]})
The result I want is to fill columns A and B wherever possible, i.e.:
1) If a row of DF['A'] is NaN, it should take the value from the corresponding row of DF['D']
2) If a row of DF['B'] is NaN, it should take the value from the corresponding row of DF['E']
3) DF['C'] should remain as it is
This is what I am trying:
DF[['A', 'B']] = DF[['A','B']].fillna(DF[['D','E']])
But it seems to work only when there are two different dataframes with the same column names. I could split DF into DF1 and DF2, rename DF2['D'] to A and DF2['E'] to B, and then do:
DF1[['A', 'B']] = DF1[['A','B']].fillna(DF2[['A','B']])
But I don't think that is the best way. Any ideas?
The actual dataset has 3 million rows, so the most efficient solution would be great :)
Thanks!! :)
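Using np.where is a good option, because it works on the underlying numpy arrays and therefore pairs the columns by position (A with D, B with E) rather than by name: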
DF[['A','B']] = np.where(DF[['A','B']].isna(), DF[['D','E']], DF[['A','B']])
Output:
     A    B    C    D    E
0  0.0  1.0  8.0  2.0  3.0
1  0.0  1.0  8.0  2.0  3.0
2  2.0  3.0  NaN  2.0  3.0
3  0.0  1.0  8.0  NaN  NaN
4  NaN  NaN  NaN  NaN  NaN
5  0.0  1.0  NaN  2.0  3.0
6  0.0  1.0  8.0  NaN  NaN
7  0.0  1.0  8.0  NaN  NaN
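For anyone who wants to run this end to end, here is a minimal sketch that puts the np.where approach next to a label-based variant of the fillna attempt from the question. The names `fallback`, `via_where` and `via_fillna` are made up for illustration and are not from the original post. The fillna variant relabels D and E as A and B on the fly, so there is no need to split DF into two frames. Which of the two is faster on 3 million rows is something to measure on the real data rather than assume.

```python
import numpy as np
import pandas as pd

DF = pd.DataFrame({
    'A': [0, 0, np.nan, 0, np.nan, 0, 0, 0],
    'B': [1, 1, np.nan, 1, np.nan, 1, 1, 1],
    'C': [8, 8, np.nan, 8, np.nan, np.nan, 8, 8],
    'D': [2, 2, 2, np.nan, np.nan, 2, np.nan, np.nan],
    'E': [3, 3, 3, np.nan, np.nan, 3, np.nan, np.nan],
})

# Hypothetical mapping (illustrative name): target column -> fallback column.
fallback = {'A': 'D', 'B': 'E'}
targets = list(fallback)           # ['A', 'B']
sources = list(fallback.values())  # ['D', 'E']

# Option 1: np.where on plain arrays. Columns are paired by position,
# so the order of `targets` and `sources` must match.
via_where = DF.copy()
via_where[targets] = np.where(
    DF[targets].isna().to_numpy(),
    DF[sources].to_numpy(),
    DF[targets].to_numpy(),
)

# Option 2: fillna aligns on column labels, so relabel D/E as A/B first.
via_fillna = DF.copy()
via_fillna[targets] = DF[targets].fillna(
    DF[sources].rename(columns={src: tgt for tgt, src in fallback.items()})
)

print(via_where)
```

Both options leave C, D and E untouched, and a row where both the target and its fallback are NaN (row 4 here) stays NaN.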