Backfilling and Forward-Filling NaNs and Zeros
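I'm trying to back/forward fill employees' work experience (in years). What I want to achieve is:
Employee 200
2019 - 3 years, 2018 - 2 years, 2017 - 1 year
Employee 300
Leave as NaN
Employee 400
2018 - 3 years, 2017 - 2 years
Employee 500
2018 - 6 years, 2017 - 5 years, 2016 - 4 years
I'm really struggling to get it to backfill (forward fill) in increments of -1 (+1). It gets even trickier when the non-NaN, nonzero value sits in the middle, as it does for employee 500.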
import numpy as np
import pandas as pd

df_test = pd.DataFrame({'DeptID': [0, 0, 0, 1, 1, 1, 2, 2, 2],
                        'Employee': [200, 200, 200, 300, 400, 400, 500, 500, 500],
                        'Year': [2017, 2018, 2019, 2016, 2017, 2018, 2016, 2017, 2018],
                        'Experience': [np.nan, np.nan, 3, np.nan, 2, np.nan, 0, 5, 0]
                        })
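Assuming each employee has a nonzero, non-NaN experience value, try this (employees with none, such as 300, stay NaN because `map` finds no match for them):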
import numpy as np
import pandas as pd

df_test = pd.DataFrame({'DeptID': [0, 0, 0, 1, 1, 1, 2, 2, 2],
                        'Employee': [200, 200, 200, 300, 400, 400, 500, 500, 500],
                        'Year': [2017, 2018, 2019, 2016, 2017, 2018, 2016, 2017, 2018],
                        'Experience': [np.nan, np.nan, 3, np.nan, 2, np.nan, 0, 5, 0]
                        })
# find the last nonzero, non-NaN value for each employee,
# keeping its original row position in an 'index' column
nonzero = (df_test[df_test.Experience.ne(0) & df_test.Experience.notna()]
           .drop_duplicates('Employee', keep='last')
           .reset_index()
           .set_index('Employee'))
# per employee, (experience - row position) of the known value is a constant offset;
# map that offset onto the Employee column and add it to each row's position.
# This relies on the default 0..n-1 index and on each employee's rows being
# sorted by consecutive years.
df_test['Experience'] = df_test.index + df_test.Employee.map(nonzero.Experience - nonzero['index'])
df_test
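The approach above ties the fill to the row index. If the rows might not be sorted, or years might be skipped, the same idea can be anchored to the `Year` column instead. The following is only a sketch under stated assumptions, not the answer's method: it assumes one year of experience is gained per calendar year, that zeros always mean "missing", and the helper name `fill_experience_by_year` is made up for illustration.

```python
import numpy as np
import pandas as pd


def fill_experience_by_year(df):
    """Hypothetical helper: fill Experience from one known value per employee.

    Assumes experience grows by exactly 1 per calendar year and that
    zeros are placeholders for missing data, not real observations.
    """
    # Treat zeros as missing so only real observations act as anchors.
    known = df['Experience'].replace(0, np.nan)
    # For a known row, Experience - Year is a constant offset per employee.
    # 'last' picks the last non-null offset in each group (NaN if none exist).
    offset = (known - df['Year']).groupby(df['Employee']).transform('last')
    # Rebuild every row from its Year; employees with no anchor stay NaN.
    return df['Year'] + offset


df_test = pd.DataFrame({'DeptID': [0, 0, 0, 1, 1, 1, 2, 2, 2],
                        'Employee': [200, 200, 200, 300, 400, 400, 500, 500, 500],
                        'Year': [2017, 2018, 2019, 2016, 2017, 2018, 2016, 2017, 2018],
                        'Experience': [np.nan, np.nan, 3, np.nan, 2, np.nan, 0, 5, 0]
                        })
df_test['Experience'] = fill_experience_by_year(df_test)
print(df_test)
```

On the sample data this gives 1, 2, 3 for employee 200, NaN for employee 300, 2, 3 for employee 400, and 4, 5, 6 for employee 500. Because the offset is computed from `Year`, the result does not depend on row order or on the index being a plain 0..n-1 range.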