How to fill a column based on the null values in another column in pandas

I have a dataframe like this:

Emp_code Leave_applied  Leave_approved
0         15-Jan-2021    15-Jan-2021
2         18-Jan-2021    15-Jan-2021
3         20-Jan-2021       np.nan
4         15-Jan-2021    18-Jan-2021

I need to add a new column, leave_type, based on the following conditions:

if leave_applied > leave_approved, leave_type = unplanned

if leave_applied < leave_approved, leave_type = planned

if leave_applied == leave_approved, leave_type = planned

if leave_approved is np.nan, leave_type = missing data


Required output:

Emp_code Leave_applied  Leave_approved  Leave type
0        15-Jan-2021    15-Jan-2021     Planned
2        18-Jan-2021    15-Jan-2021     unplanned
3        20-Jan-2021    np.nan          missing data
4        15-Jan-2021    18-Jan-2021     planned

I tried:


df['leave_type'] = np.where(df['Leave_applied'] > df['Leave_approved'], 'unplanned',
                   np.where(df['Leave_approved'] == np.nan, 'Missing_data', 'Planned'))

The code runs, but no row in my dataframe is labeled as missing data.

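First convert the values to datetimes with to_datetime, and test for missing values with Series.isna. A comparison such as == np.nan is always False, because NaN is not equal to anything, including itself: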
df['Leave_applied'] = pd.to_datetime(df['Leave_applied'])
df['Leave_approved'] = pd.to_datetime(df['Leave_approved'])

df['leave_type'] = np.where(df['Leave_applied'] > df['Leave_approved'],'unplanned',
                   (np.where(df['Leave_approved'].isna(), 'Missing_data', 'Planned'))) 
          
print (df)
   Emp_code Leave_applied Leave_approved    leave_type
0         0    2021-01-15     2021-01-15       Planned
1         2    2021-01-18     2021-01-15     unplanned
2         3    2021-01-20            NaT  Missing_data
3         4    2021-01-15     2021-01-18       Planned
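
To see why the original attempt never produced Missing_data, here is a minimal sketch comparing the two ways of testing for missing values. The `approved` Series is a made-up stand-in for the Leave_approved column, kept as raw strings the way it would look before conversion.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the unconverted Leave_approved column
approved = pd.Series(["15-Jan-2021", np.nan, "18-Jan-2021"], dtype=object)

# NaN is not equal to anything, including itself,
# so an equality test never flags the missing row
eq_mask = approved == np.nan

# isna() is the supported way to detect missing values
na_mask = approved.isna()

print(eq_mask.tolist())  # [False, False, False]
print(na_mask.tolist())  # [False, True, False]
```

The same holds after conversion to datetimes, where the missing value becomes NaT and isna() still detects it.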

Or use numpy.select:

df['leave_type'] = np.select([df['Leave_approved'].isna(),
                              df['Leave_applied'] > df['Leave_approved']],
                             ['Missing_data', 'unplanned'], 'Planned') 
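
For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the sample data from the question. The explicit `format="%d-%b-%Y"` is my assumption about how the dates are stored; the names `conditions` and `choices` are just for illustration. np.select takes the first condition that matches for each row, which is why the isna() check is listed first.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Emp_code": [0, 2, 3, 4],
    "Leave_applied": ["15-Jan-2021", "18-Jan-2021", "20-Jan-2021", "15-Jan-2021"],
    "Leave_approved": ["15-Jan-2021", "15-Jan-2021", np.nan, "18-Jan-2021"],
})

# Convert both columns to datetimes; the missing value becomes NaT
for col in ["Leave_applied", "Leave_approved"]:
    df[col] = pd.to_datetime(df[col], format="%d-%b-%Y")

# Conditions are checked in order; the first match wins for each row
conditions = [
    df["Leave_approved"].isna(),
    df["Leave_applied"] > df["Leave_approved"],
]
choices = ["Missing_data", "unplanned"]

# Rows matching neither condition get the default
df["leave_type"] = np.select(conditions, choices, default="Planned")
print(df)
```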
          

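You can try np.select. The idea is that comparing NaT with any date gives False, so rows with a missing date match neither condition and fall through to the default: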

df['Leave_applied'] = pd.to_datetime(df['Leave_applied'], errors='coerce')
df['Leave_approved'] = pd.to_datetime(df['Leave_approved'], errors='coerce')

df['Leave type'] = np.select(
    [df['Leave_applied'] > df['Leave_approved'],
     df['Leave_applied'] <= df['Leave_approved'],
     ],
    ['unplanned',
     'planned',
     ],
    default='missing data'
)
print(df)

   Emp_code Leave_applied Leave_approved    Leave type
0         0    2021-01-15     2021-01-15       planned
1         2    2021-01-18     2021-01-15     unplanned
2         3    2021-01-20            NaT  missing data
3         4    2021-01-15     2021-01-18       planned
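
A quick sketch of the NaT behaviour this relies on, using two standalone values rather than the dataframe (the variable names are only for illustration). Both comparisons come out False, so neither 'unplanned' nor 'planned' is selected and the default applies.

```python
import pandas as pd

applied = pd.Timestamp("2021-01-20")
approved = pd.NaT  # what to_datetime produces for a missing value

# Ordering comparisons against NaT are False in both directions
greater = applied > approved
less_equal = applied <= approved

print(greater)     # False
print(less_equal)  # False
```

One consequence of this design: a row with a missing Leave_applied would also end up as 'missing data', since its comparisons are False too.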