How to fill a column based on the null values in another column in pandas
I have a dataframe like the one shown below:
Emp_code Leave_applied Leave_approved
0 15-Jan-2021 15-Jan-2021
2 18-Jan-2021 15-Jan-2021
3 20-Jan-2021 np.nan
4 15-Jan-2021 18-Jan-2021
I need to add a new column, leave type, based on the following conditions:
If leave_applied is greater than leave_approved, leave_type = unplanned
If leave_applied is less than leave_approved, leave_type = planned
If leave_applied == leave_approved, leave_type = planned
If leave_approved is np.nan, leave_type = missing data
Required output:
Emp_code Leave_applied Leave_approved Leave type
0 15-Jan-2021 15-Jan-2021 Planned
2 18-Jan-2021 15-Jan-2021 unplanned
3 20-Jan-2021 np.nan missing data
4 15-Jan-2021 18-Jan-2021 planned
I tried:
df['leave_type'] = np.where(df['Leave_applied'] > df['Leave_approved'], 'unplanned',
(np.where(df['Leave_approved'] == np.nan, 'Missing_data', 'Planned')))
The code runs, but no rows in my dataframe end up labelled as missing data.
First convert the values to datetimes with to_datetime, and use Series.isna to test for missing values:
df['Leave_applied'] = pd.to_datetime(df['Leave_applied'])
df['Leave_approved'] = pd.to_datetime(df['Leave_approved'])
df['leave_type'] = np.where(df['Leave_applied'] > df['Leave_approved'],'unplanned',
(np.where(df['Leave_approved'].isna(), 'Missing_data', 'Planned')))
print (df)
Emp_code Leave_applied Leave_approved leave_type
0 0 2021-01-15 2021-01-15 Planned
1 2 2021-01-18 2021-01-15 unplanned
2 3 2021-01-20 NaT Missing_data
3 4 2021-01-15 2021-01-18 Planned
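As background for why the original `== np.nan` test never matched: NaN compares unequal to every value, including itself, so an equality mask against it is all False. Below is a minimal sketch of that behaviour. The `approved` series is a made-up stand-in for the question's Leave_approved column, kept as raw strings for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Leave_approved column (raw strings plus one NaN)
approved = pd.Series(
    ["15-Jan-2021", "15-Jan-2021", np.nan, "18-Jan-2021"], dtype=object
)

# Equality against NaN is False for every row, even the missing one
eq_mask = approved == np.nan

# isna() flags the missing row as intended
na_mask = approved.isna()

print(eq_mask.tolist())
print(na_mask.tolist())
```

This is why the inner np.where in the question always fell through to 'Planned'.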
Or use numpy.select:
df['leave_type'] = np.select([df['Leave_approved'].isna(),
df['Leave_applied'] > df['Leave_approved']],
['Missing_data', 'unplanned'], 'Planned')
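For anyone who wants to run this end to end, here is a self-contained sketch of the np.select approach. The dataframe is rebuilt by hand from the question's sample rows, and the explicit `format="%d-%b-%Y"` is an assumption about the date layout (it also assumes English month abbreviations), not something stated in the question.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the question
df = pd.DataFrame({
    "Emp_code": [0, 2, 3, 4],
    "Leave_applied": ["15-Jan-2021", "18-Jan-2021", "20-Jan-2021", "15-Jan-2021"],
    "Leave_approved": ["15-Jan-2021", "15-Jan-2021", np.nan, "18-Jan-2021"],
})

# Parse both columns; the NaN entry becomes NaT
for col in ["Leave_applied", "Leave_approved"]:
    df[col] = pd.to_datetime(df[col], format="%d-%b-%Y")

# np.select takes the first condition that is True for each row,
# so the missing-value check is listed first
conditions = [
    df["Leave_approved"].isna(),
    df["Leave_applied"] > df["Leave_approved"],
]
choices = ["Missing_data", "unplanned"]

# Rows matching neither condition get the default label
df["leave_type"] = np.select(conditions, choices, default="Planned")
print(df)
```

Listing the isna() condition first keeps the intent explicit, even though the comparison on a NaT row would be False anyway.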
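You can try np.select. The idea is that comparing NaT with any date evaluates to False, so rows with a missing date match neither condition and are left as the default: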
df['Leave_applied'] = pd.to_datetime(df['Leave_applied'], errors='coerce')
df['Leave_approved'] = pd.to_datetime(df['Leave_approved'], errors='coerce')
df['Leave type'] = np.select(
[df['Leave_applied'] > df['Leave_approved'],
df['Leave_applied'] <= df['Leave_approved'],
],
['unplanned',
'planned',
],
default='missing data'
)
print(df)
Emp_code Leave_applied Leave_approved Leave type
0 0 2021-01-15 2021-01-15 planned
1 2 2021-01-18 2021-01-15 unplanned
2 3 2021-01-20 NaT missing data
3 4 2021-01-15 2021-01-18 planned
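The fall-through to the default relies on how NaT behaves in comparisons. The short sketch below checks that with scalar values chosen for illustration. It also shows one caveat worth keeping in mind: with `errors='coerce'`, an unparseable string becomes NaT too, so such rows would also be labelled 'missing data'.

```python
import pandas as pd

applied = pd.Timestamp("2021-01-20")
missing = pd.NaT

# Ordering and equality comparisons against NaT are all False
gt = applied > missing
le = applied <= missing
eq = applied == missing
print(gt, le, eq)

# errors='coerce' turns a value that cannot be parsed into NaT
coerced = pd.to_datetime("not a date", errors="coerce")
print(pd.isna(coerced))
```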