Create groups based on multiple DateTime comparisons
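I'm trying to create a new column and populate it with a value based on comparing one date column against three other date columns.
A sample of the DataFrame df is shown below. All the dates shown have been converted with pd.to_datetime, which produced some NaT values because the individual has not yet progressed to that stage.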
1st_date 2nd_date 3rd_date action_date
2015-10-05 NaT NaT 2015-12-03
2015-02-27 2015-03-14 2015-03-15 2015-04-08
2015-03-07 2015-03-27 2015-03-28 2015-03-27
2015-01-05 2015-01-20 2015-01-21 2015-05-20
2015-01-05 2015-01-20 2015-01-21 2015-09-16
2015-05-23 2015-06-18 2015-06-19 2015-07-01
2015-03-03 NaT NaT 2015-07-23
2015-03-03 NaT NaT 2015-11-14
2015-06-05 2015-06-19 2015-06-20 2015-10-24
2015-10-08 2015-10-21 2015-10-22 2015-12-22
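For anyone who wants to reproduce this, here is a minimal sketch of how a frame like the one above could be built. The `raw` dictionary and the use of `None` for missing dates are assumptions for illustration; only the first three rows of the sample are included.

```python
import pandas as pd

# Hypothetical raw input: date strings, with None where a stage has not been reached.
raw = {
    "1st_date": ["2015-10-05", "2015-02-27", "2015-03-07"],
    "2nd_date": [None, "2015-03-14", "2015-03-27"],
    "3rd_date": [None, "2015-03-15", "2015-03-28"],
    "action_date": ["2015-12-03", "2015-04-08", "2015-03-27"],
}
df = pd.DataFrame(raw)

# Convert every column to datetime; missing entries become NaT.
for col in df.columns:
    df[col] = pd.to_datetime(df[col])
```

After the conversion, `df.dtypes` should report a datetime64 dtype for all four columns, and the missing entries in the first row show up as NaT.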
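I'm trying to create a fifth column holding the result (or group) of comparing the action_date column against the first three date columns: 1st_date, 2nd_date, and 3rd_date.
This fifth column, named action_group, should be populated with a string that assigns each date to a group.
Pseudocode for a potential function (and the expected output) is as follows:
if action_date > 1st_date and < 2nd_date then action_group = '1st_action_group'
The same comparison is needed for action_date, 2nd_date, and 3rd_date, which would produce 2nd_action_group in the action_group column.
Finally, if action_date is greater than 3rd_date, action_group would be assigned 3rd_action_group.
An example of the expected output is shown below.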
1st_date 2nd_date 3rd_date action_date action_group
2015-10-05 NaT NaT 2015-12-03 1st_action_group
2015-02-27 2015-03-14 2015-03-15 2015-04-08 3rd_action_group
2015-03-07 2015-03-27 2015-03-28 2015-03-27 2nd_action_group
2015-01-05 2015-01-20 2015-01-21 2015-05-20 3rd_action_group
2015-01-05 2015-01-20 2015-01-21 2015-09-16 3rd_action_group
2015-05-23 2015-06-18 2015-06-19 2015-07-01 3rd_action_group
2015-03-03 NaT NaT 2015-07-23 1st_action_group
2015-03-03 NaT NaT 2015-11-14 1st_action_group
2015-06-05 2015-06-19 2015-06-20 2015-10-24 3rd_action_group
2015-10-08 2015-10-21 2015-10-22 2015-12-22 3rd_action_group
Any help would be greatly appreciated.
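import numpy as np

# Outer test: action after 3rd_date. Inner test: action from 2nd_date up to (not including) 3rd_date.
# Comparisons against NaT are False, so rows with missing dates fall through to the default.
df['action_group'] = np.where(df['action_date'] > df['3rd_date'],
                              '3rd_action_group',
                              np.where((df['action_date'] >= df['2nd_date']) & (df['action_date'] < df['3rd_date']),
                                       '2nd_action_group',
                                       '1st_action_group'))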
You can get the result you want by simply nesting two np.where calls.
1st_date 2nd_date 3rd_date action_date action_group
0 2015-10-05 NaT NaT 2015-12-03 1st_action_group
1 2015-02-27 2015-03-14 2015-03-15 2015-04-08 3rd_action_group
2 2015-03-07 2015-03-27 2015-03-28 2015-03-27 2nd_action_group
3 2015-01-05 2015-01-20 2015-01-21 2015-05-20 3rd_action_group
4 2015-01-05 2015-01-20 2015-01-21 2015-09-16 3rd_action_group
5 2015-05-23 2015-06-18 2015-06-19 2015-07-01 3rd_action_group
6 2015-03-03 NaT NaT 2015-07-23 1st_action_group
7 2015-03-03 NaT NaT 2015-11-14 1st_action_group
8 2015-06-05 2015-06-19 2015-06-20 2015-10-24 3rd_action_group
9 2015-10-08 2015-10-21 2015-10-22 2015-12-22 3rd_action_group
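As an alternative to nesting, the same grouping can be sketched with `np.select`, which takes a list of conditions and picks the first one that matches. This is only a sketch on the first three sample rows, not the answer's original code. One deliberate difference: a row where action_date equals 3rd_date lands in 2nd_action_group here, whereas with the nested np.where it fails both tests and falls through to 1st_action_group. Which of those is right for your data is an assumption you would need to check.

```python
import numpy as np
import pandas as pd

# First three rows of the sample data; None becomes NaT.
df = pd.DataFrame({
    "1st_date": pd.to_datetime(["2015-10-05", "2015-02-27", "2015-03-07"]),
    "2nd_date": pd.to_datetime([None, "2015-03-14", "2015-03-27"]),
    "3rd_date": pd.to_datetime([None, "2015-03-15", "2015-03-28"]),
    "action_date": pd.to_datetime(["2015-12-03", "2015-04-08", "2015-03-27"]),
})

# Conditions are checked in order; the first True one wins.
# Any comparison against NaT is False, so those rows take the default.
conditions = [
    (df["action_date"] > df["3rd_date"]).to_numpy(),
    (df["action_date"] >= df["2nd_date"]).to_numpy(),
]
choices = ["3rd_action_group", "2nd_action_group"]

df["action_group"] = np.select(conditions, choices, default="1st_action_group")
print(df)
```

Because the conditions are evaluated in order, the second one does not need an explicit upper bound, and adding a fourth stage later would just mean one more entry at the top of each list.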