Create groups based on multiple DateTime comparisons

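I am trying to create a new column and populate it with a value based on comparing one date column against three other date columns.

An example of the DataFrame df is shown below. All of the dates shown were converted with pd.to_datetime, which produced some NaT values because the individual has not yet progressed to those stages.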

    1st_date     2nd_date        3rd_date     action_date
    2015-10-05   NaT             NaT          2015-12-03 
    2015-02-27   2015-03-14      2015-03-15   2015-04-08 
    2015-03-07   2015-03-27      2015-03-28   2015-03-27 
    2015-01-05   2015-01-20      2015-01-21   2015-05-20 
    2015-01-05   2015-01-20      2015-01-21   2015-09-16 
    2015-05-23   2015-06-18      2015-06-19   2015-07-01 
    2015-03-03   NaT             NaT          2015-07-23 
    2015-03-03   NaT             NaT          2015-11-14 
    2015-06-05   2015-06-19      2015-06-20   2015-10-24 
    2015-10-08   2015-10-21      2015-10-22   2015-12-22 

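I want to add a fifth column that holds the result (or group) of comparing the action_date column against the first three date columns: 1st_date, 2nd_date and 3rd_date.

This fifth column, named action_group, should be filled with a string that assigns each action_date to a group.

Pseudocode for a potential function (and the expected output) is as follows: if action_date > 1st_date and action_date < 2nd_date then action_group = '1st_action_group'

The same comparison would be made for action_date against 2nd_date and 3rd_date, which would produce 2nd_action_group in the action_group column.

Finally, if action_date is greater than 3rd_date, action_group would be assigned 3rd_action_group.

An example of the expected output is shown below.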

    1st_date     2nd_date        3rd_date     action_date  action_group
    2015-10-05   NaT             NaT          2015-12-03   1st_action_group
    2015-02-27   2015-03-14      2015-03-15   2015-04-08   3rd_action_group
    2015-03-07   2015-03-27      2015-03-28   2015-03-27   2nd_action_group
    2015-01-05   2015-01-20      2015-01-21   2015-05-20   3rd_action_group
    2015-01-05   2015-01-20      2015-01-21   2015-09-16   3rd_action_group
    2015-05-23   2015-06-18      2015-06-19   2015-07-01   3rd_action_group
    2015-03-03   NaT             NaT          2015-07-23   1st_action_group
    2015-03-03   NaT             NaT          2015-11-14   1st_action_group
    2015-06-05   2015-06-19      2015-06-20   2015-10-24   3rd_action_group
    2015-10-08   2015-10-21      2015-10-22   2015-12-22   3rd_action_group

Any help would be greatly appreciated.

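import numpy as np

# Test the latest stage first. Comparisons against NaT evaluate to False,
# so rows with a missing 2nd_date/3rd_date fall through to '1st_action_group'.
df['action_group'] = np.where(df['action_date'] > df['3rd_date'],
                              '3rd_action_group',
                              np.where((df['action_date'] >= df['2nd_date']) & (df['action_date'] < df['3rd_date']),
                                       '2nd_action_group',
                                       '1st_action_group'))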

You can simply nest two np.where calls to get the result you want.

    1st_date    2nd_date    3rd_date    action_date action_group
0   2015-10-05     NaT          NaT     2015-12-03  1st_action_group
1   2015-02-27  2015-03-14  2015-03-15  2015-04-08  3rd_action_group
2   2015-03-07  2015-03-27  2015-03-28  2015-03-27  2nd_action_group
3   2015-01-05  2015-01-20  2015-01-21  2015-05-20  3rd_action_group
4   2015-01-05  2015-01-20  2015-01-21  2015-09-16  3rd_action_group
5   2015-05-23  2015-06-18  2015-06-19  2015-07-01  3rd_action_group
6   2015-03-03     NaT          NaT     2015-07-23  1st_action_group
7   2015-03-03     NaT          NaT     2015-11-14  1st_action_group
8   2015-06-05  2015-06-19  2015-06-20  2015-10-24  3rd_action_group
9   2015-10-08  2015-10-21  2015-10-22  2015-12-22  3rd_action_group
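
If more groups are added later, nested np.where calls can get hard to read. As a possible alternative, here is a minimal sketch using np.select with the same conditions as above. The three-row frame is just a subset of the sample data rebuilt for illustration, and the names conditions and choices are my own.

```python
import numpy as np
import pandas as pd

# Subset of the sample data from the question (rows 0 to 2), rebuilt for illustration.
df = pd.DataFrame({
    '1st_date': ['2015-10-05', '2015-02-27', '2015-03-07'],
    '2nd_date': [None, '2015-03-14', '2015-03-27'],
    '3rd_date': [None, '2015-03-15', '2015-03-28'],
    'action_date': ['2015-12-03', '2015-04-08', '2015-03-27'],
}).apply(pd.to_datetime)  # None becomes NaT

# Conditions are evaluated in order; the first one that is True wins.
# Any comparison involving NaT is False, so those rows take the default.
conditions = [
    df['action_date'] > df['3rd_date'],
    (df['action_date'] >= df['2nd_date']) & (df['action_date'] < df['3rd_date']),
]
choices = ['3rd_action_group', '2nd_action_group']

df['action_group'] = np.select(conditions, choices, default='1st_action_group')
print(df)
```

Note that, as with the nested version, an action_date exactly equal to 3rd_date matches neither condition and ends up in 1st_action_group. If that case can occur in your data, you would probably want to change one of the comparisons to include equality.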