Compare number of days with a value

I am using the dataset below, named "emails_visits":

      Territory Account: External ID                Date  Clicked  Opened  Sent       Call Method date_after date_before days_before_visit  days_after_visit
40582     PsAPS4         WNLN03239383 2021-02-16 13:46:00      0.0     0.0   1.0               RTE        NaT         NaT               NaT               NaT
19726  CardioPS5         WNLN00441144 2021-09-17 13:33:00      0.0     0.0   1.0               RTE        NaT         NaT               NaT               NaT
3532       ASPS4         WNLN00026136 2021-10-25 17:02:00      0.0     0.0   1.0               RTE 2021-10-21         NaT               NaT   4 days 17:02:00
22371  CardioPS6         WNLN04438596 2021-06-15 13:44:00      0.0     1.0   1.0               RTE        NaT         NaT               NaT               NaT
35930     PSOPS5         WNLN02913837 2021-08-19 09:59:00      0.0     1.0   1.0               RTE        NaT         NaT               NaT               NaT
40099     PsAPS3         WNLN09365001 2021-02-25 16:18:00      0.0     0.0   1.0               RTE 2020-05-12         NaT               NaT 289 days 16:18:00
25013  CardioPS7         WNLN04585438 2021-05-31 14:45:00      0.0     1.0   1.0               RTE        NaT  2021-06-22  21 days 09:15:00               NaT
60381   MEDRESP6         WNLN00000715 2021-03-02 00:00:00      NaN     NaN   NaN  Virtual MS Teams 2021-03-02  2021-03-02   0 days 00:00:00   0 days 00:00:00

I want to create a new column with time brackets: for example, if the value is < 3 days, the bracket is [3]. I used emails_visits["before_bracket"]=emails_visits.apply(lambda x:"[3]"if x[10]<3 else "[10]" if x[10]<10 days else "[10+]")

I get the error message TypeError: '<' not supported between instances of 'str' and 'int'

I also tried converting the column to numeric with emails_visits["days_before_visit"]=pd.to_numeric(emails_visits["days_before_visit"]), but I got some strange numbers such as -9223372036854775808 or 1433520000000000.

Assuming the days_after_visit column has dtype timedelta64[ns], you can use the dt.days accessor to extract the number of days:

emails_visits["before_bracket"] = emails_visits['days_after_visit'].dt.days.apply(
    lambda x: "[3]" if x<3 else "[10]" if x<10 else "[10+]")

This gives:

       Territory Account: External ID  ...  days_after_visit  before_bracket
40582     PsAPS4         WNLN03239383  ...               NaT           [10+]
19726  CardioPS5         WNLN00441144  ...               NaT           [10+]
3532       ASPS4         WNLN00026136  ...   4 days 17:02:00            [10]
22371  CardioPS6         WNLN04438596  ...               NaT           [10+]
35930     PSOPS5         WNLN02913837  ...               NaT           [10+]
40099     PsAPS3         WNLN09365001  ... 289 days 16:18:00           [10+]
25013  CardioPS7         WNLN04585438  ...               NaT           [10+]
60381   MEDRESP6         WNLN00000715  ...   0 days 00:00:00             [3]
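
Note that the NaT rows end up in "[10+]": dt.days yields NaN for them, every comparison with NaN is False, and so the lambda falls through to its last branch. If those rows should stay unlabelled instead, one option is to test for missing values first. The following is only a minimal sketch on a small made-up Series standing in for the real column; the helper name bracket is hypothetical.

```python
import pandas as pd

# Made-up sample standing in for emails_visits["days_after_visit"]
days_after_visit = pd.Series(
    pd.to_timedelta(["4 days 17:02:00", pd.NaT, "289 days 16:18:00", "0 days"])
)

def bracket(days):
    # Hypothetical helper: leave missing values unlabelled
    if pd.isna(days):
        return None
    if days < 3:
        return "[3]"
    if days < 10:
        return "[10]"
    return "[10+]"

# The NaT row becomes NaN once the day counts are extracted
before_bracket = days_after_visit.dt.days.apply(bracket)
print(before_bracket.tolist())
```

Whether missing rows should be left empty or given their own label depends on how the brackets are used afterwards.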

If the column contains only strings, you can convert it to timedelta:

emails_visits['days_after_visit'] = pd.to_timedelta(emails_visits['days_after_visit'])
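
Working with timedelta values also explains the odd numbers from pd.to_numeric in the question: a timedelta64[ns] value is stored as an integer count of nanoseconds, and NaT is stored as the smallest 64-bit integer, -9223372036854775808. As a rough sketch with made-up strings (not the real data), the conversion and two ways of getting day counts might look like this:

```python
import pandas as pd

# Made-up strings standing in for a column that was read as text
raw = pd.Series(["21 days 09:15:00", "0 days 00:00:00", None])

# Missing entries (None) become NaT after conversion
delta = pd.to_timedelta(raw)

# Whole days only; the time-of-day part is dropped
whole_days = delta.dt.days

# Fractional days, if the hours and minutes should count as well
exact_days = delta.dt.total_seconds() / 86400

print(whole_days.tolist())
print(exact_days.tolist())
```

dt.days truncates to whole days, so a value such as "4 days 17:02:00" counts as 4 when compared against the bracket limits.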

A cleaner and faster approach is to use np.select with a list of conditions and a list of values:

import numpy as np


days_after_visit = emails_visits["days_after_visit"].dt.days

conditions = [
    days_after_visit < 3,
    days_after_visit < 10,
    days_after_visit >= 10
]

values = [
    "[3]",
    "[10]",
    "[10+]"
]

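# Rows where no condition holds (NaT) get the default; a string default
# keeps it the same type as the other values
emails_visits["before_bracket"] = np.select(conditions, values, default="[10+]")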
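
To see how np.select resolves overlapping conditions, here is a self-contained sketch on made-up values (not the real data). For each row the first condition that is True decides the label, and rows where none is True receive the default. Expressing the last bracket as the default is an assumption made here so that the result mirrors the final else branch of the lambda version, including for NaT rows.

```python
import numpy as np
import pandas as pd

# Made-up sample standing in for emails_visits["days_after_visit"]
delta = pd.Series(
    pd.to_timedelta(["0 days", "4 days 17:02:00", "289 days 16:18:00", pd.NaT])
)
days = delta.dt.days  # NaN for the NaT row

# Conditions are checked in order; the first True one decides the label
conditions = [days < 3, days < 10]
values = ["[3]", "[10]"]

# Rows matching no condition (10 days or more, or NaT) fall back to the default
brackets = np.select(conditions, values, default="[10+]")
print(brackets.tolist())
```

Because the comparisons run on whole columns at once, this avoids calling a Python function per row, which is where the speed-up over apply comes from.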