Compare number of days with a value
I am using the dataset below, named "emails_visits":
Territory Account: External ID Date Clicked Opened Sent Call Method date_after date_before days_before_visit days_after_visit
40582 PsAPS4 WNLN03239383 2021-02-16 13:46:00 0.0 0.0 1.0 RTE NaT NaT NaT NaT
19726 CardioPS5 WNLN00441144 2021-09-17 13:33:00 0.0 0.0 1.0 RTE NaT NaT NaT NaT
3532 ASPS4 WNLN00026136 2021-10-25 17:02:00 0.0 0.0 1.0 RTE 2021-10-21 NaT NaT 4 days 17:02:00
22371 CardioPS6 WNLN04438596 2021-06-15 13:44:00 0.0 1.0 1.0 RTE NaT NaT NaT NaT
35930 PSOPS5 WNLN02913837 2021-08-19 09:59:00 0.0 1.0 1.0 RTE NaT NaT NaT NaT
40099 PsAPS3 WNLN09365001 2021-02-25 16:18:00 0.0 0.0 1.0 RTE 2020-05-12 NaT NaT 289 days 16:18:00
25013 CardioPS7 WNLN04585438 2021-05-31 14:45:00 0.0 1.0 1.0 RTE NaT 2021-06-22 21 days 09:15:00 NaT
60381 MEDRESP6 WNLN00000715 2021-03-02 00:00:00 NaN NaN NaN Virtual MS Teams 2021-03-02 2021-03-02 0 days 00:00:00 0 days 00:00:00
I want to create a new column with time brackets. For example, if the value is less than 3 days, the time bracket is [3].
I used:
emails_visits["before_bracket"]=emails_visits.apply(lambda x:"[3]"if x[10]<3 else "[10]" if x[10]<10 days else "[10+]")
I get the error message TypeError: '<' not supported between instances of 'str' and 'int'
I also tried converting the column to numbers with
emails_visits["days_before_visit"]=pd.to_numeric(emails_visits["days_before_visit"])
but got some strange numbers, such as -9223372036854775808 or 1433520000000000
Assuming the days_after_visit column has dtype timedelta64[ns], you can use the dt.days accessor to extract the number of days:
# .dt.days yields NaN for NaT rows; comparisons with NaN are False, so those rows get "[10+]"
emails_visits["before_bracket"] = emails_visits['days_after_visit'].dt.days.apply(
    lambda x: "[3]" if x < 3 else "[10]" if x < 10 else "[10+]")
This gives:
Territory Account: External ID ... days_after_visit before_bracket
40582 PsAPS4 WNLN03239383 ... NaT [10+]
19726 CardioPS5 WNLN00441144 ... NaT [10+]
3532 ASPS4 WNLN00026136 ... 4 days 17:02:00 [10]
22371 CardioPS6 WNLN04438596 ... NaT [10+]
35930 PSOPS5 WNLN02913837 ... NaT [10+]
40099 PsAPS3 WNLN09365001 ... 289 days 16:18:00 [10+]
25013 CardioPS7 WNLN04585438 ... NaT [10+]
60381 MEDRESP6 WNLN00000715 ... 0 days 00:00:00 [3]
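In the output above, the rows where days_after_visit is NaT end up in [10+], because .dt.days returns NaN for them and every comparison with NaN is False. If missing values should stay missing instead, something like the following sketch may help. The sample frame, the helper name to_bracket and the column name after_bracket are made up for illustration and are not part of the original data.

```python
import pandas as pd

# Hypothetical sample mirroring a few days_after_visit values from the question
sample = pd.DataFrame({
    "days_after_visit": pd.to_timedelta(
        ["4 days 17:02:00", None, "289 days 16:18:00", "0 days 00:00:00"]
    )
})

def to_bracket(days):
    """Map a number of whole days to a bracket label; keep missing values missing."""
    if pd.isna(days):
        return None
    if days < 3:
        return "[3]"
    if days < 10:
        return "[10]"
    return "[10+]"

# .dt.days gives whole days (NaN where the timedelta is NaT)
sample["after_bracket"] = sample["days_after_visit"].dt.days.map(to_bracket)
print(sample)
```

Whether a missing visit should count as "[10+]" or stay empty depends on what the brackets are used for, so treat this as one option rather than the required behavior.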
If the column contains only strings, you can convert it to timedelta first:
emails_visits['days_after_visit'] = pd.to_timedelta(emails_visits['days_after_visit'])
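As a small illustration of that conversion, and of where the strange numbers from pd.to_numeric likely come from: a timedelta64[ns] value is stored as an integer count of nanoseconds, and NaT is stored as the smallest 64-bit integer. The sketch below uses a made-up Series named raw, not the real column.

```python
import numpy as np
import pandas as pd

# Hypothetical string column with one missing entry
raw = pd.Series(["21 days 09:15:00", None, "0 days 00:00:00"])

td = pd.to_timedelta(raw)   # strings -> timedelta, None -> NaT
days = td.dt.days           # whole days; NaN where the value is NaT
print(days)

# NaT viewed as a raw 64-bit integer is the int64 minimum
nat_as_int = np.array([np.timedelta64("NaT", "ns")]).view("int64")[0]
print(nat_as_int)

# The other number from the question, read as nanoseconds
odd = pd.Timedelta(1433520000000000, unit="ns")
print(odd)
```

Read this way, -9223372036854775808 is simply NaT, and 1433520000000000 is a duration of 16 days 14:12:00 expressed in nanoseconds.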
A cleaner and faster approach is to use np.select with a list of conditions and a list of values:
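import numpy as np

# Whole days as numbers; NaT rows become NaN
days_after_visit = emails_visits["days_after_visit"].dt.days
conditions = [
    days_after_visit < 3,
    days_after_visit < 10,
    days_after_visit >= 10,
]
values = ["[3]", "[10]", "[10+]"]
# NaN matches no condition, so those rows take the default. A string default
# avoids mixing the int default 0 with string labels and mirrors the apply
# version above, which labels those rows "[10+]".
emails_visits["before_bracket"] = np.select(conditions, values, default="[10+]")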
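Another option, not mentioned in the answers above, is pd.cut, which bins numeric values by edges directly. This is only a sketch on made-up sample values; the names days and brackets are illustrative. With right=False each bin includes its left edge, so the bins are "less than 3", "3 up to but not including 10" and "10 or more", and missing values stay missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of days_after_visit values, converted to whole days
days = pd.Series(
    pd.to_timedelta(["4 days 17:02:00", None, "289 days 16:18:00", "0 days 00:00:00"])
).dt.days

# Bin edges: [-inf, 3), [3, 10), [10, inf)
brackets = pd.cut(
    days,
    bins=[-np.inf, 3, 10, np.inf],
    right=False,
    labels=["[3]", "[10]", "[10+]"],
)
print(brackets)
```

The result is a categorical column, which keeps the labels in bracket order when sorting or grouping.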