提取过去 21 天的事件类型 window
Extracting event types from last 21 day window
我的数据框看起来像这样。最右边的两列是我想要的列。
**Name ActivityType ActivityDate Email(last 21 says) Webinar(last21)**
John Email 1/1/2014 NA NA
John Webinar 1/5/2014 NA NA
John Sale 1/20/2014 Yes Yes
John Webinar 3/25/2014 NA NA
John Sale 4/1/2014 No Yes
John Sale 7/1/2014 No No
Tom Email 1/1/2015 NA NA
Tom Webinar 1/5/2015 NA NA
Tom Sale 1/20/2015 Yes Yes
Tom Webinar 3/25/2015 NA NA
Tom Sale 4/1/2015 No Yes
Tom Sale 7/1/2015 No No
我只是想创建一个 yes/no 变量来表示在过去 21 天中每个 "Sale" 交易是否有电子邮件或网络研讨会。我在想(模拟代码)以这种方式使用 dplyr:
custlife %>%
group_by(Name) %>%
mutate(Email(last21days)=lag(ifelse(ActivityType = "Email" & ActivityDate of email within (activity date of sale - 21),Yes,No)).
我不确定实现这个的方法。请帮忙。衷心感谢您的帮助!
这是一个可能的 data.table
解决方案。在这里,我创建了 2 个临时数据集 - 一个用于 Sale
,一个用于其余 activity 类型,然后在使用 [=16= 时通过滚动 window 21 在它们之间加入] 为了检查每个连接的条件。然后,我将结果加入原始数据集。
将日期列转换为 Date
class 并按名称和日期键入数据(对于 final/rolling 联接)
library(data.table)
setkey(setDT(df)[, ActivityDate := as.IDate(ActivityDate, "%m/%d/%Y")], Name, ActivityDate)
每个创建 2 个临时数据集activity
Saletemp <- df[ActivityType == "Sale", .(Name, ActivityDate)]
Elsetemp <- df[ActivityType != "Sale", .(Name, ActivityDate, ActivityType)]
通过滚动 window 21 加入销售临时数据集,同时检查条件
Saletemp[Elsetemp, `:=`(Email21 = as.logical(which(i.ActivityType == "Email")),
Webinar21 = as.logical(which(i.ActivityType == "Webinar"))),
roll = -21, by = .EACHI]
把一切都加入回来
df[Saletemp, `:=`(Email21 = i.Email21, Webinar21 = i.Webinar21)]
df
# Name ActivityType ActivityDate Email21 Webinar21
# 1: John Email 2014-01-01 NA NA
# 2: John Webinar 2014-01-05 NA NA
# 3: John Sale 2014-01-20 TRUE TRUE
# 4: John Webinar 2014-03-25 NA NA
# 5: John Sale 2014-04-01 NA TRUE
# 6: John Sale 2014-07-01 NA NA
# 7: Tom Email 2015-01-01 NA NA
# 8: Tom Webinar 2015-01-05 NA NA
# 9: Tom Sale 2015-01-20 TRUE TRUE
# 10: Tom Webinar 2015-03-25 NA NA
# 11: Tom Sale 2015-04-01 NA TRUE
# 12: Tom Sale 2015-07-01 NA NA
这是另一个选项base R
:
df
先按照Name
进行拆分,然后在每个子集中,对于每个Sale
,看是否在21天内有Email(Webinar)销售。最后根据Name
.
拆分列表
之后您只需将 FALSE
替换为 no
并将 TRUE
替换为 yes
。
df_split <- split(df, df$Name)
df_split <- lapply(df_split, function(tab){
i_s <- which(tab[,2]=="Sale")
tab$Email21[i_s] <- sapply(tab[i_s, 3], function(d_s){any(tab[tab$ActivityType=="Email", 3] >= d_s-21)})
tab$Webinar21[i_s] <- sapply(tab[i_s, 3], function(d_s){any(tab[tab$ActivityType=="Webinar", 3] >= d_s-21)})
tab
})
df_res <- unsplit(df_split, df$Name)
df_res
# Name ActivityType ActivityDate Email21 Webinar21
#1 John Email 2014-01-01 NA NA
#2 John Webinar 2014-01-05 NA NA
#3 John Sale 2014-01-20 TRUE TRUE
#4 John Webinar 2014-03-25 NA NA
#5 John Sale 2014-04-01 FALSE TRUE
#6 John Sale 2014-07-01 FALSE FALSE
#7 Tom Email 2015-01-01 NA NA
#8 Tom Webinar 2015-01-05 NA NA
#9 Tom Sale 2015-01-20 TRUE TRUE
#10 Tom Webinar 2015-03-25 NA NA
#11 Tom Sale 2015-04-01 FALSE TRUE
#12 Tom Sale 2015-07-01 FALSE FALSE
数据
df <- structure(list(Name = c("John", "John", "John", "John", "John",
"John", "Tom", "Tom", "Tom", "Tom", "Tom", "Tom"), ActivityType = c("Email",
"Webinar", "Sale", "Webinar", "Sale", "Sale", "Email", "Webinar",
"Sale", "Webinar", "Sale", "Sale"), ActivityDate = structure(c(16071,
16075, 16090, 16154, 16161, 16252, 16436, 16440, 16455, 16519,
16526, 16617), class = "Date")), .Names = c("Name", "ActivityType",
"ActivityDate"), row.names = c(NA, -12L), index = structure(integer(0), ActivityType = c(1L,
7L, 3L, 5L, 6L, 9L, 11L, 12L, 2L, 4L, 8L, 10L)), class = "data.frame")
我的数据框看起来像这样。最右边的两列是我想要的列。
**Name ActivityType ActivityDate Email(last 21 says) Webinar(last21)**
John Email 1/1/2014 NA NA
John Webinar 1/5/2014 NA NA
John Sale 1/20/2014 Yes Yes
John Webinar 3/25/2014 NA NA
John Sale 4/1/2014 No Yes
John Sale 7/1/2014 No No
Tom Email 1/1/2015 NA NA
Tom Webinar 1/5/2015 NA NA
Tom Sale 1/20/2015 Yes Yes
Tom Webinar 3/25/2015 NA NA
Tom Sale 4/1/2015 No Yes
Tom Sale 7/1/2015 No No
我只是想创建一个 yes/no 变量来表示在过去 21 天中每个 "Sale" 交易是否有电子邮件或网络研讨会。我在想(模拟代码)以这种方式使用 dplyr:
custlife %>%
group_by(Name) %>%
mutate(Email(last21days)=lag(ifelse(ActivityType = "Email" & ActivityDate of email within (activity date of sale - 21),Yes,No)).
我不确定实现这个的方法。请帮忙。衷心感谢您的帮助!
这是一个可能的 data.table
解决方案。在这里,我创建了 2 个临时数据集 - 一个用于 Sale
,一个用于其余 activity 类型,然后在使用 [=16= 时通过滚动 window 21 在它们之间加入] 为了检查每个连接的条件。然后,我将结果加入原始数据集。
将日期列转换为 Date
class 并按名称和日期键入数据(对于 final/rolling 联接)
library(data.table)
setkey(setDT(df)[, ActivityDate := as.IDate(ActivityDate, "%m/%d/%Y")], Name, ActivityDate)
每个创建 2 个临时数据集activity
Saletemp <- df[ActivityType == "Sale", .(Name, ActivityDate)]
Elsetemp <- df[ActivityType != "Sale", .(Name, ActivityDate, ActivityType)]
通过滚动 window 21 加入销售临时数据集,同时检查条件
Saletemp[Elsetemp, `:=`(Email21 = as.logical(which(i.ActivityType == "Email")),
Webinar21 = as.logical(which(i.ActivityType == "Webinar"))),
roll = -21, by = .EACHI]
把一切都加入回来
df[Saletemp, `:=`(Email21 = i.Email21, Webinar21 = i.Webinar21)]
df
# Name ActivityType ActivityDate Email21 Webinar21
# 1: John Email 2014-01-01 NA NA
# 2: John Webinar 2014-01-05 NA NA
# 3: John Sale 2014-01-20 TRUE TRUE
# 4: John Webinar 2014-03-25 NA NA
# 5: John Sale 2014-04-01 NA TRUE
# 6: John Sale 2014-07-01 NA NA
# 7: Tom Email 2015-01-01 NA NA
# 8: Tom Webinar 2015-01-05 NA NA
# 9: Tom Sale 2015-01-20 TRUE TRUE
# 10: Tom Webinar 2015-03-25 NA NA
# 11: Tom Sale 2015-04-01 NA TRUE
# 12: Tom Sale 2015-07-01 NA NA
这是另一个选项base R
:
df
先按照Name
进行拆分,然后在每个子集中,对于每个Sale
,看是否在21天内有Email(Webinar)销售。最后根据Name
.
拆分列表
之后您只需将 FALSE
替换为 no
并将 TRUE
替换为 yes
。
df_split <- split(df, df$Name)
df_split <- lapply(df_split, function(tab){
i_s <- which(tab[,2]=="Sale")
tab$Email21[i_s] <- sapply(tab[i_s, 3], function(d_s){any(tab[tab$ActivityType=="Email", 3] >= d_s-21)})
tab$Webinar21[i_s] <- sapply(tab[i_s, 3], function(d_s){any(tab[tab$ActivityType=="Webinar", 3] >= d_s-21)})
tab
})
df_res <- unsplit(df_split, df$Name)
df_res
# Name ActivityType ActivityDate Email21 Webinar21
#1 John Email 2014-01-01 NA NA
#2 John Webinar 2014-01-05 NA NA
#3 John Sale 2014-01-20 TRUE TRUE
#4 John Webinar 2014-03-25 NA NA
#5 John Sale 2014-04-01 FALSE TRUE
#6 John Sale 2014-07-01 FALSE FALSE
#7 Tom Email 2015-01-01 NA NA
#8 Tom Webinar 2015-01-05 NA NA
#9 Tom Sale 2015-01-20 TRUE TRUE
#10 Tom Webinar 2015-03-25 NA NA
#11 Tom Sale 2015-04-01 FALSE TRUE
#12 Tom Sale 2015-07-01 FALSE FALSE
数据
df <- structure(list(Name = c("John", "John", "John", "John", "John",
"John", "Tom", "Tom", "Tom", "Tom", "Tom", "Tom"), ActivityType = c("Email",
"Webinar", "Sale", "Webinar", "Sale", "Sale", "Email", "Webinar",
"Sale", "Webinar", "Sale", "Sale"), ActivityDate = structure(c(16071,
16075, 16090, 16154, 16161, 16252, 16436, 16440, 16455, 16519,
16526, 16617), class = "Date")), .Names = c("Name", "ActivityType",
"ActivityDate"), row.names = c(NA, -12L), index = structure(integer(0), ActivityType = c(1L,
7L, 3L, 5L, 6L, 9L, 11L, 12L, 2L, 4L, 8L, 10L)), class = "data.frame")