How to make columns dynamic in R

Question

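The code below computes cohort retention rates. A cohort is the month in which customers joined. For example, the code counts how many customers joined in May 2015 and how many of them were still active in each following month. The final output is stored in the data frame df1 (shown below).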

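I need help creating dynamic column names, which are currently hard-coded in the ddply call. M0 is the joining month, M1 is the first month after joining, M2 is the second month after joining, and so on; the upper bound M(n) should be a variable. It can be computed as the difference between the latest expiry date and the earliest joining date.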
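A minimal sketch of that calculation, assuming a month is approximated as 30 days (as in the code further down) and using the dates from the reproducible example. The names exp_date, span_days, n and month_cols are illustrative and do not appear in the original code.

```r
# Join and expiry dates copied from the reproducible example
dj <- as.Date(c("2015-05-01", "2015-05-01", "2015-05-01",
                "2015-06-01", "2015-06-01"))
exp_date <- as.Date(c("2015-06-25", "2015-07-25", "2015-07-25",
                      "2015-07-01", "2015-08-05"))

# Days between the earliest join date and the latest expiry date
span_days <- as.numeric(max(exp_date) - min(dj))

# Assumption: one month is treated as 30 days; as.integer() truncates
n <- as.integer(span_days / 30)

# Column names M0 ... M(n), built from the data instead of typed by hand
month_cols <- paste0("M", 0:n)
month_cols
```

With these dates the span is 96 days, so n is 3 and the names run from M0 to M3.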

Unfortunately, I have not been able to generate the range M0 to M(n) dynamically.

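Here is my code dump. It works but is not optimal, because I have hard-coded M0 to M3 as variables in the ddply call. So if my input data contains a customer whose subscription lasts more than 5 months, my code will fail.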

The input to the code is the following dummy data.

customer    dj       exp
abc      01/05/15   25/06/15
efg      01/05/15   25/07/15
ghd      01/05/15   25/07/15
mkd      01/06/15   25/07/15
kskm     01/06/15   05/08/15

Reproducible code:

    library(zoo)
    library(plyr)

    customer<-c("abc","efg","ghd","mkd","kskm")
    dj<-c("2015-05-01", "2015-05-01", "2015-05-01","2015-06-01","2015-06-01")
    exp<-c("2015-06-25", "2015-07-25", "2015-07-25","2015-07-01","2015-08-05")
    df <- data.frame(customer, dj, exp)
    # The dates above are ISO formatted (yyyy-mm-dd), so parse them with a matching format
    df$dj <- as.Date(df$dj, "%Y-%m-%d")
    df$exp <- as.Date(df$exp, "%Y-%m-%d")

    # The data in the file has different variable names than your example data
    # so I'm changing them to match
    names(df)[1:3] <- c("customer","dj","exp")

    # Make a variable called Cohort that contains only the year and month of joining
    # as.yearmon() comes from the 'zoo' package
    df$Cohort <- as.yearmon(df$dj)

    # Calculate the difference in months between date of expiry and date of joining
    df$MonthDiff <- ceiling((df$exp-df$dj)/30)
    #df$MonthDiff <- 12*(as.yearmon(df$exp+months(1))-df$Cohort)

    # Number of whole 30-day months between the earliest join and the latest expiry
    range <- as.integer(as.numeric(max(df$exp) - min(df$dj)) / 30)

    # Use ddply() from the 'plyr' package to get the frequency of subjects that are
    # still active after 0, 1, 2, and 3 months.

    df1 <- ddply(df,.(Cohort),summarize,
                 M0 = sum(MonthDiff > 0), 
                 M1 = sum(MonthDiff > 1),
                 M2 = sum(MonthDiff > 2),
                 M3 = sum(MonthDiff > 3)

    )

    df1
    #     Cohort M0 M1 M2 M3
    # 1 May 2015  3  3  2  0
    # 2 Jun 2015  2  2  1  0

The above is the working output. The requirement is to make the columns M0 to M3 dynamic.

Answer 1

Try inserting this after range is created:

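    # Add one logical column per month offset: M0, M1, ..., M<range>
    for(i in 0:range) df <- within(df,assign(paste0("M",i),MonthDiff>i))

    # Sum the logical columns within each cohort
    df1 <- ddply(df,.(Cohort),function(x) colSums(x[,paste0("M",0:range)]))

    df1
    #     Cohort M0 M1 M2 M3
    # 1 May 2015  3  3  2  0
    # 2 Jun 2015  2  1  1  0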
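For comparison, the same idea can be sketched in base R without plyr or zoo. This is only an illustration, not part of the answer above: it assumes the cohort can be labelled with format(dj, "%Y-%m") instead of as.yearmon(), and it swaps the loop for outer() and the ddply call for rowsum(). The variable names n, active and counts are made up for this sketch.

```r
# Data from the question's reproducible code
df <- data.frame(
  customer = c("abc", "efg", "ghd", "mkd", "kskm"),
  dj  = as.Date(c("2015-05-01", "2015-05-01", "2015-05-01",
                  "2015-06-01", "2015-06-01")),
  exp = as.Date(c("2015-06-25", "2015-07-25", "2015-07-25",
                  "2015-07-01", "2015-08-05"))
)

# Assumption: cohort label is the year-month string of the join date
df$Cohort <- format(df$dj, "%Y-%m")

# Months active, approximating one month as 30 days (as in the question)
df$MonthDiff <- ceiling(as.numeric(df$exp - df$dj) / 30)

# Upper bound for the M columns, computed like the question's `range`
n <- as.integer(as.numeric(max(df$exp) - min(df$dj)) / 30)

# Logical matrix: row = customer, column = "still active after i months"
active <- outer(df$MonthDiff, 0:n, ">")
colnames(active) <- paste0("M", 0:n)

# Sum the flags within each cohort, then turn the result into a data frame
counts <- rowsum(active * 1L, group = df$Cohort)
df1 <- data.frame(Cohort = rownames(counts), counts, row.names = NULL)
df1
```

Because outer() always returns a matrix, the number of M columns follows n directly, so longer subscriptions in the input produce additional columns without any change to the code.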
