Calculate Retention Rate on one column in R

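I need your advice, as I am struggling to find the right command in R.

Basically, I want to calculate the retention rate for specific customers. customer_math is a snapshot of when each customer was active, covering a time span of 8 years.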

customer  customer_math
Apple          1
Tesco          10
Nespresso      1001
Dell           11
BMW            11111100

The final dataset should look like this:

customer  customer_math      retention_rate
Apple          1                1
Tesco          10               0.5
Nespresso      1001             0.5
Dell           11               1
BMW            11111100         0.75

Is there a way to solve my problem?

Many thanks for your help!

library(tidyverse)
tribble(
    ~customer, ~customer_math,
      "Apple",              1,
      "Tesco",             10,
  "Nespresso",           1001,
       "Dell",             11,
        "BMW",       11111100
  ) %>%
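  # stringr coerces the numeric customer_math to character before matching
  mutate(active_count = str_count(customer_math, "1"),   # periods in which the customer was active
         periods = str_length(customer_math),            # total number of periods recorded
         retention_rate = active_count / periods)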

# A tibble: 5 x 5
#  customer  customer_math active_count periods retention_rate
#  <chr>             <dbl>        <int>   <int>          <dbl>
#1 Apple                 1            1       1           1   
#2 Tesco                10            1       2           0.5 
#3 Nespresso          1001            2       4           0.5 
#4 Dell                 11            2       2           1   
#5 BMW            11111100            6       8           0.75
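
One caveat with storing the pattern as a number: a double such as 100000 may be rendered in scientific notation when it is coerced to text, and leading zeros (a customer who only became active in a later period) cannot be represented at all. Below is a minimal sketch, assuming the patterns can be kept as character strings instead. The `customers` data frame and its `LateStarter` row are made up for illustration, and base R is used here rather than stringr so the snippet runs without extra packages.

```r
# Assumption: one character per period, "1" = active, "0" = inactive.
customers <- data.frame(
  customer      = c("Apple", "Tesco", "BMW", "LateStarter"),  # last row is hypothetical
  customer_math = c("1", "10", "11111100", "0011"),
  stringsAsFactors = FALSE
)

# Base R counterpart of str_count(): locate every "1" and count the matches.
matches <- regmatches(customers$customer_math,
                      gregexpr("1", customers$customer_math, fixed = TRUE))
active_count <- lengths(matches)

# The number of periods is the length of the string, leading zeros included.
periods <- nchar(customers$customer_math)

customers$retention_rate <- active_count / periods
customers
```

With text input the hypothetical `0011` pattern keeps all four periods, so its rate is 2 / 4 = 0.5.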

You can remove all the 0s from the string, take its nchar, and divide by the nchar of the full string:

df$retention_rate <- with(df, nchar(gsub('0', '', customer_math, fixed = TRUE))/
                              nchar(customer_math))
df
#   customer customer_math retention_rate
#1     Apple             1           1.00
#2     Tesco            10           0.50
#3 Nespresso          1001           0.50
#4      Dell            11           1.00
#5       BMW      11111100           0.75
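
To see what that one-liner does, the individual steps can be inspected for a single value. This is only an illustrative sketch: the variable names `x` and `without_zeros` are not part of the answer.

```r
# One activity pattern, taken from the BMW row.
x <- "11111100"

# Step 1: drop every "0", leaving only the active periods.
without_zeros <- gsub("0", "", x, fixed = TRUE)   # "111111"

# Step 2: count the remaining characters (active periods).
active <- nchar(without_zeros)                    # 6

# Step 3: divide by the total number of periods.
rate <- active / nchar(x)                         # 6 / 8 = 0.75
rate
```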

Data

df <- structure(list(customer = structure(c(1L, 5L, 4L, 3L, 2L), 
.Label = c("Apple", "BMW", "Dell", "Nespresso", "Tesco"), class = "factor"), 
customer_math = c(1L, 10L, 1001L, 11L, 11111100L)), class = "data.frame", 
row.names = c(NA, -5L))

Another Base R solution that achieves the expected result:

# Coerce the customer_math vector to character so it can be split,
# then loop over each element:

df$retention_rate <- sapply(as.character(df$customer_math), function(x) {

  # Split the element into a vector comprised of its individual characters:
  elements_split <- unlist(strsplit(x, ""))

  # Divide the sum of the digits by the number of digits:
  rr <- sum(as.numeric(elements_split)) / length(elements_split)

  # Explicitly return the resulting rate:
  return(rr)
})
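
A more compact variant of the same idea, offered as a sketch rather than a replacement: `strsplit()` is already vectorised, so the split can be done once for the whole column, and `vapply()` can be used to check that each element yields exactly one number. The data frame is rebuilt here only so the snippet runs on its own.

```r
# Same values as in this answer, rebuilt so the snippet is self-contained.
df <- data.frame(
  customer      = c("Apple", "Tesco", "Nespresso", "Dell", "BMW"),
  customer_math = c(1L, 10L, 1001L, 11L, 11111100L)
)

# strsplit() returns one character vector of digits per row.
digits <- strsplit(as.character(df$customer_math), "")

# The share of "1" digits is the mean of a logical vector;
# numeric(1) makes vapply() fail loudly if a result is not a single number.
df$retention_rate <- vapply(digits, function(d) mean(d == "1"), numeric(1))
df$retention_rate
```

Because `vapply()` iterates over the unnamed list returned by `strsplit()`, the result carries no names, unlike `sapply()` applied directly to a character vector.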

Data:

df <- structure(
  list(
    customer = structure(
      c(1L, 5L, 4L, 3L, 2L),
      .Label = c("Apple", "BMW", "Dell", "Nespresso", "Tesco"),
      class = "factor"
    ),
    customer_math = c(1L, 10L, 1001L, 11L, 11111100L)
  ),
  class = "data.frame",
  row.names = c(NA,-5L)
)