Calculate Retention Rate on one column in R
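I need your advice, as I'm struggling to find the right command in R.
Basically, I want to calculate the retention rate for each customer. customer_math is a snapshot of when the customer was active, covering a time frame of up to 8 years: each digit stands for one year (1 = active, 0 = not active).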
customer customer_math
Apple 1
Tesco 10
Nespresso 1001
Dell 11
BMW 11111100
The final dataset should look like this:
customer customer_math retention_rate
Apple 1 1
Tesco 10 0.5
Nespresso 1001 0.5
Dell 11 1
BMW 11111100 0.75
Is there any way to solve this?
Thanks a lot for your help!
library(tidyverse)

tribble(
  ~customer, ~customer_math,
  "Apple", 1,
  "Tesco", 10,
  "Nespresso", 1001,
  "Dell", 11,
  "BMW", 11111100
) %>%
  mutate(active_count = str_count(customer_math, "1"),  # years marked 1
         periods = str_length(customer_math),           # digits = years covered
         retention_rate = active_count / periods)
## A tibble: 5 x 5
# customer customer_math active_count periods retention_rate
# <chr> <dbl> <int> <int> <dbl>
#1 Apple 1 1 1 1
#2 Tesco 10 1 2 0.5
#3 Nespresso 1001 2 4 0.5
#4 Dell 11 2 2 1
#5 BMW 11111100 6 8 0.75
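A caveat that applies to the approaches in this thread: customer_math is stored as a number, so the per-year digits only exist again after the number is turned back into text. That works for the five sample rows, but a snapshot that starts with inactive years loses its leading zeros as soon as it becomes a number, and a large round number can come back in scientific notation from base R's as.character(). A minimal sketch, assuming the snapshots can be read in as character strings instead; "Customer X" and "Customer Y" are hypothetical rows, not from the question:

```r
# Numeric storage loses information before any counting happens:
as.character(00000011)  # leading zeros are dropped: "11"
as.character(10000000)  # switches to scientific notation: "1e+07"

# Assumption: customer_math is read in as character, one digit per year.
# The last two rows are hypothetical customers, not from the question.
snapshots <- data.frame(
  customer = c("BMW", "Customer X", "Customer Y"),
  customer_math = c("11111100", "00000011", "10000000"),
  stringsAsFactors = FALSE
)

# Share of years marked "1" out of all years in each snapshot
snapshots$retention_rate <-
  nchar(gsub("0", "", snapshots$customer_math, fixed = TRUE)) /
  nchar(snapshots$customer_math)
snapshots
```

Keeping the column as character means the number of periods is exactly the number of digits that were recorded, so "00000011" is treated as 2 active years out of 8 rather than 2 out of 2.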
You can remove all the 0s from the string, count the remaining characters with nchar, and divide by the total nchar.
df$retention_rate <- with(df, nchar(gsub('0', '', customer_math, fixed = TRUE))/
nchar(customer_math))
df
# customer customer_math retention_rate
#1 Apple 1 1.00
#2 Tesco 10 0.50
#3 Nespresso 1001 0.50
#4 Dell 11 1.00
#5 BMW 11111100 0.75
Data
df <- structure(list(customer = structure(c(1L, 5L, 4L, 3L, 2L),
.Label = c("Apple", "BMW", "Dell", "Nespresso", "Tesco"), class = "factor"),
customer_math = c(1L, 10L, 1001L, 11L, 11111100L)), class = "data.frame",
row.names = c(NA, -5L))
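To see why removing the zeros gives the retention rate, here is the same calculation broken into steps for BMW's snapshot alone (a small illustration using only base R):

```r
x <- "11111100"                           # BMW's snapshot as a string
active <- gsub("0", "", x, fixed = TRUE)  # "111111": only the active years remain
nchar(active)                             # 6 active years
nchar(x)                                  # 8 years in total
rate <- nchar(active) / nchar(x)          # 6 / 8
rate
```

Note that this counts every non-"0" character as an active year, so it assumes the snapshot contains only 0s and 1s.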
Another base R solution that achieves the expected result:
# Coerce the customer_math vector to character to enable
# the string split, then loop through each element:
df$retention_rate <- sapply(as.character(df$customer_math),
  function(x){
    # Split each element into a vector comprising
    # its individual characters:
    elements_split <- unlist(strsplit(x, ""))
    # Divide the sum of the digits by their count:
    rr <- sum(as.numeric(elements_split))/length(elements_split)
    # Explicitly return the retention rate:
    return(rr)
  }
)
Data:
df <- structure(
list(
customer = structure(
c(1L, 5L, 4L, 3L, 2L),
.Label = c("Apple", "BMW", "Dell", "Nespresso", "Tesco"),
class = "factor"
),
customer_math = c(1L, 10L, 1001L, 11L, 11111100L)
),
class = "data.frame",
row.names = c(NA,-5L)
)
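A variant of the same split-and-average idea, offered as a sketch rather than a replacement: mean() of a logical vector is the share of TRUE values, so mean(d == "1") is the share of years marked active, and vapply() guarantees exactly one number per customer. Unlike sapply() over a character vector, it also does not attach the snapshot strings as names to the result. The data below are the question's five rows, rebuilt as a plain data frame:

```r
df <- data.frame(
  customer = c("Apple", "Tesco", "Nespresso", "Dell", "BMW"),
  customer_math = c(1L, 10L, 1001L, 11L, 11111100L),
  stringsAsFactors = FALSE
)

# Split every snapshot into single characters, one vector per customer
digits <- strsplit(as.character(df$customer_math), "")

# Share of characters equal to "1" in each snapshot
df$retention_rate <- vapply(digits, function(d) mean(d == "1"), numeric(1))
df
```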