How to calculate customer acquisition rate by finding the overlap with previous years?
I have a data set, CustOrder, of customer purchases for 2008-2013. It contains the following information (this is only part of the data):
CustID OrderYear Amount
101102 2008 22429.00
101102 2009 11045.00
101435 2010 10740.77
101435 2011 73669.50
107236 2012 162123.50
101416 2010 8102.00
101416 2011 360.00
101416 2012 36576.00
101416 2013 1960.00
101467 2012 997.00
101604 2010 2971.53
101664 2009 91.94
101664 2011 130.93
.........
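Some customers purchase in every consecutive year (e.g. 101416), others only in some years (e.g. 101664). I want to work out the customer acquisition rate, that is, how many new customers were gained in each year, both as a count and as a rate. A customer counts as new only in the year of their first purchase, even if their later purchases are not in consecutive years. For example: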
Year Customer TotalCustomerNumber NewCustomerRate
2008 5 5 0%
2009 3 8 37%
2010 4 12 33%
2011 2 14 14%
2012 3 17 17%
2013 2 19 10%
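The rate column in this example appears to be the year's new customers divided by the running total of customers, truncated to a whole percent, with the first year set to 0%. That definition is an assumption inferred from the numbers, not something stated explicitly. A quick check in R under that assumption:

```r
# Values copied from the example table above.
newCustomers <- c(5, 3, 4, 2, 3, 2)
totalCustomers <- cumsum(newCustomers)

# Assumed definition: new customers / running total, first year fixed at 0.
rate <- newCustomers / totalCustomers * 100
rate[1] <- 0

# floor() truncates to whole percents, which matches the table.
print(data.frame(Year = 2008:2013,
                 Customer = newCustomers,
                 TotalCustomerNumber = totalCustomers,
                 NewCustomerRate = paste0(floor(rate), "%")))
```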
Does anyone have ideas or hints on how to do this?
Any help is appreciated!
I spent some time on this and came up with a solution that should work. See the comments for details:
# Setting a seed for reproducibility.
set.seed(10)
# Setting what years we want allowed.
validYears <- 2008:2015
# Generating a "fake" dataset for testing purposes.
custDF <- data.frame(CustID = abs(as.integer(rnorm(250, 50, 50))), OrderYear = 0, Amount = abs(rnorm(250, 100, 1000)))
custDF$OrderYear <- sapply(custDF$OrderYear, function(x) sample(validYears, 1)) # Adding a random year for each purchase.
# Initializing a new data frame to store the output values.
# stringsAsFactors = FALSE keeps NewCustomerRate a character column on R versions before 4.0.
newDF <- data.frame(Year = validYears, NewCustomers = 0, RunningNewCustomerTotal = 0, NewCustomerRate = "", stringsAsFactors = FALSE)
custTotal <- 0 # Running total of new customers, updated in the loop.
firstIt <- 1 # Denotes the first iteration.
for (year in validYears) { # For each year in the data set (arbitrarily defined above, before making the dataset).
  # Getting the unique IDs of the current year and the unique IDs of all past years.
  currentIDs <- unique(custDF[custDF$OrderYear == year, "CustID"])
  pastIDs <- unique(custDF[custDF$OrderYear < year, "CustID"])
  if (firstIt == 1) { pastIDs <- c(-1) } # First iteration: there are no past years, so use a placeholder ID.
  newIDs <- currentIDs[!(currentIDs %in% pastIDs)] # Getting all IDs that have not appeared in a previous year.
  numNewIDs <- length(newIDs) # Getting the number of new IDs.
  custTotal <- custTotal + numNewIDs # Updating the running total.
  # Adding the new data into the data frame.
  newDF[newDF$Year == year, "NewCustomers"] <- numNewIDs
  newDF[newDF$Year == year, "RunningNewCustomerTotal"] <- custTotal
  # Getting the rate: the share of the running total that is new this year.
  if (firstIt == 1) {
    NewCustRate <- 0
    firstIt <- 2
  } else {
    NewCustRate <- (1 - (newDF[newDF$Year == (year - 1), "RunningNewCustomerTotal"] / custTotal)) * 100
  }
  # Inputting the new data. format and round just cut down the decimals.
  newDF[newDF$Year == year, "NewCustomerRate"] <- paste0(format(round(NewCustRate, 2)), "%")
}
Output:
> newDF
Year NewCustomers RunningNewCustomerTotal NewCustomerRate
1 2008 32 32 0%
2 2009 22 54 41%
3 2010 19 73 26%
4 2011 14 87 16%
5 2012 7 94 7.4%
6 2013 3 97 3.1%
7 2014 9 106 8.5%
8 2015 5 111 4.5%
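As a possible alternative to the loop, the same numbers can be obtained by taking each customer's first order year and counting customers per year. The following is only a minimal sketch: it uses the thirteen sample rows shown in the question (not the full data), and the names firstYear, years and result are made up for illustration.

```r
# Sample rows copied from the question (only the partial data shown there).
CustOrder <- data.frame(
  CustID = c(101102, 101102, 101435, 101435, 107236, 101416, 101416,
             101416, 101416, 101467, 101604, 101664, 101664),
  OrderYear = c(2008, 2009, 2010, 2011, 2012, 2010, 2011,
                2012, 2013, 2012, 2010, 2009, 2011),
  Amount = c(22429.00, 11045.00, 10740.77, 73669.50, 162123.50, 8102.00, 360.00,
             36576.00, 1960.00, 997.00, 2971.53, 91.94, 130.93)
)

# First purchase year of each customer (one value per CustID).
firstYear <- aggregate(OrderYear ~ CustID, data = CustOrder, FUN = min)$OrderYear

# Count new customers per year; explicit factor levels keep years with no new customers.
years <- min(CustOrder$OrderYear):max(CustOrder$OrderYear)
newCustomers <- as.integer(table(factor(firstYear, levels = years)))
runningTotal <- cumsum(newCustomers)

result <- data.frame(
  Year = years,
  NewCustomers = newCustomers,
  RunningNewCustomerTotal = runningTotal,
  NewCustomerRate = round(newCustomers / runningTotal * 100, 2)
)
result$NewCustomerRate[1] <- 0 # Same convention as the loop: the first year is reported as 0%.
print(result)
```

On these sample rows this gives 1, 1, 3, 0, 2 and 0 new customers for 2008 through 2013. The rate is kept numeric here so it can be sorted or plotted; paste0 can add the "%" sign for display.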
Hope this helps!