How to get trend values from different time series in one data frame?
Here is the sample data:
data<-read.table(textConnection('customer_ID transaction_num sales
Josh 1 $35
Josh 2 $50
Josh 3 $65
Ray 1 $65
Ray 2 $52
Ray 3 $49
Eric 1 $10
Eric 2 $13
Eric 3 $9'),header=TRUE,stringsAsFactors=FALSE)
data$sales<-as.numeric(sub('\\$','',data$sales))
It is clear how to get the trend values for a single customer ID:
library(reshape2)
dataTransformed<-dcast(data, transaction_num ~ customer_ID, value.var="sales", fun.aggregate=sum)
transaction_num Eric Josh Ray
1 10 35 65
2 13 50 52
3 9 65 49
fitted(lm(dataTransformed$Eric ~ dataTransformed$transaction_num))
1 2 3
11.16667 10.66667 10.16667
But I would like to get a data frame with a 'trend values' column for each customer ID, either in place of the 'sales' column or right next to it. Something like this:
customer_ID transaction_num trend
Josh 1 35
Josh 2 50
Josh 3 65
Ray 1 63.3
Ray 2 55.3
Ray 3 47.3
Eric 1 11.2
Eric 2 10.7
Eric 3 10.2
Any help would be appreciated. Thanks.
You can apply lm separately for each customer_ID.
library(dplyr)
data %>%
group_by(customer_ID) %>%
mutate(trend = fitted(lm(sales ~ transaction_num)))
# customer_ID transaction_num sales trend
# <chr> <int> <dbl> <dbl>
#1 Josh 1 35 35
#2 Josh 2 50 50.
#3 Josh 3 65 65
#4 Ray 1 65 63.3
#5 Ray 2 52 55.3
#6 Ray 3 49 47.3
#7 Eric 1 10 11.2
#8 Eric 2 13 10.7
#9 Eric 3 9 10.2
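If you would rather not depend on dplyr, the same per-customer fit can be sketched in base R with split and unsplit. This is only a minimal sketch: the data frame is rebuilt here from the sales values shown in the output above so that the block runs on its own, and the names pieces and fits are just illustrative.

```r
# Sample data rebuilt from the values shown in the thread (sales already numeric)
data <- data.frame(
  customer_ID = rep(c("Josh", "Ray", "Eric"), each = 3),
  transaction_num = rep(1:3, times = 3),
  sales = c(35, 50, 65, 65, 52, 49, 10, 13, 9),
  stringsAsFactors = FALSE
)

# One sub-data-frame per customer
pieces <- split(data, data$customer_ID)

# Fit a separate linear trend within each customer and keep the fitted values
fits <- lapply(pieces, function(d) {
  unname(fitted(lm(sales ~ transaction_num, data = d)))
})

# Put the fitted values back in the original row order
data$trend <- unsplit(fits, data$customer_ID)
data
```

unsplit reverses the split using the same grouping vector, so the rows keep their original order even though split sorts the groups alphabetically.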
You can simply fit a single lm with an interaction term:
data$trend <- fitted(lm(sales ~ customer_ID * transaction_num, data))
data
#> customer_ID transaction_num sales trend
#> 1 Josh 1 35 35.00000
#> 2 Josh 2 50 50.00000
#> 3 Josh 3 65 65.00000
#> 4 Ray 1 65 63.33333
#> 5 Ray 2 52 55.33333
#> 6 Ray 3 49 47.33333
#> 7 Eric 1 10 11.16667
#> 8 Eric 2 13 10.66667
#> 9 Eric 3 9 10.16667
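The interaction model gives every customer its own intercept and its own slope, which is why its fitted values match the separate per-customer regressions. A quick way to convince yourself is to compare one customer's fitted values from both approaches. The sketch below rebuilds the data from the values shown above so it runs on its own; full and eric_only are illustrative names.

```r
# Sample data rebuilt from the values shown in the thread
data <- data.frame(
  customer_ID = rep(c("Josh", "Ray", "Eric"), each = 3),
  transaction_num = rep(1:3, times = 3),
  sales = c(35, 50, 65, 65, 52, 49, 10, 13, 9),
  stringsAsFactors = FALSE
)

# One model with a separate intercept and slope per customer
full <- lm(sales ~ customer_ID * transaction_num, data = data)

# A stand-alone regression for a single customer
eric_only <- lm(sales ~ transaction_num,
                data = subset(data, customer_ID == "Eric"))

# Fitted values for Eric from both models, names dropped for comparison
from_full <- unname(fitted(full)[data$customer_ID == "Eric"])
from_single <- unname(fitted(eric_only))

# Compare up to floating point tolerance
same_fit <- isTRUE(all.equal(from_full, from_single))
same_fit
```

Note that the two approaches agree on fitted values only; the residual standard error and anything derived from it are pooled across customers in the single interaction model.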