Multiple linear regression by group in a rolling window in R
My data frame looks like this:
Date = c(rep(as.Date(seq(15000,15012), origin = "1970-01-01"),2))
Group = c(rep("a",13),rep("b",13))
y = c(seq(1,26,1))
x1 = c(seq(0.01,0.26,0.01))
x2 = c(seq(0.02,0.26*2,0.02))
df = data.frame(Group,Date,y,x1,x2)
head(df,3)
Group | Date | y | x1 | x2 |
---|---|---|---|---|
a | 2011-01-26 | 1 | 0.01 | 0.02 |
a | 2011-01-27 | 2 | 0.02 | 0.04 |
a | 2011-01-28 | 3 | 0.03 | 0.06 |
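I want to run a multiple regression by group (y as the dependent variable, x1 and x2 as the independent variables) in a rolling window of width 3.
I tried to do this with the tidyverse and zoo packages and the following code, but it failed.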
## define multi-var-linear regression function and get the residual
rsd <- function(df){
  lm(formula = y~x1+x2, data = as.data.frame(df), na.action = na.omit) %>%
    resid() %>%
    return()
}
## apply it by group with rolling window
x <- df %>% group_by(Group) %>%
  rollapplyr(. , width = 3, FUN = rsd)
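The output of this code is not what I want.
Does anyone know how to run a multiple regression by group in a rolling window?
Thanks in advance, Giselle
Thanks to Grothendieck and Marcus for their code!
It really helped me a lot :)
I am adding both versions here: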
# Grothendieck method
library(dplyr)  # %>%, group_by, group_modify, select, rename, slice
library(zoo)    # rollapplyr
rsd <- function(df){
  lm(formula = y~x1+x2, data = as.data.frame(df), na.action = na.omit) %>%
    resid() %>%
    return()
}
width <- 5
df_m2 <-
  df %>%
  group_by(Group) %>%
  group_modify(~ {
    cbind(., rollapplyr(.[c("y", "x1", "x2")], width, rsd, fill = NA,
                        by.column = FALSE))
  }) %>%
  ungroup %>%
  # rsd returns one residual per window row; column "5" is the last row of each window
  select(c("Group","Date","5")) %>%
  dplyr::rename(residual_m2 = "5")
# Marcus method
output <- data.frame()
for (i in unique(df$Group)) {
  a = df%>% subset(Group==i)
  a[,"residual"] = NA
  max = nrow(a)
  if(max<5){
    next
  }
  for (j in seq(5,max,by=1)) {
    b = a %>% slice((j-4):j)
    lm_ = lm(y~x1+x2, data = b)
    a[j,]$residual = residuals(lm_)[5]
  }
  output <-
    output %>%
    rbind(a)
}
A good old-fashioned for loop could work here:
for (i in unique(df$Group)){
  for (j in (seq(15000,15012, 3))){
    # rows of group i in the three-day block starting at day number j;
    # as.numeric() makes the comparison with the numeric day offsets explicit
    rows <- df$Group == i & as.numeric(df$Date) %in% c(j, j+1, j+2)
    lm_ <- lm(formula = y ~ x1 + x2, data = df[rows, ], na.action = na.omit)
    print(paste('Group', i, 'Dates from', j, 'to', j+2, residuals(lm_)))
  }
}
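The loop above steps through the dates in blocks of three, so its windows do not overlap. A minimal base R sketch of an overlapping (trailing) window follows, assuming the rows within each group are already in date order. The helper name `roll_last_resid` and the simulated `demo` data are hypothetical and are not part of the original answer.

```r
# Hypothetical helper: for each row j >= width, fit y ~ x1 + x2 on the
# trailing window of `width` rows and keep the residual of the newest row.
roll_last_resid <- function(d, width = 5) {
  out <- rep(NA_real_, nrow(d))
  if (nrow(d) >= width) {
    for (j in seq(width, nrow(d))) {
      win <- d[(j - width + 1):j, ]
      fit <- lm(y ~ x1 + x2, data = win)
      out[j] <- unname(tail(resid(fit), 1))
    }
  }
  out
}

# Simulated data (an assumption for illustration): two groups of 8 rows each.
set.seed(1)
demo <- data.frame(
  Group = rep(c("a", "b"), each = 8),
  x1 = rnorm(16),
  x2 = rnorm(16)
)
demo$y <- 1 + 2 * demo$x1 - demo$x2 + rnorm(16, sd = 0.1)

# Apply the helper group by group; the first width - 1 rows of each group stay NA.
demo$residual <- NA_real_
for (g in unique(demo$Group)) {
  idx <- demo$Group == g
  demo$residual[idx] <- roll_last_resid(demo[idx, ], width = 5)
}
```

Each window moves forward by one row, so every row from the fifth onward in a group gets its own residual.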
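Use group_modify, and call rollapplyr with the by.column = FALSE argument so that rsd is applied to all columns at once rather than to one column at a time.
Note that if you use a width of 3 with two predictors and an intercept, the residuals will necessarily all be zero, so we have changed the width to 5.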
library(dplyr, exclude = c("lag", "filter"))
library(zoo)
width <- 5
df %>%
  group_by(Group) %>%
  group_modify(~ {
    cbind(., rollapplyr(.[c("y", "x1", "x2")], width, rsd, fill = NA,
                        by.column = FALSE))
  }) %>%
  ungroup
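The note about width 3 can be checked directly: three coefficients (intercept, x1, x2) fitted to three observations leave no residual degrees of freedom. The sketch below uses made-up toy values (`toy` and `toy5` are hypothetical, chosen so the predictors are not collinear) rather than the question's data.

```r
# Hypothetical toy data: three observations, two non-collinear predictors.
toy <- data.frame(
  y  = c(1.0, 4.0, 2.5),
  x1 = c(0.1, 0.5, 0.9),
  x2 = c(2.0, 1.0, 3.5)
)

# Three parameters fitted to three points: the fit passes through every point.
fit3 <- lm(y ~ x1 + x2, data = toy)
max_abs_resid3 <- max(abs(resid(fit3)))  # zero up to floating-point error
df_resid3 <- df.residual(fit3)           # 3 observations - 3 coefficients

# With five observations, two residual degrees of freedom remain.
toy5 <- rbind(
  toy,
  data.frame(y = c(3.0, 0.5), x1 = c(0.3, 0.7), x2 = c(1.5, 2.5))
)
fit5 <- lm(y ~ x1 + x2, data = toy5)
df_resid5 <- df.residual(fit5)           # 5 observations - 3 coefficients
```

Separately, in the question's sample data y is an exact multiple of x1 and x2 is twice x1, so the residuals there should come out as numerically zero at any width. Noisier data is needed to see non-trivial residuals.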