How to lag multiple specific columns of a data frame in R

Question

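I want to lag multiple specific columns of a data frame in R.

Take this generic example. Suppose I have already defined which columns of the data frame need to be lagged (1 means lag by one row, 0 means leave the column as it is):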

Lag <- c(0, 1, 0, 1)
Lag.Index <- is.element(Lag, 1)
df <- data.frame(x1 = 1:8, x2 = 1:8, x3 = 1:8, x4 = 1:8)

My initial data frame:

        x1  x2  x3  x4   
    1   1   1   1   1
    2   2   2   2   2
    3   3   3   3   3
    4   4   4   4   4 
    5   5   5   5   5
    6   6   6   6   6
    7   7   7   7   7
    8   8   8   8   8

I want to compute the following data frame:

        x1  x2  x3  x4
    1   1   NA  1   NA
    2   2   1   2   1
    3   3   2   3   2
    4   4   3   4   3
    5   5   4   5   4
    6   6   5   6   5
    7   7   6   7   6
    8   8   7   8   7

I know how to do this for a single lagged column, but I cannot find an elegant way to do it for several lagged columns at once. Any help is much appreciated.

Answer 1

You can use purrr's map2_dfc to lag each column by a different amount.

purrr::map2_dfc(df, Lag, dplyr::lag)

#     x1    x2    x3    x4
#  <int> <int> <int> <int>
#1     1    NA     1    NA
#2     2     1     2     1
#3     3     2     3     2
#4     4     3     4     3
#5     5     4     5     4
#6     6     5     6     5
#7     7     6     7     6
#8     8     7     8     7
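
As far as I know, the `_dfc` variants are marked as superseded in recent purrr releases (they still work). A minimal sketch of the same idea with plain `map2`, assuming the `df` and `Lag` objects from the question; the name `lagged` is only illustrative:

```r
library(purrr)

df  <- data.frame(x1 = 1:8, x2 = 1:8, x3 = 1:8, x4 = 1:8)
Lag <- c(0, 1, 0, 1)

# Pair each column with its own lag; map2() returns a named list of vectors.
# as.data.frame() then rebuilds a data frame from that list.
lagged <- as.data.frame(map2(df, Lag, dplyr::lag))
```

Because `Lag` is passed as the second argument, it ends up as the `n` argument of `dplyr::lag`, so a value of 0 leaves the column unchanged.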

Or with data.table:

library(data.table)
setDT(df)[, names(df) := Map(shift, .SD, Lag)]
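
Note that `setDT` converts `df` in place and `:=` updates it by reference. If the original data frame should stay untouched, a sketch along the following lines might help; it assumes the `df` and `Lag` from the question, and `dt` and `cols` are names made up for illustration:

```r
library(data.table)

df  <- data.frame(x1 = 1:8, x2 = 1:8, x3 = 1:8, x4 = 1:8)
Lag <- c(0, 1, 0, 1)

# as.data.table() makes a copy, so `df` itself is not modified.
dt <- as.data.table(df)

# Restrict the update to the columns whose lag is non-zero.
cols <- names(dt)[Lag > 0]
dt[, (cols) := Map(shift, .SD, Lag[Lag > 0]), .SDcols = cols]
```

Here `shift` is called once per selected column with the matching entry of `Lag` as its `n` argument.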

Answer 2

I'm not sure this is elegant enough, but I would use dplyr's mutate_at function to lag the selected columns:

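library(dplyr)  # provides %>% and vars(); dplyr::lag is used explicitly so stats::lag is never picked up
df %>% mutate_at(.vars = vars(x2, x4), .funs = ~ dplyr::lag(., default = NA))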

Answer 3

We convert Lag to logical class, get the corresponding column names, and use across from dplyr:

library(dplyr)
df %>% 
      mutate(across(names(.)[as.logical(Lag)], lag))
#  x1 x2 x3 x4
#1  1 NA  1 NA
#2  2  1  2  1
#3  3  2  3  2
#4  4  3  4  3
#5  5  4  5  4
#6  6  5  6  5
#7  7  6  7  6
#8  8  7  8  7

Or we can do it in base R:

df[as.logical(Lag)] <- rbind(NA, df[-nrow(df), as.logical(Lag)])
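
The line above shifts every selected column by exactly one row. If the lags differ per column, a base R sketch like the following might work. The helper `lag_vec` is hypothetical (not part of base R), and the `Lag` vector here is changed from the question so that x4 is lagged by two rows:

```r
df  <- data.frame(x1 = 1:8, x2 = 1:8, x3 = 1:8, x4 = 1:8)
Lag <- c(0, 1, 0, 2)  # assumption for illustration: x4 is lagged by two rows

# Hypothetical helper: shift a plain vector down by n positions, padding with NA.
# It is meant for atomic vectors such as numeric or character, not factors.
lag_vec <- function(x, n) {
  n <- min(n, length(x))      # never pad beyond the vector length
  if (n == 0) return(x)
  c(rep(NA, n), head(x, -n))  # drop the last n values, prepend n NAs
}

# df[] <- ... replaces the columns while keeping df a data frame.
df[] <- Map(lag_vec, df, Lag)
```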

Answer 4

A data.table option using shift and Vectorize (note that the result is a matrix, not a data.table):

> setDT(df)[, Vectorize(shift)(.SD, Lag)]
     x1 x2 x3 x4
[1,]  1 NA  1 NA
[2,]  2  1  2  1
[3,]  3  2  3  2
[4,]  4  3  4  3
[5,]  5  4  5  4
[6,]  6  5  6  5
[7,]  7  6  7  6
[8,]  8  7  8  7

