Fill data.table column iteratively

Question

Starting from an initial value, I want to iteratively fill the NAs in a data.table column, by id, using growth rates stored in a separate column.

Take the following data.table as an example:

library(data.table)
DT <- data.table(id = c("A","A","A","A","B","B","B","B"),  date=1:4,
 growth=1L+runif(8), index= c(NA,250,NA,NA,NA,300,NA,NA))

> DT
   id date   growth index
1:  A    1 1.654628    NA
2:  A    2 1.770219   250
3:  A    3 1.255893    NA
4:  A    4 1.185985    NA
5:  B    1 1.826187    NA
6:  B    2 1.055251   300
7:  B    3 1.180389    NA
8:  B    4 1.204108    NA

Basically, for each id, I need the index values after date 2 to be:

index_{i,t} = growth_{i,t}*index_{i,t-1}

and, for the values before date 2:

index_{i,t} = index_{i,t+1}/growth_{i,t+1}
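Unrolling both recurrences from the known value gives a closed form. Here $t_0$ is an assumed symbol for the date of the known index in group $i$ (date 2 in the example), and the backward step is read as the inverse of the forward formula, i.e. a row's index is the next row's index divided by the next row's growth:

$$
\text{index}_{i,t} =
\begin{cases}
\text{index}_{i,t_0} \displaystyle\prod_{s=t_0+1}^{t} \text{growth}_{i,s}, & t > t_0,\\[6pt]
\text{index}_{i,t_0}, & t = t_0,\\[6pt]
\text{index}_{i,t_0} \Big/ \displaystyle\prod_{s=t+1}^{t_0} \text{growth}_{i,s}, & t < t_0.
\end{cases}
$$

For example, with $t_0 = 2$, date 1 gets $\text{index}_{i,2}/\text{growth}_{i,2}$ and date 4 gets $\text{index}_{i,2}\,\text{growth}_{i,3}\,\text{growth}_{i,4}$.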

I tried using shift, but that only fills in the index at t+1, one step after the known value:

DT[, index := growth * shift(index,1L, type="lag")]
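A single shift() pass cannot work here, because each filled value depends on the value filled just before it, so one lag only reaches the row right after a known value. As a minimal base-R sketch (no data.table; it assumes the rows of one id are already sorted by date), the two recurrences can instead be applied in explicit loops. fill_index_loop() is a hypothetical helper name:

```r
# Plain-loop sketch of the two recurrences for ONE id group.
# fill_index_loop() is a hypothetical helper; it assumes rows are sorted by
# date and treats the first non-NA entry as the anchor.
fill_index_loop <- function(index, growth) {
  t0 <- which(!is.na(index))[1]   # position of the known value
  if (is.na(t0)) return(index)    # all NA: nothing to anchor on
  n <- length(index)
  # Forward: index[t] = growth[t] * index[t - 1]
  if (t0 < n) {
    for (t in (t0 + 1):n) index[t] <- growth[t] * index[t - 1]
  }
  # Backward: index[t] = index[t + 1] / growth[t + 1]
  if (t0 > 1) {
    for (t in (t0 - 1):1) index[t] <- index[t + 1] / growth[t + 1]
  }
  index
}

# Group "A" from the question's first printout
fill_index_loop(c(NA, 250, NA, NA), c(1.654628, 1.770219, 1.255893, 1.185985))
```

Inside data.table it would be called once per group, e.g. `DT[order(date), index := fill_index_loop(index, growth), by = id]`.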

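Update: the desired result looks like this (the growth column below comes from a different runif() draw; the index values follow from the growth values in the first printout above):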

> DT
   id date   growth    index
1:  A    1 1.440548 141.2255
2:  A    2 1.395092 250.0000
3:  A    3 1.793094 313.9733
4:  A    4 1.784224 372.3676
5:  B    1 1.129264 284.2926
6:  B    2 1.978359 300.0000
7:  B    3 1.228979 354.1167
8:  B    4 1.453433 426.3948

Answer 1

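First, we'll define a function that takes two vectors, values and growths, and:

1. Finds the first non-NA in values.
2. Determines the ratio of each element of values to that non-NA by multiplying together all the growths between the element and the non-NA (and inverting the product for elements that come before it).
3. Multiplies the non-NA value by that ratio.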

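Note that this doesn't handle the case where there are multiple non-NA values (only the first one is used), and it will throw an error if values contains only NAs. I'll leave the exception handling to you, since you know best what should happen in those cases.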

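apply_growth <- function(values, growths) {
  # Position of the first known (non-NA) value; it anchors the whole group
  given <- which(!is.na(values))[1]

  # For each position x, the factor that turns values[given] into values[x]
  cumulative_growth <- vapply(
    X = seq_along(growths),
    FUN.VALUE = numeric(1),
    FUN = function(x) {
      if (x < given) {
        # Before the anchor: divide by the growths from x + 1 up to the anchor
        1 / prod(growths[seq(x + 1, given)])
      } else if (x > given) {
        # After the anchor: multiply the growths from the anchor + 1 up to x
        prod(growths[seq(given + 1, x)])
      } else {
        # The anchor itself is kept as is
        1
      }
    }
  )

  values[given] * cumulative_growth
}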
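The vapply() above calls prod() once per element. A vectorised alternative, swapped in here as an illustration rather than the answer's own method, is to build a running level with cumprod() and rescale it so that it equals 1 at the anchor. apply_growth_cumprod() is a hypothetical name; like apply_growth() it assumes date-sorted rows, uses only the first non-NA, and assumes no growth is zero. Unlike apply_growth(), it returns an all-NA input unchanged instead of failing:

```r
# Vectorised sketch of apply_growth() using cumprod().
# apply_growth_cumprod() is a hypothetical name, not part of the answer.
apply_growth_cumprod <- function(values, growths) {
  given <- which(!is.na(values))[1]
  if (is.na(given)) return(values)       # all NA: return unchanged
  g <- growths
  g[1] <- 1                              # the first date's growth is never used
  level <- cumprod(g)                    # level[t] = product of growths[2..t]
  values[given] * level / level[given]   # rescale so the anchor keeps its value
}
```

For t after the anchor, level[t] / level[given] is the product of growths[(given + 1):t]; for t before it, the ratio is 1 over the product of growths[(t + 1):given], which are exactly the factors apply_growth() computes. It is called the same way: `DT[order(date), index := apply_growth_cumprod(index, growth), by = id]`.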

Now we apply it to each subgroup of DT (one per id). Just to be safe, we specify that the rows must be ordered by date.

DT[
  order(date),
  index := apply_growth(index, growth),
  by = id
]
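To confirm that a filled group really follows the forward recurrence, each index can be compared with growth times the previous index. A minimal base-R sketch; satisfies_recurrence() is a hypothetical helper, and the tolerance is an assumption chosen to absorb the 4-decimal rounding of the printed output:

```r
# Hypothetical helper: TRUE if, within one id group sorted by date, every row
# satisfies index[t] == growth[t] * index[t - 1] (up to a relative tolerance).
satisfies_recurrence <- function(index, growth, tol = 1e-6) {
  n <- length(index)
  if (n < 2) return(TRUE)
  isTRUE(all.equal(index[-1], growth[-1] * index[-n], tolerance = tol))
}

# Group "A" as printed in the output below (rounded to 4 decimals)
satisfies_recurrence(c(180.7514, 250, 337.5256, 629.0809),
                     c(1.993863, 1.383115, 1.350102, 1.863802))
```

With data.table it can be run per group as `DT[order(date), satisfies_recurrence(index, growth), by = id]`. The backward rows need no separate check, since index[t + 1] = growth[t + 1] * index[t] is the same equation rearranged.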

DT
#    id date   growth    index
# 1:  A    1 1.993863 180.7514
# 2:  A    2 1.383115 250.0000
# 3:  A    3 1.350102 337.5256
# 4:  A    4 1.863802 629.0809
# 5:  B    1 1.664999 249.2398
# 6:  B    2 1.203660 300.0000
# 7:  B    3 1.595310 478.5931
# 8:  B    4 1.002311 479.6989
