Define value by previous and actual row

Question

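I have a data.table with two fields, startValue and endValue, that I need to fill in based on information from the previous row and the current row. Although this is somewhat similar to earlier questions, I have not been able to get the result I want.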

Dummy data:

library(data.table)

a <- data.table(user = c("A", "A", "A", "B", "B"), 
                gap = c(1, 0, 2, 2, 3), 
                priority = c(1, 3, 2, 2, 1))

Then I set startValue to 0 for all rows with priority == 1:

setkey(a, user, priority)
a[priority == 1, startValue := 0]

And I set endValue for the rows that already have startValue defined:

a[!is.na(startValue), endValue := startValue + gap*3]

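Now comes the problem. I want startValue in row 2 (user A, priority 2) to be the same as endValue in row 1, so that I can calculate the new endValue. I know I can use a loop, but I would like to know whether any other function or combination of functions can do this.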

I tried several combinations of shift and zoo::na.locf, but always ended up messing up the values that already exist.

Expected result:

b <- structure(list(user = c("A", "A", "A", "B", "B"), 
                    gap = c(1, 2, 0, 3, 2), 
                    priority = c(1, 2, 3, 1, 2), 
                    startValue = c(0, 3, 9, 0, 9), 
                    endValue = c(3, 9, 9, 9, 15)), 
               row.names = c(NA, -5L), 
               class = c("data.table", "data.frame"))

Answer 1

We can use accumulate from purrr

library(purrr)
library(data.table)
a[, endValue := accumulate(gap,  ~   .x + .y * 3, .init = 0)[-1], user
   ][, startValue := shift(endValue, fill = 0), user][]
all.equal(a, b, check.attributes = FALSE)
#[1] TRUE
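To see what accumulate is doing here, it may help to run it on a single group. The following is only a minimal sketch using user A's gaps in priority order; the names gap_A, running, end_A and start_A are illustrative and not part of the original answer.

```r
library(purrr)

# Gaps for user A, already ordered by priority (1, 2, 3)
gap_A <- c(1, 2, 0)

# .x is the running end value, .y is the next gap;
# .init = 0 seeds the sequence with the first start value
running <- accumulate(gap_A, ~ .x + .y * 3, .init = 0)
running
# [1] 0 3 9 9

# Dropping the seed gives the end values;
# dropping the last element gives the start values
end_A   <- running[-1]
start_A <- running[-length(running)]
```

This is why the answer drops the first element with [-1] for endValue and then lags endValue with shift for startValue.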

Or use Reduce from base R to create the 'endValue' column, and then take the lag of 'endValue' to create 'startValue', grouped by 'user'

a[, endValue := Reduce(function(x, y) x + y *3, gap, 
     accumulate = TRUE, init = 0)[-1], user]
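The snippet above only creates 'endValue'. A self-contained sketch of both steps, assuming the dummy data from the question and a start value of 0 on each user's first row, might look like this:

```r
library(data.table)

a <- data.table(user = c("A", "A", "A", "B", "B"),
                gap = c(1, 0, 2, 2, 3),
                priority = c(1, 3, 2, 2, 1))
setkey(a, user, priority)  # sort by user, then priority

# Running end value per user; [-1] drops the seed supplied via init
a[, endValue := Reduce(function(x, y) x + y * 3, gap,
                       accumulate = TRUE, init = 0)[-1], by = user]

# Start value is the previous row's end value within each user,
# and 0 on the first row of each user
a[, startValue := shift(endValue, fill = 0), by = user]

# Match the column order of the expected result
setcolorder(a, c("user", "gap", "priority", "startValue", "endValue"))
```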

Answer 2

First, calculate the end values using cumsum. Then use shift to get the start values.

a[ , c("startValue", "endValue") := {
  e1 <- startValue[1] + gap[1] * 3
  endValue <- c(e1, e1 + cumsum(gap[-1] * 3))
  startValue <- shift(endValue, fill = startValue[1])
  .(startValue, endValue)
}, by = user]

#    user gap priority startValue endValue
# 1:    A   1        1          0        3
# 2:    A   2        2          3        9
# 3:    A   0        3          9        9
# 4:    B   3        1          0        9
# 5:    B   2        2          9       15
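Because every user's first row starts at 0 in this example, the calculation appears to reduce to a plain cumulative sum. The following is a sketch under that assumption only; it would not hold if a user's first startValue were something other than 0.

```r
library(data.table)

a <- data.table(user = c("A", "A", "A", "B", "B"),
                gap = c(1, 0, 2, 2, 3),
                priority = c(1, 3, 2, 2, 1))
setkey(a, user, priority)  # sort by user, then priority

# With a start of 0, each end value is the running total of gap * 3
a[, endValue := cumsum(gap * 3), by = user]

# Each start value is the end value minus that row's own increment
a[, startValue := endValue - gap * 3]
```

Here no shift is needed, since the start value can be recovered row by row from the end value.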
