R: apply a function to data based on an index column value

Question

Example:

require(data.table)
# 15 random target values plus an index column marking three groups of five rows
example = matrix(c(rnorm(15, 5, 1), rep(1:3, each=5)), ncol = 2, nrow = 15)
example = data.table(example)
setnames(example, old=c("V1","V2"), new=c("target", "index"))
example


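threshold = 100

# What is left of x after subtracting the running total of y
accumulating_cost = function(x,y) { x-cumsum(y) }
# Applied to the whole column, so the running total ignores index
whats_left = accumulating_cost(threshold, example$target)
whats_left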

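I want whats_left to contain the difference between threshold and the cumulative sum of the values in example$target, computed separately for example$index = 1, 2 and 3. So I used the following for loop: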

rm(whats_left)

whats_left = vector("list")
for(i in 1:max(example$index)) {
  whats_left[[i]] = accumulating_cost(threshold, example$target[example$index==i])
}

whats_left = unlist(whats_left)
whats_left

plot(whats_left~c(1:15))

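I know for loops aren't the devil in R, but I've gotten into the habit of vectorizing wherever possible (including staying away from apply, since it is a for-loop wrapper). I'm pretty sure this can be vectorized, but I can't figure out how. Any help would be appreciated.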

Answer 1

What you want to do is accumulate the cost by index. So you probably want to use the by argument, as in

example[, accumulating_cost(threshold, target), by = index]
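
Because the expression in j is unnamed, the result above comes back as a new table with the columns index and V1. If the goal is to keep the values next to the original rows, one option is to assign them by reference with :=. The following is only a sketch: demo and its fixed values are made up here so the output can be checked by hand, whereas the question uses random rnorm() data.

```r
library(data.table)

# Hypothetical fixed data (not from the question) with two groups of three rows
demo = data.table(target = c(1, 2, 3, 10, 20, 30),
                  index  = rep(1:2, each = 3))

threshold = 100
accumulating_cost = function(x, y) { x - cumsum(y) }

# := adds whats_left as a new column by reference; by = index restarts
# the cumulative sum for every distinct value of index
demo[, whats_left := accumulating_cost(threshold, target), by = index]

demo$whats_left
# group 1: 99 97 94, group 2: 90 70 40
```

The row order is preserved, so demo$whats_left lines up with what the for loop in the question would produce for the same data.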
