Calculate a cumulative sum columnwise with a specified number of columns

Question

Below is an example of a larger table that I have:

library(data.table)
input  <- data.table(ID     = c("A", "B"),
                     Para   = c(2.8, 5),
                     Value1 = c(50, 80),
                     Value2 = c(80, 40),
                     Value3 = c(80, 100),
                     Value4 = c(60, 10),
                     Value5 = c(40, 80))

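What I want to achieve is to add a column containing the cumulative sum of the next x columns, where x is given in the Para column. If Para has digits after the decimal point, the value in the next column should be scaled by that fraction.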

So for the first row (Para = 2.8), the result should be

1*50 + 1*80 + 0.8*80 = 194

For the second row (Para = 5), the result should be

1*80 + 1*40 + 1*100 + 1*10 + 1*80 = 310

The final table should look like this:

output <- cbind(input, Result = c(194, 310))

My idea was to split the Para value 2.8 into a vector of 5 weights, one per value column, so that it covers the whole range:

c(1, 1, .8, 0, 0)

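Then I would multiply the columns Value1:Value5 by this vector and sum across Value1:Value5. But I don't know how to split 2.8 into such a vector, and there may be a better solution that I'm not aware of. Thanks.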

Answer 1

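You can use integer division (%/%) and the remainder of the division (%%) to build the multiplier vector, and then use everything inside an apply call: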

apply(input, MARGIN = 1, function(x) {
  multiplier <- as.numeric(x["Para"])
  multiplier_long <- c(rep(1, multiplier %/% 1), multiplier %% 1)[1:5]
  multiplier_long[is.na(multiplier_long)] <- 0
  sum(as.numeric(x[-c(1, 2)]) * multiplier_long)
})

# [1] 194 310
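
To make the splitting step easier to follow on its own, here is a minimal base R sketch of how %/% and %% turn one Para value into a weight vector. The helper name para_weights and its argument n are hypothetical and not part of the answer above.

```r
# Hypothetical helper (not from the answer): turn one Para value into n weights.
para_weights <- function(para, n) {
  whole <- para %/% 1            # number of columns that count in full
  frac  <- para %% 1             # share of the following column
  w <- c(rep(1, whole), frac)    # e.g. 2.8 -> 1, 1, 0.8
  c(w, rep(0, n))[seq_len(n)]    # pad with zeros, then cut to n entries
}

para_weights(2.8, 5)  # weights 1, 1, 0.8, 0, 0 (up to floating-point error)
para_weights(5, 5)    # weights 1, 1, 1, 1, 1
```

Padding with zeros before truncating avoids the NA replacement step, and a Para larger than n simply yields all ones.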

Answer 2

If you want to split Para into a vector of 5 values, you could try something like this:

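library(dplyr)

input %>%
  select(ID, Para) %>%
  slice(rep(1:n(), each = 5)) %>%   # repeat each row once per value column
  group_by(ID) %>%
  mutate(rn = 1:n()) %>%            # position of the value column within each ID
  # rn - 1 instead of lag(rn): same values, but no NA on the first row when Para < 1
  mutate(Para = if_else(Para - rn > 0, 1.0,
                        if_else(Para - rn > -1, Para - (rn - 1), 0))) %>%
  select(-rn)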

which gives:

   ID     Para
   <chr> <dbl>
 1 A     1    
 2 A     1    
 3 A     0.800
 4 A     0    
 5 A     0    
 6 B     1    
 7 B     1    
 8 B     1    
 9 B     1    
10 B     1
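
The long table above only contains the weights; they still have to be multiplied with the value columns and summed. One possible way to finish the calculation is sketched below in base R. It is not the answer's method: it swaps the if_else logic for clamping Para - (j - 1) to the range [0, 1] for the j-th value column, and it assumes that all value columns have names starting with "Value" and appear in the intended order.

```r
# Example data from the question, as a plain data.frame
input <- data.frame(ID     = c("A", "B"),
                    Para   = c(2.8, 5),
                    Value1 = c(50, 80),
                    Value2 = c(80, 40),
                    Value3 = c(80, 100),
                    Value4 = c(60, 10),
                    Value5 = c(40, 80))

# Assumption: value columns are identified by the "Value" prefix
value_cols <- grep("^Value", names(input), value = TRUE)
offsets    <- seq_along(value_cols) - 1          # 0, 1, ..., n - 1

# One row of weights per input row: clamp Para - offset to [0, 1]
weights <- pmin(pmax(outer(input$Para, offsets, "-"), 0), 1)

# Weighted row sums
input$Result <- rowSums(as.matrix(input[value_cols]) * weights)
input
```

For the example data, Result comes out as 194 and 310 (up to floating-point error), matching the expected output in the question.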

Answer 3

Here is a solution that keeps the data in wide format and uses Reduce() to compute "weighted row sums":

library(data.table)
input[, Cumul := {
  tmp <- c(rep(1, Para), Para %% 1)
  mul <- replace(rep(0, ncol(.SD)), seq_along(tmp), tmp)
  Reduce(sum, .SD * mul)
}, .SDcols = Value1:Value5, by = ID]
input[]

   ID Para Value1 Value2 Value3 Value4 Value5 Cumul
1:  A  2.8     50     80     80     60     40   194
2:  B  5.0     80     40    100     10     80   310

This will work for any number of columns specified via .SDcols, and also if Para is larger.

Answer 4

# example data
input  <- data.frame(ID     = c("A", "B"),
                     Para   = c(2.8, 5),
                     Value1 = c(50, 80),
                     Value2 = c(80, 40),
                     Value3 = c(80, 100),
                     Value4 = c(60, 10),
                     Value5 = c(40, 80))

library(tidyverse)

# function that creates a vector of multipliers based on Para column
# assumes that you have ID, Para and rest columns are Value 1,2...,N
# if Para is larger than the corresponding values it keeps first x multipliers
f_create_vector = function(x) {
    y = if(x %% 1 > 0) c(rep(1, x), x %% 1) else rep(1, x)
    z = rep(0, ncol(input)-2)
    c(y, z[-seq_along(y)])[1:(ncol(input)-2)] 
}


input %>%
  group_by(ID, Para) %>%                              # for each combination
  nest() %>%                                          # nest data
  group_by(ID) %>%                                    # for each ID
  mutate(vec = list(f_create_vector(Para))) %>%       # create a column of multipliers in a list
  mutate(CumSum = map2(data, vec, ~sum(.x * .y))) %>% # get the cumsum using multipliers and the value columns
  ungroup() %>%                                       # forget the grouping
  unnest(data, CumSum) %>%                            # unnest those columns
  select(-vec)                                        # remove that column

# # A tibble: 2 x 8
#   ID     Para CumSum Value1 Value2 Value3 Value4 Value5
#   <fct> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
# 1 A       2.8    194     50     80     80     60     40
# 2 B       5      310     80     40    100     10     80

