A function to add the cumulative sum of multiple columns

Imagine a data set consisting of several numeric vectors and a few factor vectors (made up for this purpose):

name <- c("tim", "tom", "ben", "mary", "jane")
sex <- c("male","male","male","female","female")
born <- c(1985, 1986, 1985, 1986, 1984)
v4 <- c(5,4,3,2,1)
v5 <- c(10,20,600,20,5)
v6 <- c(1,2,3,4,5)
v7 <- c(0,0,20,4,60)
df <- data.frame(name, sex, born, v4,v5,v6,v7)
df[1:3] <- lapply(df[1:3], as.factor)
df[4:7] <- lapply(df[4:7], as.numeric)

I am using this function to calculate the cumulative sum of the numeric variables:

colCumsum <- function(x) {
  for (i in 1:ncol(x)) {
    x[, i] <- cumsum(x[, i])
  }
  x
}
colCumsum(df[4:7])

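It works fine when I exclude the non-numeric variables. But since I need the factor variables, the original numeric variables, and the cumulative sums combined in one data frame, I tried to rewrite the function like this: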

colCumsum2 <- compiler::cmpfun(function(x) { 
                                            for (i in 1:ncol(x))
                                             {if (is.numeric(x[,i]) == F) {next}}
                                              # exclude non-numeric from function
                                              x[,i+ncol(n)-3] <- cumsum(x[,i])
                                              #add cumulative sum as extra column
                                              x 
                                           } )

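My problem is the error "new columns would leave holes after existing columns". Even if it worked, the number 3 is hard-coded because I know how many factors the data set has (3); that needs to be generalized.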

You can use across from dplyr here:

library(dplyr)

colCumsum <- function(d) {
  mutate(d, across(where(is.numeric), ~cumsum(.x), .names = "cumsum_{col}"))
}

colCumsum(df)
#>   name    sex born v4  v5 v6 v7 cumsum_v4 cumsum_v5 cumsum_v6 cumsum_v7
#> 1  tim   male 1985  5  10  1  0         5        10         1         0
#> 2  tom   male 1986  4  20  2  0         9        30         3         0
#> 3  ben   male 1985  3 600  3 20        12       630         6        20
#> 4 mary female 1986  2  20  4  4        14       650        10        24
#> 5 jane female 1984  1   5  5 60        15       655        15        84

You can rewrite colCumsum like this:

colCumsum <- function(x) {
  check <- sapply(x, is.numeric)
  x[paste0(names(x)[check], "_cumsum")] <- lapply(x[check], cumsum)
  x
}

Here it is applied to your example data:

colCumsum(df)
#   name    sex born v4  v5 v6 v7 v4_cumsum v5_cumsum v6_cumsum v7_cumsum
# 1  tim   male 1985  5  10  1  0         5        10         1         0
# 2  tom   male 1986  4  20  2  0         9        30         3         0
# 3  ben   male 1985  3 600  3 20        12       630         6        20
# 4 mary female 1986  2  20  4  4        14       650        10        24
# 5 jane female 1984  1   5  5 60        15       655        15        84

For reference, you can get your loop working by rewriting it to act only on the numeric columns:

colCumsum2 <- function(x) { 
  for (i in 1:ncol(x)) {
    if (is.numeric(x[, i])) {
      x[, paste0(names(x)[i], "_cumsum")] <- cumsum(x[, i])
    }
  }
  x
}
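
To see where the "holes" error comes from, here is a minimal sketch on a small made-up data frame (`d`, `f`, and `v` are illustrative names, not from the question). Assigning to a column position that skips over the next free position should trigger the error from the question, whereas assigning by name, as in the loop above, simply appends a column.

```r
# Toy data frame with two columns, so the next free column position is 3.
d <- data.frame(f = factor(c("a", "b")), v = c(1, 2))

# Writing to position 5 would leave positions 3 and 4 empty,
# so the assignment is rejected; capture the message instead of stopping.
msg <- tryCatch({
  d[, 5] <- cumsum(d$v)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Assigning by name appends the new column at the end, whatever the position.
d[, "v_cumsum"] <- cumsum(d$v)
print(names(d))
```

Because the new column is addressed by name, no offset such as the hard-coded 3 has to be computed.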

The matrixStats package has a colCumsums function. Just cbind its result to the data frame.

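nums <- sapply(df, is.numeric)  # logical index of the numeric columns
cs <- matrixStats::colCumsums(as.matrix(df[nums]))
colnames(cs) <- paste0(names(df)[nums], "_cumsum")  # name the new columns explicitly
cbind(df, cs)
#   name    sex born v4  v5 v6 v7 v4_cumsum v5_cumsum v6_cumsum v7_cumsum
# 1  tim   male 1985  5  10  1  0         5        10         1         0
# 2  tom   male 1986  4  20  2  0         9        30         3         0
# 3  ben   male 1985  3 600  3 20        12       630         6        20
# 4 mary female 1986  2  20  4  4        14       650        10        24
# 5 jane female 1984  1   5  5 60        15       655        15        84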