A function to add the cumulative sums of multiple columns
Imagine a data set consisting of several numeric vectors and a few factor vectors (made up for this purpose):
name <- c("tim", "tom", "ben", "mary", "jane")
sex <- c("male","male","male","female","female")
born <- c(1985, 1986, 1985, 1986, 1984)
v4 <- c(5,4,3,2,1)
v5 <- c(10,20,600,20,5)
v6 <- c(1,2,3,4,5)
v7 <- c(0,0,20,4,60)
df <- data.frame(name, sex, born, v4,v5,v6,v7)
df[1:3] <- lapply(df[1:3], as.factor)
df[4:7] <- lapply(df[4:7], as.numeric)
I am using this function to calculate the cumulative sums of the numeric variables:
colCumsum <- function(x) {
  for (i in 1:ncol(x))
    x[,i] <- cumsum(x[,i])
  x
}
colCumsum(df[4:7])
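This works fine when I exclude the non-numeric variables. However, since I need the factor variables, the original numeric variables, and the cumulative sums combined in one data frame, I tried to rewrite the function like this: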
colCumsum2 <- compiler::cmpfun(function(x) {
for (i in 1:ncol(x))
{if (is.numeric(x[,i]) == F) {next}}
# exclude non-numeric columns from the function
x[,i+ncol(n)-3] <- cumsum(x[,i])
#add cumulative sum as extra column
x
} )
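My problem is the error "new columns would leave holes after existing columns". Even if it worked, the number 3 is only there because I know the number of factors in the data set (3); this needs to be generalized.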
You can use across from dplyr here:
library(dplyr)
colCumsum <- function(d) {
  mutate(d, across(where(is.numeric), ~cumsum(.x), .names = "cumsum_{col}"))
}
colCumsum(df)
#> name sex born v4 v5 v6 v7 cumsum_v4 cumsum_v5 cumsum_v6 cumsum_v7
#> 1 tim male 1985 5 10 1 0 5 10 1 0
#> 2 tom male 1986 4 20 2 0 9 30 3 0
#> 3 ben male 1985 3 600 3 20 12 630 6 20
#> 4 mary female 1986 2 20 4 4 14 650 10 24
#> 5 jane female 1984 1 5 5 60 15 655 15 84
You can rewrite colCumsum like this:
colCumsum <- function(x) {
  check <- sapply(x, is.numeric)
  x[paste0(names(x)[check], "_cumsum")] <- lapply(x[check], cumsum)
  x
}
Applied to your example data:
colCumsum(df)
# name sex born v4 v5 v6 v7 v4_cumsum v5_cumsum v6_cumsum v7_cumsum
# 1 tim male 1985 5 10 1 0 5 10 1 0
# 2 tom male 1986 4 20 2 0 9 30 3 0
# 3 ben male 1985 3 600 3 20 12 630 6 20
# 4 mary female 1986 2 20 4 4 14 650 10 24
# 5 jane female 1984 1 5 5 60 15 655 15 84
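As a quick sanity check, here is a minimal sketch on a cut-down version of the example data (only name, v4 and v5 are kept, which is a simplification for illustration). It checks that the original columns come back unchanged and that only the numeric columns gain a "_cumsum" companion.

```r
# Cut-down version of the example data (simplified for illustration)
df <- data.frame(
  name = factor(c("tim", "tom", "ben", "mary", "jane")),
  v4 = c(5, 4, 3, 2, 1),
  v5 = c(10, 20, 600, 20, 5)
)

# Same function as above: append one "_cumsum" column per numeric column
colCumsum <- function(x) {
  check <- sapply(x, is.numeric)
  x[paste0(names(x)[check], "_cumsum")] <- lapply(x[check], cumsum)
  x
}

res <- colCumsum(df)

# The original columns should be untouched
all.equal(res[names(df)], df)

# The factor column should not have produced a cumulative sum column
names(res)

# The new columns should match cumsum() applied by hand
all.equal(res$v5_cumsum, cumsum(df$v5))
```

Because the new columns are addressed by name, the function does not need to know how many factor columns precede the numeric ones.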
For reference, you can also rewrite the loop so that it only acts on the numeric columns, which makes it work:
colCumsum2 <- function(x) {
  for (i in 1:ncol(x)) {
    if (is.numeric(x[, i])) {
      x[, paste0(names(x)[i], "_cumsum")] <- cumsum(x[, i])
    }
  }
  x
}
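To illustrate where the "holes" error in the question comes from, here is a minimal sketch on a made-up two-column data frame (d and its column names are hypothetical, not part of the question). A data frame accepts a new column at position ncol + 1, but refuses any position further out, which is what a computed index such as i + ncol(x) - 3 can easily produce. Assigning by name sidesteps the arithmetic.

```r
# Toy data frame with two columns (hypothetical, for illustration only)
d <- data.frame(a = 1:3, b = 4:6)

# Position ncol(d) + 1 is the next free slot, so this appends a column
d[, ncol(d) + 1] <- cumsum(d$a)

# Position ncol(d) + 2 would skip a slot, so R refuses the assignment
msg <- tryCatch(
  {
    d[, ncol(d) + 2] <- cumsum(d$b)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg

# Assigning by a new name always lands in the next free slot
d[, "b_cumsum"] <- cumsum(d$b)
names(d)
```

This is why the loop above builds the target column name with paste0 instead of computing a column position.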
The matrixStats package has a colCumsums function. Just cbind the cumulative sums to the data frame.
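# nums is not defined in the original answer; assumed here to flag the numeric columns
nums <- sapply(df, is.numeric)
cs <- as.data.frame(matrixStats::colCumsums(as.matrix(df[nums])))
# name the new columns explicitly, since the default names may depend on the matrixStats version
names(cs) <- paste0(names(df)[nums], "_cumsum")
cbind(df, cs)
# name sex born v4 v5 v6 v7 v4_cumsum v5_cumsum v6_cumsum v7_cumsum
# 1 tim male 1985 5 10 1 0 5 10 1 0
# 2 tom male 1986 4 20 2 0 9 30 3 0
# 3 ben male 1985 3 600 3 20 12 630 6 20
# 4 mary female 1986 2 20 4 4 14 650 10 24
# 5 jane female 1984 1 5 5 60 15 655 15 84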