Mutate multiple cumsum in dplyr
I'm trying to compute a cumsum with mutate. The challenge is that I have 10 columns to process, and I only know how to do them one at a time. Is there a way to do something like mutate(across(all_of(c(3:4)), ~cumsum(c(3:4))))?
cat %>%
  group_by(animals) %>%
  mutate(weight1 = cumsum(weight1),
         weight2 = cumsum(weight2))
structure(list(animals = c("E1", "E1", "E1",
"E2", "E2", "E2"), period = structure(c(18690,
18697, 18704, 18690, 18697, 18704), class = "Date"), weight1 = c(704,
734, 653, 851, 911, 829), weight2 = c(0, 235, 325, 0, 148,
200)), row.names = c(NA, -6L), class = c("data.table", "data.frame"))
Expected output:
animals period weight1 weight2
<chr> <date> <dbl> <dbl>
1 E1 2021-03-04 704 0
2 E1 2021-03-11 1438 235
3 E1 2021-03-18 2091 560
4 E2 2021-03-04 851 0
5 E2 2021-03-11 1762 148
6 E2 2021-03-18 2591 348
Try this:
df <- structure(list(animals = c("E1", "E1", "E1",
"E2", "E2", "E2"), period = structure(c(18690,
18697, 18704, 18690, 18697, 18704), class = "Date"), weight1 = c(704,
734, 653, 851, 911, 829), weight2 = c(0, 235, 325, 0, 148,
200)), row.names = c(NA, -6L), class = c("data.table", "data.frame"))
library(dplyr)
df %>%
  group_by(animals) %>%
  mutate(across(starts_with("weight"), cumsum))
#> # A tibble: 6 x 4
#> # Groups: animals [2]
#> animals period weight1 weight2
#> <chr> <date> <dbl> <dbl>
#> 1 E1 2021-03-04 704 0
#> 2 E1 2021-03-11 1438 235
#> 3 E1 2021-03-18 2091 560
#> 4 E2 2021-03-04 851 0
#> 5 E2 2021-03-11 1762 148
#> 6 E2 2021-03-18 2591 348
Created on 2021-03-24 by the reprex package (v1.0.0)
Or:
vars <- names(df)[3:4]
df %>% group_by(animals) %>% mutate(across(all_of(vars), cumsum))
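If the original weights should be kept next to the running totals, the `.names` argument of across() can write the results to new columns instead of overwriting. This is only a sketch: the data frame is rebuilt here as a plain data.frame, and the `cum_` prefix is an arbitrary choice, not something from the question.

```r
library(dplyr)

# Same values as the question's data, rebuilt as a plain data.frame (assumption).
df <- data.frame(
  animals = c("E1", "E1", "E1", "E2", "E2", "E2"),
  period  = as.Date(c(18690, 18697, 18704, 18690, 18697, 18704),
                    origin = "1970-01-01"),
  weight1 = c(704, 734, 653, 851, 911, 829),
  weight2 = c(0, 235, 325, 0, 148, 200)
)

# Capture the column names before grouping, so positions refer to the full data.
vars <- names(df)[3:4]

result <- df %>%
  group_by(animals) %>%
  # .names keeps weight1/weight2 and adds cum_weight1/cum_weight2.
  mutate(across(all_of(vars), cumsum, .names = "cum_{.col}")) %>%
  ungroup()
```

Dropping `.names` gives back the overwrite-in-place behaviour shown above.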
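What you are trying to do will throw an error. Once you group_by(animals), the grouping column can no longer be selected inside across(), so mutate only has three columns to work with (period, weight1 and weight2) and positions 3:4 run past the end. So you can use:
cat %>%
  group_by(animals) %>%
  mutate(across(2:3, cumsum))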
# A tibble: 6 x 4
# Groups: animals [2]
animals period weight1 weight2
<chr> <date> <dbl> <dbl>
1 E1 2021-03-04 704 0
2 E1 2021-03-11 1438 235
3 E1 2021-03-18 2091 560
4 E2 2021-03-04 851 0
5 E2 2021-03-11 1762 148
6 E2 2021-03-18 2591 348
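But this approach requires you to know what the new indices are. It is better to select the columns programmatically. If all the columns of interest are weights (their names start with "weight"), you can use:
cat %>%
  group_by(animals) %>%
  mutate(across(starts_with("weight"), cumsum))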
Or, if you simply want to operate on all numeric columns:
cat %>%
  group_by(animals) %>%
  mutate(across(where(is.numeric), cumsum))
Both of the latter two approaches give the output you want.
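To see where the shifted positions come from, the columns that across() can select after grouping can be listed explicitly. The following is a minimal sketch, assuming a plain data.frame copy of the question's data; `visible` and `idx` are illustrative names.

```r
library(dplyr)

# Same values as the question's data, rebuilt as a plain data.frame (assumption).
df <- data.frame(
  animals = c("E1", "E1", "E1", "E2", "E2", "E2"),
  period  = as.Date(c(18690, 18697, 18704, 18690, 18697, 18704),
                    origin = "1970-01-01"),
  weight1 = c(704, 734, 653, 851, 911, 829),
  weight2 = c(0, 235, 325, 0, 148, 200)
)

grouped <- df %>% group_by(animals)

# Columns left for across() once the grouping variable is set aside.
visible <- setdiff(names(grouped), group_vars(grouped))

# Positions of the weight columns relative to those remaining columns.
idx <- match(c("weight1", "weight2"), visible)

# Selecting by name avoids depending on the positions at all.
result <- grouped %>%
  mutate(across(all_of(visible[idx]), cumsum)) %>%
  ungroup()
```

Here `visible` holds period, weight1 and weight2, which is why the weights sit at positions 2 and 3 rather than 3 and 4.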
Base R solution:
num_col_idx <- vapply(df, is.numeric, logical(1))
cbind(
  df[, !num_col_idx],
  data.frame(
    do.call(rbind, lapply(split(df[, num_col_idx], df$animals), cumsum)),
    row.names = NULL
  )
)
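Another base R option, offered as a sketch rather than as part of the answer above, is ave(), which applies a function within groups and returns the values in the original row order. That means the rows do not need to be sorted by animals first. The data is again rebuilt as a plain data.frame, and `num` and `result` are illustrative names.

```r
# Same values as the question's data, rebuilt as a plain data.frame (assumption).
df <- data.frame(
  animals = c("E1", "E1", "E1", "E2", "E2", "E2"),
  period  = as.Date(c(18690, 18697, 18704, 18690, 18697, 18704),
                    origin = "1970-01-01"),
  weight1 = c(704, 734, 653, 851, 911, 829),
  weight2 = c(0, 235, 325, 0, 148, 200)
)

# Logical mask of the numeric columns (Date columns are not numeric here).
num <- vapply(df, is.numeric, logical(1))

# Replace each numeric column with its per-animal cumulative sum.
result <- df
result[num] <- lapply(df[num], function(x) ave(x, df$animals, FUN = cumsum))
```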