Cumulative sum with `all` or `any` by group
Consider the vectors
group = rep(1:6, each = 2)
x = 1:12
Now I want to compute a cumulative sum by group whenever any member of the group meets a condition. For example, the condition is `x %% 3 == 0`.
## Without the cumulative sum
ave(x, group, FUN = function(x) any(x %% 3 == 0))
# [1] 0 0 1 1 1 1 0 0 1 1 1 1
## With the cumulative sum
ave(x, group, FUN = function(x) cumsum(any(x %% 3 == 0)))
# [1] 0 0 1 1 1 1 0 0 1 1 1 1
## Expected result with cumsum:
# [1] 0 0 1 2 1 2 0 0 1 2 1 2
The same thing happens with dplyr:
library(dplyr)

dWithoutCumsum <- data.frame(group, x) %>%
  group_by(group) %>%
  mutate(z = +any(x %% 3 == 0))
dWithCumsum <- data.frame(group, x) %>%
  group_by(group) %>%
  mutate(z = cumsum(any(x %% 3 == 0)))
all.equal(dWithCumsum, dWithoutCumsum)
# [1] TRUE
However, when `cumsum` is applied afterwards, as a separate step, everything works:
ave(ave(x, group, FUN = function(x) any(x %% 3 == 0)), group, FUN = cumsum)
# [1] 0 0 1 2 1 2 0 0 1 2 1 2
data.frame(group, x) %>%
  group_by(group) %>%
  mutate(z = any(x %% 3 == 0),
         z = cumsum(z)) %>%
  pull(z)
# [1] 0 0 1 2 1 2 0 0 1 2 1 2
Why does `cumsum` not work as expected in these cases (it also fails with `all` instead of `any`)? Is there a one-liner that gives the expected result?
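My understanding is that you want to return an increasing sequence if the group contains at least one multiple of 3, and a vector of zeros otherwise. In that case: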
g <- gl(6, 2)
g
## [1] 1 1 2 2 3 3 4 4 5 5 6 6
## Levels: 1 2 3 4 5 6
x <- seq_along(g)
x
## [1] 1 2 3 4 5 6 7 8 9 10 11 12
f <- function(x) if (any(x %% 3 == 0)) seq_along(x) else integer(length(x))
unsplit(tapply(x, g, f, simplify = FALSE), g)
## [1] 0 0 1 2 1 2 0 0 1 2 1 2
Or, in a data frame, with dplyr:
library("dplyr")
d <- data.frame(g, x)
d %>% group_by(g) %>% mutate(y = f(x))
## # A tibble: 12 × 3
## # Groups:   g [6]
##    g         x     y
##    <fct> <int> <int>
##  1 1         1     0
##  2 1         2     0
##  3 2         3     1
##  4 2         4     2
##  5 3         5     1
##  6 3         6     2
##  7 4         7     0
##  8 4         8     0
##  9 5         9     1
## 10 5        10     2
## 11 6        11     1
## 12 6        12     2
You aren't actually doing a `cumsum`; there is nothing to sum. What you are looking for is the row number within each group.
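To see why nothing accumulates in the one-step versions, here is a minimal sketch using the question's vectors. `any()` collapses a group to a single logical value, so `cumsum()` receives a vector of length one and returns it unchanged; `ave()` (and likewise `mutate()`) then recycles that single value across the group. The names `v`, `flag` and `expanded` are illustrative only.

```r
group <- rep(1:6, each = 2)
x <- 1:12

# Look at a single group the way FUN sees it (group 2 holds 3 and 4).
v <- x[group == 2]
flag <- any(v %% 3 == 0)          # one logical value for the whole group
n_flag <- length(flag)
n_cumsum <- length(cumsum(flag))  # still one value, so nothing accumulates

# Repeating the flag to the group's length gives cumsum() something to add up.
expanded <- ave(x, group,
                FUN = function(v) cumsum(rep(any(v %% 3 == 0), length(v))))
expanded
```

This is also why the two-step version in the question works: by the time `cumsum` runs, the flag has already been recycled to the full group length.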
Here are a couple of ways to do it with dplyr:
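# Assumed input, inferred from the output below (the question calls this column x)
df <- data.frame(group = rep(1:6, each = 2), y = 1:12)

df %>%
  group_by(group) %>%
  mutate(
    result1 = row_number() * any(y %% 3 == 0),
    result2 = case_when(
      any(y %% 3 == 0) ~ row_number(),
      TRUE ~ 0L
    )
  )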
# # A tibble: 12 × 4
# # Groups:   group [6]
#    group     y result1 result2
#    <int> <int>   <int>   <int>
#  1     1     1       0       0
#  2     1     2       0       0
#  3     2     3       1       1
#  4     2     4       2       2
#  5     3     5       1       1
#  6     3     6       2       2
#  7     4     7       0       0
#  8     4     8       0       0
#  9     5     9       1       1
# 10     5    10       2       2
# 11     6    11       1       1
# 12     6    12       2       2
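For the one-liner the question asks for, the same row-number idea can be sketched in base R with `ave()` and `seq_along()`. This is only one possible formulation; the `all()` example uses a condition of my own choosing (`v > 6`), since no group in this data consists entirely of multiples of 3.

```r
group <- rep(1:6, each = 2)
x <- 1:12

# Row number within the group, zeroed out unless some member is a multiple of 3.
res_any <- ave(x, group, FUN = function(v) seq_along(v) * any(v %% 3 == 0))
res_any
# [1] 0 0 1 2 1 2 0 0 1 2 1 2

# Same pattern with all(): v > 6 (an assumed condition) holds for every
# member of groups 4 to 6 only.
res_all <- ave(x, group, FUN = function(v) seq_along(v) * all(v > 6))
res_all
# [1] 0 0 0 0 0 0 1 2 1 2 1 2
```

Multiplying by the logical works because `TRUE` and `FALSE` are coerced to 1 and 0, and the single value is recycled across `seq_along(v)`.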