How to label groups in df based on specific sequence of values in a column
I have a data frame with id and value columns, as shown below, and I want to determine the status column for each id group based on the values in the value column.
x <- data.frame(id = c(rep(1,10), rep(2,10), rep(3,10)),
                serial = rep(1:10,3),
                value = c(rep(1,4), rep(0,3), rep(1,3),
                          rep(1,4), rep(0,1), rep(-1,2), rep(1,3),
                          rep(c(1,0),5)),
                status = c(rep("Fluctuating", 10),
                           rep("Fluctuating", 10),
                           rep("Not fluctuating", 10)))
id serial value status
1 1 1 1 Fluctuating
2 1 2 1 Fluctuating
3 1 3 1 Fluctuating
4 1 4 1 Fluctuating
5 1 5 0 Fluctuating
6 1 6 0 Fluctuating
7 1 7 0 Fluctuating
8 1 8 1 Fluctuating
9 1 9 1 Fluctuating
10 1 10 1 Fluctuating
11 2 1 1 Fluctuating
12 2 2 1 Fluctuating
13 2 3 1 Fluctuating
14 2 4 1 Fluctuating
15 2 5 0 Fluctuating
16 2 6 -1 Fluctuating
17 2 7 -1 Fluctuating
18 2 8 1 Fluctuating
19 2 9 1 Fluctuating
20 2 10 1 Fluctuating
21 3 1 1 Not fluctuating
22 3 2 0 Not fluctuating
23 3 3 1 Not fluctuating
24 3 4 0 Not fluctuating
25 3 5 1 Not fluctuating
26 3 6 0 Not fluctuating
27 3 7 1 Not fluctuating
28 3 8 0 Not fluctuating
29 3 9 1 Not fluctuating
30 3 10 0 Not fluctuating
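Here, a group is considered fluctuating if three or more 1s are followed by three or more values that are 0 or -1, which are in turn followed by three or more 1s. Alternating runs of three or more in other orders, such as 0s-1s-0s or -1s-0s-1s, count as fluctuating as well.
What is the best way to assign the status column, preferably using dplyr?
Thanks!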
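library(dplyr)
# zoo is only needed for rollapply(), which is called below as zoo::rollapply()

# TRUE if the runs of `z > 0` contain the pattern `ptn` (by default positive,
# non-positive, positive) with every run in the pattern at least `minlen` long.
threes <- function(z, minlen = 3L, ptn = c(TRUE, FALSE, TRUE)) {
  r <- rle(z > 0)
  # Slide a window of length(ptn) runs; flag windows whose runs are all long enough
  starts <- zoo::rollapply(r$lengths >= minlen, length(ptn), all, fill = FALSE, align = "left")
  for (st in which(starts)) {
    if (all(r$values[st + seq_along(ptn) - 1L] == ptn)) return(TRUE)
  }
  return(FALSE)
}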
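To make the run-length step concrete, here is a small sketch of what rle() returns for the values of id 2. The vector name z simply mirrors the function argument above; collapsing with z > 0 puts 0 and -1 into the same "not positive" state.

```r
# Values of id 2 from the question
z <- c(1, 1, 1, 1, 0, -1, -1, 1, 1, 1)

# Run-length encode the positive / not-positive indicator
r <- rle(z > 0)

r$lengths  # 4 3 3
r$values   # TRUE FALSE TRUE
```

Three runs, each at least 3 long, with values TRUE, FALSE, TRUE: this is the pattern the function looks for.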
x %>%
  group_by(id) %>%
  mutate(status2 = paste0(if (threes(value)) "" else "Not ", "Fluctuating")) %>%
  ungroup() %>%
  print(n = 99)
# # A tibble: 30 x 5
# id serial value status status2
# <dbl> <int> <dbl> <chr> <chr>
# 1 1 1 1 Fluctuating Fluctuating
# 2 1 2 1 Fluctuating Fluctuating
# 3 1 3 1 Fluctuating Fluctuating
# 4 1 4 1 Fluctuating Fluctuating
# 5 1 5 0 Fluctuating Fluctuating
# 6 1 6 0 Fluctuating Fluctuating
# 7 1 7 0 Fluctuating Fluctuating
# 8 1 8 1 Fluctuating Fluctuating
# 9 1 9 1 Fluctuating Fluctuating
# 10 1 10 1 Fluctuating Fluctuating
# 11 2 1 1 Fluctuating Fluctuating
# 12 2 2 1 Fluctuating Fluctuating
# 13 2 3 1 Fluctuating Fluctuating
# 14 2 4 1 Fluctuating Fluctuating
# 15 2 5 0 Fluctuating Fluctuating
# 16 2 6 -1 Fluctuating Fluctuating
# 17 2 7 -1 Fluctuating Fluctuating
# 18 2 8 1 Fluctuating Fluctuating
# 19 2 9 1 Fluctuating Fluctuating
# 20 2 10 1 Fluctuating Fluctuating
# 21 3 1 1 Not fluctuating Not Fluctuating
# 22 3 2 0 Not fluctuating Not Fluctuating
# 23 3 3 1 Not fluctuating Not Fluctuating
# 24 3 4 0 Not fluctuating Not Fluctuating
# 25 3 5 1 Not fluctuating Not Fluctuating
# 26 3 6 0 Not fluctuating Not Fluctuating
# 27 3 7 1 Not fluctuating Not Fluctuating
# 28 3 8 0 Not fluctuating Not Fluctuating
# 29 3 9 1 Not fluctuating Not Fluctuating
# 30 3 10 0 Not fluctuating Not Fluctuating
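With the default ptn, the function above only looks for positive, non-positive, positive, so as far as I can tell a group such as 0s-1s-0s would need a second call with ptn = c(FALSE, TRUE, FALSE). Below is a minimal base-R sketch of an alternative that accepts either order. The name fluctuates and its nruns argument are hypothetical, not part of the answer, and it assumes value contains no NA. It relies on the fact that adjacent runs from rle() always differ in value, so three long runs in a row necessarily alternate.

```r
# Hypothetical helper: TRUE when `z` contains `nruns` consecutive runs,
# each at least `minlen` long, after collapsing 0 and -1 into one state.
fluctuates <- function(z, minlen = 3L, nruns = 3L) {
  long <- rle(z > 0)$lengths >= minlen   # is each run long enough?
  streak <- rle(long)                    # streaks of adjacent long runs
  any(streak$values & streak$lengths >= nruns)
}

# Same data as in the question, kept as plain vectors
value <- c(rep(1, 4), rep(0, 3), rep(1, 3),
           rep(1, 4), 0, rep(-1, 2), rep(1, 3),
           rep(c(1, 0), 5))
id <- rep(1:3, each = 10)

# One result per id: TRUE for id 1 and 2, FALSE for id 3
res <- tapply(value, id, fluctuates)
```

Inside the dplyr pipeline it could replace threes(value) in the mutate() call unchanged, since it also returns a single TRUE or FALSE per group.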
Using the rle function and the dplyr library:
x %>%
  mutate(value_new = ifelse(value == -1, 0, value)) %>%
  group_by(id) %>%
  mutate(status = ifelse(all(rle(value_new)$lengths >= 3), "Fluctuating", "Not fluctuating")) %>%
  select(-value_new)
Output
# A tibble: 30 x 4
# Groups: id [3]
id serial value status
<dbl> <int> <dbl> <chr>
1 1 1 1 Fluctuating
2 1 2 1 Fluctuating
3 1 3 1 Fluctuating
4 1 4 1 Fluctuating
5 1 5 0 Fluctuating
6 1 6 0 Fluctuating
7 1 7 0 Fluctuating
8 1 8 1 Fluctuating
9 1 9 1 Fluctuating
10 1 10 1 Fluctuating
11 2 1 1 Fluctuating
12 2 2 1 Fluctuating
13 2 3 1 Fluctuating
14 2 4 1 Fluctuating
15 2 5 0 Fluctuating
16 2 6 -1 Fluctuating
17 2 7 -1 Fluctuating
18 2 8 1 Fluctuating
19 2 9 1 Fluctuating
20 2 10 1 Fluctuating
21 3 1 1 Not fluctuating
22 3 2 0 Not fluctuating
23 3 3 1 Not fluctuating
24 3 4 0 Not fluctuating
25 3 5 1 Not fluctuating
26 3 6 0 Not fluctuating
27 3 7 1 Not fluctuating
28 3 8 0 Not fluctuating
29 3 9 1 Not fluctuating
30 3 10 0 Not fluctuating
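One thing to check before relying on this condition: all(lengths >= 3) tests that every run is long, not that the runs alternate three times. The sketch below wraps the same condition in a hypothetical function, all_runs_long, to show two inputs that are not in the question's sample data and where the result may differ from the rule as stated in the question.

```r
# Hypothetical wrapper around the condition used in the pipeline above
all_runs_long <- function(value, minlen = 3L) {
  value_new <- ifelse(value == -1, 0, value)   # treat -1 like 0
  all(rle(value_new)$lengths >= minlen)
}

# A constant group has a single long run, so it passes the check
constant <- all_runs_long(rep(1, 10))                               # TRUE

# A single leading 1 fails the check even though 0s-1s-0s follows
short_start <- all_runs_long(c(1, 0, 0, 0, 1, 1, 1, 0, 0, 0))       # FALSE
```

If groups like these can occur in the real data, the condition would need an extra test on the number of runs.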