How to label groups in df based on specific sequence of values in a column

I have a data frame with id and value columns as shown below, and I want to determine the status column for each id group based on the values in the value column.

  
x <- data.frame(id = c(rep(1,10), rep(2,10), rep(3,10)),
                serial = rep(1:10,3),
                value = c(rep(1,4), rep(0,3), rep(1,3),
                          rep(1,4), rep(0,1), rep(-1,2), rep(1,3),
                          rep(c(1,0),5)),
                status = c(rep("Fluctuating", 10),
                           rep("Fluctuating", 10),
                           rep("Not fluctuating", 10)))
   id serial value          status
1   1      1     1     Fluctuating
2   1      2     1     Fluctuating
3   1      3     1     Fluctuating
4   1      4     1     Fluctuating
5   1      5     0     Fluctuating
6   1      6     0     Fluctuating
7   1      7     0     Fluctuating
8   1      8     1     Fluctuating
9   1      9     1     Fluctuating
10  1     10     1     Fluctuating
11  2      1     1     Fluctuating
12  2      2     1     Fluctuating
13  2      3     1     Fluctuating
14  2      4     1     Fluctuating
15  2      5     0     Fluctuating
16  2      6    -1     Fluctuating
17  2      7    -1     Fluctuating
18  2      8     1     Fluctuating
19  2      9     1     Fluctuating
20  2     10     1     Fluctuating
21  3      1     1 Not fluctuating
22  3      2     0 Not fluctuating
23  3      3     1 Not fluctuating
24  3      4     0 Not fluctuating
25  3      5     1 Not fluctuating
26  3      6     0 Not fluctuating
27  3      7     1 Not fluctuating
28  3      8     0 Not fluctuating
29  3      9     1 Not fluctuating
30  3     10     0 Not fluctuating

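Here, a group is considered fluctuating if three or more 1s are followed by three or more (0 or -1) values, which are in turn followed by three or more 1s. Three or more alternating runs such as 0s-1s-0s or -1s-0s-1s, each at least three long, would also count as fluctuating.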
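One way to make this rule concrete is to look at run lengths, treating 0 and -1 as the same "not 1" state. The snippet below is only a sketch of that idea on a single vector (`v` is group 2 from the data above, and the name `runs` is just for illustration):

```r
# Group 2 from the example data; 0 and -1 are treated as one state via `v > 0`.
v <- c(1, 1, 1, 1, 0, -1, -1, 1, 1, 1)

# rle() collapses consecutive equal values into (length, value) pairs.
runs <- rle(v > 0)
runs$lengths  # 4 3 3
runs$values   # TRUE FALSE TRUE
```

Three consecutive runs that are each at least 3 long are what the rule above calls fluctuating.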

I'd like to know the best way to assign the status column, preferably using dplyr.

Thanks!

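library(dplyr)
# library(zoo) # rollapply
# TRUE if `z` contains consecutive runs matching `ptn` (by default 1s, non-1s, 1s),
# each run at least `minlen` long.
threes <- function(z, minlen = 3L, ptn = c(TRUE, FALSE, TRUE)) {
  # runs of positive vs. non-positive values (0 and -1 count as one state)
  r <- rle(z > 0)
  # starts[i] is TRUE when runs i, ..., i + minlen - 1 are all at least `minlen` long
  starts <- zoo::rollapply(r$lengths >= minlen, minlen, all, fill = FALSE, align = "left")
  for (st in which(starts)) {
    # the window of runs must also match the TRUE/FALSE pattern in `ptn`
    if (all(r$values[st + seq_len(minlen) - 1L] == ptn)) return(TRUE)
  }
  return(FALSE)
}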

x %>%
  group_by(id) %>%
  mutate(status2 = paste0(if (threes(value)) "" else "Not ", "Fluctuating")) %>%
  ungroup() %>%
  print(n = 99)
# # A tibble: 30 x 5
#       id serial value status          status2        
#    <dbl>  <int> <dbl> <chr>           <chr>          
#  1     1      1     1 Fluctuating     Fluctuating    
#  2     1      2     1 Fluctuating     Fluctuating    
#  3     1      3     1 Fluctuating     Fluctuating    
#  4     1      4     1 Fluctuating     Fluctuating    
#  5     1      5     0 Fluctuating     Fluctuating    
#  6     1      6     0 Fluctuating     Fluctuating    
#  7     1      7     0 Fluctuating     Fluctuating    
#  8     1      8     1 Fluctuating     Fluctuating    
#  9     1      9     1 Fluctuating     Fluctuating    
# 10     1     10     1 Fluctuating     Fluctuating    
# 11     2      1     1 Fluctuating     Fluctuating    
# 12     2      2     1 Fluctuating     Fluctuating    
# 13     2      3     1 Fluctuating     Fluctuating    
# 14     2      4     1 Fluctuating     Fluctuating    
# 15     2      5     0 Fluctuating     Fluctuating    
# 16     2      6    -1 Fluctuating     Fluctuating    
# 17     2      7    -1 Fluctuating     Fluctuating    
# 18     2      8     1 Fluctuating     Fluctuating    
# 19     2      9     1 Fluctuating     Fluctuating    
# 20     2     10     1 Fluctuating     Fluctuating    
# 21     3      1     1 Not fluctuating Not Fluctuating
# 22     3      2     0 Not fluctuating Not Fluctuating
# 23     3      3     1 Not fluctuating Not Fluctuating
# 24     3      4     0 Not fluctuating Not Fluctuating
# 25     3      5     1 Not fluctuating Not Fluctuating
# 26     3      6     0 Not fluctuating Not Fluctuating
# 27     3      7     1 Not fluctuating Not Fluctuating
# 28     3      8     0 Not fluctuating Not Fluctuating
# 29     3      9     1 Not fluctuating Not Fluctuating
# 30     3     10     0 Not fluctuating Not Fluctuating
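With the default `ptn`, `threes()` only matches a 1s, non-1s, 1s window, while the question also asks for the opposite polarity (0s-1s-0s). Because consecutive runs from `rle()` always alternate, checking run lengths alone covers both cases. The following is a rough base-R sketch under that assumption; `has_long_runs` and its `n_runs` argument are hypothetical names, not part of the answer above, and the input is assumed to contain no `NA`s:

```r
# Hypothetical helper: TRUE if `z` contains `n_runs` consecutive runs
# that are each at least `minlen` long, regardless of which state comes first.
has_long_runs <- function(z, minlen = 3L, n_runs = 3L) {
  long <- rle(z > 0)$lengths >= minlen
  if (length(long) < n_runs) return(FALSE)
  # Check every window of `n_runs` consecutive runs.
  starts <- seq_len(length(long) - n_runs + 1L)
  any(vapply(starts,
             function(s) all(long[s + seq_len(n_runs) - 1L]),
             logical(1)))
}

has_long_runs(c(1, 1, 1, 1, 0, 0, 0, 1, 1, 1))     # TRUE
has_long_runs(c(0, 0, 0, 1, 1, 1, -1, -1, -1))     # TRUE
has_long_runs(rep(c(1, 0), 5))                     # FALSE
has_long_runs(rep(1, 10))                          # FALSE
```

It needs no zoo dependency and could stand in for `threes(value)` inside the grouped `mutate()`.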

Using the rle function and dplyr:

x %>% 
  mutate(value_new = ifelse(value == -1, 0, value)) %>% 
  group_by(id) %>% 
  mutate(status = ifelse(all(rle(value_new)$lengths >= 3), "Fluctuating", "Not fluctuating")) %>% 
  select(-value_new) 

Output

# A tibble: 30 x 4
# Groups:   id [3]
      id serial value status         
   <dbl>  <int> <dbl> <chr>          
 1     1      1     1 Fluctuating    
 2     1      2     1 Fluctuating    
 3     1      3     1 Fluctuating    
 4     1      4     1 Fluctuating    
 5     1      5     0 Fluctuating    
 6     1      6     0 Fluctuating    
 7     1      7     0 Fluctuating    
 8     1      8     1 Fluctuating    
 9     1      9     1 Fluctuating    
10     1     10     1 Fluctuating    
11     2      1     1 Fluctuating    
12     2      2     1 Fluctuating    
13     2      3     1 Fluctuating    
14     2      4     1 Fluctuating    
15     2      5     0 Fluctuating    
16     2      6    -1 Fluctuating    
17     2      7    -1 Fluctuating    
18     2      8     1 Fluctuating    
19     2      9     1 Fluctuating    
20     2     10     1 Fluctuating    
21     3      1     1 Not fluctuating
22     3      2     0 Not fluctuating
23     3      3     1 Not fluctuating
24     3      4     0 Not fluctuating
25     3      5     1 Not fluctuating
26     3      6     0 Not fluctuating
27     3      7     1 Not fluctuating
28     3      8     0 Not fluctuating
29     3      9     1 Not fluctuating
30     3     10     0 Not fluctuating
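One caveat with the `all(... >= 3)` test: it only asks that every run is at least 3 long, so a group that never changes value also passes. The sketch below shows that edge case and one possible guard; `is_fluctuating` is a hypothetical helper, not part of the answer above:

```r
# A constant group is a single run of length 10, so every run is >= 3.
constant_passes <- all(rle(rep(1, 10))$lengths >= 3)
constant_passes  # TRUE, so the group would be labelled "Fluctuating"

# Hypothetical guard: additionally require at least three runs.
is_fluctuating <- function(v) {
  r <- rle(ifelse(v == -1, 0, v))  # fold -1 into 0, as in the pipeline above
  length(r$lengths) >= 3 && all(r$lengths >= 3)
}

is_fluctuating(rep(1, 10))                         # FALSE
is_fluctuating(c(1, 1, 1, 1, 0, -1, -1, 1, 1, 1))  # TRUE
```

This is still stricter than the rule in the question, since a single short run anywhere in the group makes it "Not fluctuating" even if three long runs occur elsewhere.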