Create a sequence counter based on a condition

Question

I have a dataset like this:

       x time
1   TRUE    9
2   TRUE    8
3   TRUE   10
4   TRUE    5
5   TRUE   16
6  FALSE    2
7  FALSE   17
8  FALSE    6
9   TRUE   11
10  TRUE    7
11  TRUE   20
12  TRUE    3
13  TRUE    9
14 FALSE    4
15 FALSE    2
16 FALSE   10
17  TRUE    3
18  TRUE    6
19  TRUE   15

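I want to use R to generate a new variable that assigns a unique number based on changes of condition in x and time. Specifically, I want to scan the data from the top and assign a number (e.g. 1) to the first row. The number increases whenever x switches between TRUE and FALSE. While x is FALSE, the number stays the same. While x is TRUE, the number stays the same as long as time < 10, but it also increases by 1 on every row where x is TRUE and time >= 10, and then stays the same until the next change condition is met.

In other words, a switch between TRUE and FALSE counts as a change of condition. In addition, while x is TRUE, every row with time >= 10 also counts as the start of a new condition.

The output I would like to get is this: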

       x time   count
1   TRUE   9       1
2   TRUE   8       1
3   TRUE   10      2
4   TRUE   5       2
5   TRUE   16      3
6  FALSE    2      4 
7  FALSE   17      4 
8  FALSE    6      4
9   TRUE   11      5 
10  TRUE    7      5 
11  TRUE   20      6 
12  TRUE    3      6 
13  TRUE    9      6 
14 FALSE    4      7 
15 FALSE    2      7 
16 FALSE   10      7
17  TRUE    3      8 
18  TRUE    6      8
19  TRUE    15     9

I tried rleid(x), but it does not take changes in the time variable into account. I would appreciate any suggestions on how to solve this in R!

Answer 1

Here is an option with rleid: apply rleid to the 'x' column together with a numeric index created from the 'time' column.

library(data.table)
setDT(df1)[, count := rleid(x, replace(x, x, cumsum(time[x] >= 10)))]

-output

        x  time count
    <lgcl> <int> <int>
 1:   TRUE     9     1
 2:   TRUE     8     1
 3:   TRUE    10     2
 4:   TRUE     5     2
 5:   TRUE    16     3
 6:  FALSE     2     4
 7:  FALSE    17     4
 8:  FALSE     6     4
 9:   TRUE    11     5
10:   TRUE     7     5
11:   TRUE    20     6
12:   TRUE     3     6
13:   TRUE     9     6
14:  FALSE     4     7
15:  FALSE     2     7
16:  FALSE    10     7
17:   TRUE     3     8
18:   TRUE     6     8
19:   TRUE    15     9
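
To see why this works, it can help to look at the intermediate index on its own. The following is only a sketch in base R; `x` and `time` are plain vectors copied from the two columns, and the names `events` and `idx` are introduced here purely for illustration.

```r
x <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE,
       TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
time <- c(9L, 8L, 10L, 5L, 16L, 2L, 17L, 6L, 11L, 7L,
          20L, 3L, 9L, 4L, 2L, 10L, 3L, 6L, 15L)

# Running number of "time >= 10" events, counted over the TRUE rows only
events <- cumsum(time[x] >= 10)

# Write that running number back into the TRUE positions;
# the FALSE rows are coerced to 0
idx <- replace(x, x, events)
idx
# [1] 0 0 1 1 2 0 0 0 3 3 4 4 4 0 0 0 4 4 5
```

rleid(x, idx) then starts a new run whenever either x or idx changes. Rows 11 to 13 and rows 17 to 18 share the index value 4, but they still end up with different counts because a block of FALSE rows lies between them.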

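Or with dplyr (rleid still comes from data.table, so that package has to be loaded as well)

library(dplyr)
library(data.table)  # provides rleid()
df1 %>% 
   mutate(count = rleid(x, replace(x, x, cumsum(time[x] >= 10))))

-output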

       x time count
1   TRUE    9     1
2   TRUE    8     1
3   TRUE   10     2
4   TRUE    5     2
5   TRUE   16     3
6  FALSE    2     4
7  FALSE   17     4
8  FALSE    6     4
9   TRUE   11     5
10  TRUE    7     5
11  TRUE   20     6
12  TRUE    3     6
13  TRUE    9     6
14 FALSE    4     7
15 FALSE    2     7
16 FALSE   10     7
17  TRUE    3     8
18  TRUE    6     8
19  TRUE   15     9
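
If you would rather not load data.table at all, newer dplyr releases (1.1.0 and later, to my knowledge) ship consecutive_id(), which plays the same role as rleid here. This is a sketch under that version assumption, with the data frame rebuilt inline so that it runs on its own; `res` is just an illustrative name.

```r
library(dplyr)

df1 <- data.frame(
  x = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE,
        TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE),
  time = c(9L, 8L, 10L, 5L, 16L, 2L, 17L, 6L, 11L, 7L,
           20L, 3L, 9L, 4L, 2L, 10L, 3L, 6L, 15L)
)

# consecutive_id() starts a new id whenever any of its arguments changes,
# so it can take x and the time-based index together
res <- df1 %>%
  mutate(count = consecutive_id(x, replace(x, x, cumsum(time[x] >= 10))))

res$count
```

The count column should match the output shown above.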

data

df1 <- structure(list(x = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, 
FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, 
TRUE, TRUE), time = c(9L, 8L, 10L, 5L, 16L, 2L, 17L, 6L, 11L, 
7L, 20L, 3L, 9L, 4L, 2L, 10L, 3L, 6L, 15L)), row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19"), class = "data.frame")

Answer 2

You can use a for loop in base R.

# Your data, copied from @akrun
df1 <- structure(list(x = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, 
FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, 
TRUE, TRUE), time = c(9L, 8L, 10L, 5L, 16L, 2L, 17L, 6L, 11L, 
7L, 20L, 3L, 9L, 4L, 2L, 10L, 3L, 6L, 15L)), row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19"), class = "data.frame")

# Create an empty `count` column 

df1$count <- 0

# Assign 1 to the first row 

df1$count[1] <- 1

# From the 2nd row up to the last row, increase the count number if 
# one of the two conditions is satisfied. Otherwise, the count number
# should remain unchanged.

for (k in 2:nrow(df1)) {
  # The two conditions for an increase of the count number:
  # (1) there is a change in x    OR    (2) x is TRUE and time >= 10
  if (df1$x[k] != df1$x[k-1] || (df1$x[k] && df1$time[k] >= 10)) {
    df1$count[k] <- df1$count[k-1] + 1
  } else {
    df1$count[k] <- df1$count[k-1]
  }
}

df1
       x time count
1   TRUE    9     1
2   TRUE    8     1
3   TRUE   10     2
4   TRUE    5     2
5   TRUE   16     3
6  FALSE    2     4
7  FALSE   17     4
8  FALSE    6     4
9   TRUE   11     5
10  TRUE    7     5
11  TRUE   20     6
12  TRUE    3     6
13  TRUE    9     6
14 FALSE    4     7
15 FALSE    2     7
16 FALSE   10     7
17  TRUE    3     8
18  TRUE    6     8
19  TRUE   15     9
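
The same two conditions can also be written without a loop. This is a minimal vectorised sketch in base R, assuming the data has at least one row; `x`, `time`, `n` and `new_group` are illustrative names standing in for the columns of df1.

```r
x <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE,
       TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
time <- c(9L, 8L, 10L, 5L, 16L, 2L, 17L, 6L, 11L, 7L,
          20L, 3L, 9L, 4L, 2L, 10L, 3L, 6L, 15L)
n <- length(x)

# TRUE where a row starts a new group: the first row always does; a later
# row does when x differs from the previous row, or when x is TRUE and
# time >= 10
new_group <- c(TRUE, x[-1] != x[-n] | (x[-1] & time[-1] >= 10))

# The running total of group starts is the counter
count <- cumsum(new_group)
count
# [1] 1 1 2 2 3 4 4 4 5 5 6 6 6 7 7 7 8 8 9
```

As in the loop, a row that meets both conditions at once (row 9, for example) still increases the counter by only 1.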
