How to segregate a vector/variable into two states depending on the percent increase or decrease from its previous value

Suppose I have a vector A (or a variable A in a data frame, say df) with the following values:

A <- c(90L, 100L, 5L, 15L, 16L, 2L, 20L, 25L, 2L, 40L, 16L, 16L, 32L, 51L, 52L)

A
#>  [1]  90 100   5  15  16   2  20  25   2  40  16  16  32  51  52

df <- data.frame(A = A)

Created on 2021-05-11 by the reprex package (v2.0.0)

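Now I want to split these values into two states, say 0 and 1, based on the following conditions.

So basically I am trying to get output like this: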

     A State
1   90     0
2  100     0
3    5     1
4   15     1
5   16     1
6    2     1
7   20     1
8   25     1
9    2     1
10  40     1
11  16     1
12  16     1
13  32     1
14  51     0
15  52     0

Either a base R or a tidyverse approach is fine with me.

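Where I am stuck is actually retrieving the threshold value (100, or 50% of it) up to row 14, because as you can see the value drops by a further 80% twice more, once in row 6 and once in row 9.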
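To make the difficulty concrete, the change from one row to the next can be computed directly. This is only an illustrative sketch: `pct_change` and `big_drop` are names introduced here, and the 80% cutoff is assumed from the description above.

```r
A <- c(90L, 100L, 5L, 15L, 16L, 2L, 20L, 25L, 2L, 40L, 16L, 16L, 32L, 51L, 52L)

# Relative change from the previous value; the first element has no predecessor
pct_change <- c(NA, diff(A) / head(A, -1))

# Rows where the value fell by more than 80% compared with the row before
big_drop <- which(pct_change < -0.8)
big_drop
#> [1] 3 6 9
```

A purely row-by-row test flags rows 3, 6 and 9 but says nothing about the rows in between, which is why the value before the first drop (100) has to be remembered until a later value exceeds half of it (51 in row 14).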

Here is one more test case:

     A State
1   90     0
2  100     0
3    5     1
4   15     1
5   16     1
6    2     1
7   20     1
8   25     1
9    2     1
10  40     1
11  16     1
12  16     1
13  32     1
14  51     0
15  52     0
16  60     0
17  10     1
18  20     1
19   5     1
20  30     1
21  31     0
22  50     0
23 100     0

Note

Benchmarking the two answers on

df <- data.frame(A = sample(1:100, 100000, TRUE))

Unit: microseconds
       expr      min        lq       mean    median       uq      max neval
 BlueVoxe() 214547.5 271455.60 298551.084 300763.90 309823.3 499692.5   100
      Ben()      4.2      4.85      7.115      5.35      9.8     11.3   100

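Anil, perhaps this can help move things forward. It is not elegant: you can write a loop that checks the change in value while keeping track of the previous threshold.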

A <- c(90, 100, 5, 15, 16, 2, 20, 25, 2, 40, 16, 16, 32, 
       51, 52, 60, 10, 20, 5, 30, 31, 50, 100)

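threshold <- NA   # value seen just before the most recent >80% drop
State <- 0        # the first observation always starts in state 0

for (i in 2:length(A)) {
  if (State[i-1] == 0) {
    # In state 0: switch to 1 when the value drops by more than 80%
    if ((A[i-1] - A[i]) > (.8 * A[i-1])) {
      threshold <- A[i-1]
      State <- c(State, 1)
    } else {
      State <- c(State, 0)
    }
  } else {
    # In state 1: switch back to 0 once the value exceeds 50% of the threshold
    if (A[i] > (.5 * threshold)) {
      State <- c(State, 0)
    } else {
      State <- c(State, 1)
    }
  }
}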

data.frame(A, State)

Output

     A State
1   90     0
2  100     0
3    5     1
4   15     1
5   16     1
6    2     1
7   20     1
8   25     1
9    2     1
10  40     1
11  16     1
12  16     1
13  32     1
14  51     0
15  52     0
16  60     0
17  10     1
18  20     1
19   5     1
20  30     1
21  31     0
22  50     0
23 100     0

Data

A <- c(90, 100, 5, 15, 16, 2, 20, 25, 2, 40, 16, 16, 32, 51, 52, 60, 
10, 20, 5, 30, 31, 50, 100)
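Since growing `State` with `c()` copies the vector on every iteration, the same loop could be wrapped in a function that preallocates its result. The sketch below is one possible way to do that; the function name `state_loop` and the arguments `drop` and `recover` are hypothetical and only mirror the 80% and 50% values used above.

```r
# Hypothetical helper: same rule as the loop above, with a preallocated result
state_loop <- function(x, drop = 0.8, recover = 0.5) {
  n <- length(x)
  state <- integer(n)        # all zeros; the first element stays 0
  threshold <- NA_real_      # value just before the most recent large drop
  if (n < 2) return(state)

  for (i in 2:n) {
    if (state[i - 1] == 0L) {
      # Enter state 1 when the value falls by more than `drop` of the previous one
      if ((x[i - 1] - x[i]) > drop * x[i - 1]) {
        threshold <- x[i - 1]
        state[i] <- 1L
      }
    } else if (x[i] <= recover * threshold) {
      # Stay in state 1 until the value exceeds `recover` times the threshold
      state[i] <- 1L
    }
  }
  state
}

A <- c(90, 100, 5, 15, 16, 2, 20, 25, 2, 40, 16, 16, 32,
       51, 52, 60, 10, 20, 5, 30, 31, 50, 100)

data.frame(A, State = state_loop(A))
```

The result should match the output shown above; the only intended difference is that the state vector is allocated once instead of being extended element by element.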

Here is a function using purrr::accumulate() that should do the trick:

library(dplyr)
library(purrr)

A <- c(90L, 100L, 5L, 15L, 16L, 2L, 20L, 25L, 2L, 40L, 16L, 16L, 32L, 51L, 52L)
df <- data.frame(A = A)

trans <- function(x, trigger_0 = 0.5, trigger_1 = -0.8) {
  
  # This variable 'remembers' the last value that switched 
  # the indicator to 1
  compare <- 0
  
  out <- purrr::accumulate(seq_along(x), .init = 0, function(prev_result, i) {
    
    # Initial value should be 0 - this will return .init
    if (i == 1) {
      return(0)
    }
    
    # If previous result is 0 we only need to check that change from
    # prev value is less than `trigger_1`
    if (prev_result == 0) {
      
      # Compute the change from previous value
      change1 <- (x[i] - x[i - 1]) / x[i - 1]
      
      if (change1 < trigger_1) {
        
        # Reset 'compare' to be used in next iterations
        compare <<- x[i - 1]
        return(1)
        
      } else return(0)
      
    }
    
    if (prev_result == 1) {
      
      # Compute change from 'compare'
      change2 <- x[i] / compare
      
      # Return 1 or 0 based on the increase/decrease from 'compare'
      if (change2 > trigger_0) {
        return(0)
      } else return(1)
    }
    
  })
  
  # Remove the leading 0 created by using `init`
  tail(out, -1)
   
}

df %>% 
  mutate(Indicator = trans(A))
#>      A Indicator
#> 1   90         0
#> 2  100         0
#> 3    5         1
#> 4   15         1
#> 5   16         1
#> 6    2         1
#> 7   20         1
#> 8   25         1
#> 9    2         1
#> 10  40         1
#> 11  16         1
#> 12  16         1
#> 13  32         1
#> 14  51         0
#> 15  52         0
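For a base R variant of the same idea, `Reduce()` with `accumulate = TRUE` can carry both the current state and the remembered value in a small list, which avoids the `<<-` assignment. This is a sketch under the same trigger values as `trans()`; the name `state_reduce` and the list fields `state` and `compare` are introduced here for illustration.

```r
# Hypothetical base R counterpart of trans(), using Reduce() instead of purrr
state_reduce <- function(x, trigger_0 = 0.5, trigger_1 = -0.8) {
  # With fewer than two values there is nothing to compare against
  if (length(x) < 2) return(rep(0, length(x)))

  step <- function(acc, i) {
    if (acc$state == 0) {
      # Relative change from the previous value
      change <- (x[i] - x[i - 1]) / x[i - 1]
      if (change < trigger_1) list(state = 1, compare = x[i - 1]) else acc
    } else {
      # Ratio to the value remembered when the state switched to 1
      if (x[i] / acc$compare > trigger_0) {
        list(state = 0, compare = acc$compare)
      } else {
        acc
      }
    }
  }

  # The initial list stands for the first element, so iterate from index 2
  acc <- Reduce(step, seq_along(x)[-1],
                list(state = 0, compare = NA_real_),
                accumulate = TRUE)
  vapply(acc, function(a) a$state, numeric(1))
}

A <- c(90L, 100L, 5L, 15L, 16L, 2L, 20L, 25L, 2L, 40L, 16L, 16L, 32L, 51L, 52L)
data.frame(A = A, Indicator = state_reduce(A))
```

Because the initial value plays the role of the first observation, no leading element needs to be dropped afterwards.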