Flag variable is incorrectly assigned during data restructuring in R

This is the data I am using:

datatrain=structure(list(probeg = c(10000L, 20000L, 30000L, 40000L, 50000L, 
60000L, 70000L, 80000L, 90000L, 100000L, 110000L, 120000L, 130000L, 
140000L, 150000L, 160000L, 170000L, 180000L, 190000L, 200000L, 
210000L, 220000L, 230000L, 240000L, 250000L, 260000L, 270000L, 
280000L, 290000L, 300000L, 310000L, 320000L, 330000L, 340000L, 
350000L, 360000L, 370000L, 380000L, 390000L, 400000L, 410000L, 
420000L, 430000L, 440000L, 450000L, 460000L, 470000L, 480000L, 
490000L, 500000L, 510000L, 520000L, 530000L, 540000L, 550000L, 
560000L, 570000L, 580000L, 590000L, 600000L, 610000L, 620000L, 
630000L, 640000L, 650000L, 660000L, 670000L, 680000L), EP_OBJECTID = c(88679804L, 
88679804L, 88679804L, 88679804L, 88679804L, 88679804L, 88679804L, 
88679804L, 88679804L, 88679804L, 88679804L, 88679804L, 88679804L, 
88679804L, 88679804L, 88679804L, 88679804L, 88679804L, 88679804L, 
88679804L, 88679804L, 88679804L, 88679804L, 88679804L, 88679804L, 
88679804L, 88679804L, 88679804L, 88679804L, 88679804L, 88679804L, 
88679804L, 88679804L, 88679804L, 88679804L, 88679804L, 88679804L, 
88679804L, 88679804L, 88679804L, 88679804L, 88679804L, 88679804L, 
88679804L, 88679804L, 88679804L, 88679804L, 88679804L, 9000L, 
9000L, 9000L, 9000L, 9000L, 9000L, 9000L, 9000L, 9000L, 9000L, 
9000L, 9000L, 9000L, 9000L, 9000L, 9000L, 9000L, 9000L, 9000L, 
9000L), TU17L4 = c(80, 80.5, 80.5, 80.5, 80.5, 80.5, 80.5, 80.5, 
80, 80, 80, 80, 79.5, 79.5, 79.5, 79.5, 79.5, 79.5, 73, 73, 73, 
73, 73, 72, 72, 72, 70.5, 70.5, 70.5, 70.5, 70.5, 70, 70.5, 67, 
67, 67, 67, 61.5, 61.5, 61.5, 61.5, 61.5, 61.5, 61.5, 61.5, 61.5, 
61.5, 57.5, 56, 57.5, 57.5, 56.5, 56.5, 56.5, 56.5, 56.5, 56.5, 
56.5, 56.5, 56.5, 56.5, 56.5, 56.5, 56.5, 56.5, 56.5, 56.5, 56.5
), DELTTBL = c(12.5, 12.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 11.5, 1, 1, 1, 0, 1.5, 1.5, 0, 0, 0, 0, 0.5, 
0.5, 0, 0, 0, 0, 0, 0, 0, 11.5, 0, 0, 0, 0, 4, 4, 4, 1.5, 4, 
1.5, 1.5, 1.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
    TU17R4 = c(80, 79.5, 79.5, 79.5, 79.5, 79, 78.5, 78.5, 78.5, 
    78.5, 78.5, 78, 78, 78, 78, 78, 78, 78, 72, 72, 72, 72, 72, 
    71, 71, 71, 69.5, 69.5, 69.5, 69.5, 69.5, 69.5, 69.5, 66, 
    66, 66, 66, 61, 61, 61, 61, 60.5, 60.5, 60.5, 60.5, 60.5, 
    60.5, 57, 56, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 
    57, 57, 57, 56.5, 56.5, 56.5, 56.5, 56.5), DELTTBR = c(12.5, 
    12.5, 0, 0, 0, 0.5, 0.5, 0, 0, 0.5, 0.5, 0.5, 0, 0, 0, 0, 
    0, 0, 0, 0, 11, 1, 1, 1, 0, 1.5, 1.5, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 11, 0.5, 0.5, 0.5, 0, 3.5, 3.5, 3.5, 1, 
    3.5, 1, 1, 1, 0, 0, 0, 0, 0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 
    0.5, 0.5, 0.5, 0.5), TU17L5 = c(1060L, 1054L, 1054L, 1054L, 
    1054L, 1054L, 1054L, 1054L, 1053L, 1053L, 1053L, 1053L, 1052L, 
    1052L, 1052L, 1052L, 1052L, 1052L, 1038L, 1038L, 1038L, 1038L, 
    1038L, 1036L, 1036L, 1036L, 1033L, 1033L, 1033L, 1041L, 1033L, 
    1032L, 1033L, 1026L, 1026L, 1026L, 1026L, 1017L, 1017L, 1017L, 
    1017L, 1017L, 1017L, 1017L, 1017L, 1017L, 1017L, 1009L, 1028L, 
    1009L, 1009L, 1007L, 1007L, 1007L, 1014L, 1007L, 1007L, 1007L, 
    1014L, 1007L, 1007L, 1007L, 1007L, 1007L, 1007L, 1007L, 1007L, 
    1007L), DELTDML = c(33L, 33L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 21L, 2L, 2L, 
    2L, 0L, 3L, 3L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 21L, 0L, 0L, 0L, 0L, 8L, 8L, 8L, 21L, 8L, 21L, 21L, 
    21L, 0L, 7L, 7L, 7L, 7L, 7L, 0L, 0L, 0L, 0L, 0L, 0L, 7L, 
    7L, 7L), TU17R5 = c(1060L, 1054L, 1054L, 1054L, 1054L, 1053L, 
    1052L, 1052L, 1052L, 1052L, 1052L, 1051L, 1051L, 1051L, 1051L, 
    1051L, 1051L, 1051L, 1038L, 1038L, 1038L, 1038L, 1038L, 1036L, 
    1036L, 1036L, 1033L, 1033L, 1033L, 1040L, 1033L, 1033L, 1033L, 
    1026L, 1026L, 1026L, 1026L, 1017L, 1017L, 1017L, 1017L, 1016L, 
    1016L, 1016L, 1016L, 1016L, 1016L, 1009L, 1028L, 1009L, 1009L, 
    1009L, 1009L, 1009L, 1014L, 1009L, 1009L, 1009L, 1014L, 1009L, 
    1009L, 1009L, 1009L, 1008L, 1008L, 1008L, 1008L, 1008L), 
    DELTDMR = c(31L, 31L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 
    1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 21L, 2L, 2L, 2L, 0L, 
    3L, 3L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    21L, 1L, 1L, 1L, 0L, 7L, 7L, 7L, 19L, 7L, 19L, 19L, 19L, 
    0L, 5L, 5L, 5L, 5L, 6L, 1L, 1L, 1L, 1L, 1L, 1L, 6L, 6L, 6L
    ), TU17L2 = c(27, 27, 27, 30, 26.5, 26.5, 26, 26, 26.5, 26, 
    25, 27, 27, 26.5, 26.5, 26.5, 26, 30, 27, 26, 26.5, 26.5, 
    26.5, 26.5, 26, 31, 31, 30, 30, 30, 29, 30, 30, 29, 30, 28, 
    28.5, 28.5, 28.5, 28.5, 28.5, 28, 30, 30, 30, 30, 28.5, 29, 
    28, 27, 27, 27, 26.5, 28, 28, 28, 28.5, 27, 28.5, 27, 27, 
    27, 27, 27, 26, 27, 26, 26), DELTTGRL = c(0, 0, 3.5, 3.5, 
    3.5, 3.5, 0, 0, 0, 0, 2, 2, 2, 2, 2, 0, 4, 4, 4, 0.5, 0, 
    0.5, 0, 5, 5, 5, 5, 5, 5, 5, 1, 1, 1, 0, 2, 2, 0, 0, 0, 0, 
    0, 2, 2, 2, 2, 1.5, 1.5, 0, 2, 2, 0.5, 0.5, 0.5, 0, 0, 0.5, 
    0.5, 1.5, 1.5, 0, 0, 0, 0, 1, 1, 1, 1, 2), TU17R2 = c(29, 
    28.5, 28.5, 30, 28, 28, 28, 28, 28, 28, 27, 28, 28, 27, 27, 
    27, 27.5, 30, 28, 27, 26, 27, 26, 25, 27, 30, 30, 30, 30, 
    30, 29, 30, 30, 30, 30, 29.5, 30.5, 30.5, 30.5, 30.5, 30.5, 
    29.5, 28, 28, 28, 28, 26.5, 29, 28, 28, 28, 28, 27, 28, 28, 
    28, 27.5, 28, 27.5, 27, 27, 26, 26, 26, 26.5, 26, 26, 26), 
    DELTTGRR = c(0, 0, 2, 2, 2, 2, 0, 0, 0, 0, 1, 1, 1, 1, 1, 
    0, 2.5, 2.5, 2.5, 0, 0, 0, 0, 5, 5, 5, 5, 5, 5, 5, 1, 1, 
    1, 0, 0.5, 0.5, 0, 0, 0, 0, 0, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 
    0, 2, 2, 1, 1, 1, 0, 0, 0.5, 0.5, 0.5, 0.5, 0, 1, 1, 1, 0.5, 
    0.5, 0, 0, 2), TU17L3 = c(8, 8, 8, 10, 8, 8, 8, 8, 7, 7, 
    7, 10, 10, 9, 9, 9, 9, 10, 7.5, 9, 8.5, 9, 8, 8, 8, 10, 10, 
    10, 10, 10, 10, 10, 10, 10.5, 10, 10, 10, 10, 10, 10, 10, 
    10, 10, 10, 10, 8, 10, 10, 8, 9, 9, 8, 9, 8, 8, 8, 8.5, 8, 
    8.5, 8, 8, 7, 7, 7, 8, 7, 7, 7), DELTCRGRL = c(0, 0, 2, 2, 
    2, 2, 0, 1, 0, 1, 3, 3, 3, 3, 3, 0, 2.5, 2.5, 2.5, 0, 0, 
    0, 0, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 2, 2, 0, 2, 2, 1, 1, 1, 0, 0, 0.5, 0.5, 0.5, 
    0.5, 0, 1, 1, 1, 1, 1, 0, 0, 2), TU17R3 = c(8, 8, 8, 10, 
    8, 8, 8, 8, 8, 8, 7.5, 10, 10, 9, 9, 9, 9, 10, 9, 9, 7, 9, 
    7, 7, 8, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 
    10, 10, 10, 10, 10, 10, 10, 8, 9, 10, 8, 8, 8, 8, 8, 8, 8, 
    8, 8, 8, 8, 8, 8, 7, 7, 7, 8, 7, 7, 7), DELTCRGRR = c(0, 
    0, 2, 2, 2, 2, 0, 0, 0, 0, 2.5, 2.5, 2.5, 2.5, 2.5, 0, 1, 
    1, 1, 0, 0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1)), class = "data.frame", row.names = c(NA, 
-68L))

I only need to work with these variables:

TU17L4;TU17R4;TU17L5;TU17R5;TU17L2;TU17R2;TU17L3;TU17R3

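If at least 3 of these variables change at the same time by at least 2.5 (compared with the previous row), then flag = 1 should be set. If flag = 1, the count in the probeg variable should restart from 10000 km on the next row. For example, between rows 18 and 19, TU17L4 changes by 79.5 - 73 = 6.5, TU17R4 by 78 - 72 = 6 and TU17L3 by 10 - 7.5 = 2.5, so from row 20 onwards probeg must count again from 10000. Expected result (rows 20-22 show the restart):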

 probeg EP_OBJECTID TU17L4 DELTTBL TU17R4 DELTTBR TU17L5 DELTDML
1   10000    88679804   80.0    12.5   80.0    12.5   1060      33
2   20000    88679804   80.5    12.5   79.5    12.5   1054      33
3   30000    88679804   80.5     0.0   79.5     0.0   1054       0
4   40000    88679804   80.5     0.0   79.5     0.0   1054       0
5   50000    88679804   80.5     0.0   79.5     0.0   1054       0
6   60000    88679804   80.5     0.0   79.0     0.5   1054       0
7   70000    88679804   80.5     0.0   78.5     0.5   1054       0
8   80000    88679804   80.5     0.0   78.5     0.0   1054       0
9   90000    88679804   80.0     0.0   78.5     0.0   1053       0
10 100000    88679804   80.0     0.0   78.5     0.5   1053       0
11 110000    88679804   80.0     0.0   78.5     0.5   1053       0
12 120000    88679804   80.0     0.0   78.0     0.5   1053       0
13 130000    88679804   79.5     0.0   78.0     0.0   1052       0
14 140000    88679804   79.5     0.0   78.0     0.0   1052       0
15 150000    88679804   79.5     0.0   78.0     0.0   1052       0
16 160000    88679804   79.5     0.0   78.0     0.0   1052       0
17 170000    88679804   79.5     0.0   78.0     0.0   1052       0
18 180000    88679804   79.5     0.0   78.0     0.0   1052       0
19 190000    88679804   73.0     0.0   72.0     0.0   1038       0
20  10000    88679804   73.0     0.0   72.0     0.0   1038       0
21  20000    88679804   73.0    11.5   72.0    11.0   1038      21
22  30000    88679804   73.0     1.0   72.0     1.0   1038       2
   TU17R5 DELTDMR TU17L2 DELTTGRL TU17R2 DELTTGRR TU17L3 DELTCRGRL
1    1060      31   27.0      0.0   29.0      0.0    8.0       0.0
2    1054      31   27.0      0.0   28.5      0.0    8.0       0.0
3    1054       0   27.0      3.5   28.5      2.0    8.0       2.0
4    1054       0   30.0      3.5   30.0      2.0   10.0       2.0
5    1054       0   26.5      3.5   28.0      2.0    8.0       2.0
6    1053       1   26.5      3.5   28.0      2.0    8.0       2.0
7    1052       1   26.0      0.0   28.0      0.0    8.0       0.0
8    1052       0   26.0      0.0   28.0      0.0    8.0       1.0
9    1052       0   26.5      0.0   28.0      0.0    7.0       0.0
10   1052       1   26.0      0.0   28.0      0.0    7.0       1.0
11   1052       1   25.0      2.0   27.0      1.0    7.0       3.0
12   1051       1   27.0      2.0   28.0      1.0   10.0       3.0
13   1051       0   27.0      2.0   28.0      1.0   10.0       3.0
14   1051       0   26.5      2.0   27.0      1.0    9.0       3.0
15   1051       0   26.5      2.0   27.0      1.0    9.0       3.0
16   1051       0   26.5      0.0   27.0      0.0    9.0       0.0
17   1051       0   26.0      4.0   27.5      2.5    9.0       2.5
18   1051       0   30.0      4.0   30.0      2.5   10.0       2.5
19   1038       0   27.0      4.0   28.0      2.5    7.5       2.5
20   1038       0   26.0      0.5   27.0      0.0    9.0       0.0
21   1038      21   26.5      0.0   26.0      0.0    8.5       0.0
22   1038       2   26.5      0.5   27.0      0.0    9.0       0.0
   TU17R3 DELTCRGRR
1     8.0       0.0
2     8.0       0.0
3     8.0       2.0
4    10.0       2.0
5     8.0       2.0
6     8.0       2.0
7     8.0       0.0
8     8.0       0.0
9     8.0       0.0
10    8.0       0.0
11    7.5       2.5
12   10.0       2.5
13   10.0       2.5
14    9.0       2.5
15    9.0       2.5
16    9.0       0.0
17    9.0       1.0
18   10.0       1.0
19    9.0       1.0
20    9.0       0.0
21    7.0       0.0
22    9.0       0.0
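The rule for a single pair of rows can be checked by hand. This R sketch copies the values of rows 18 and 19 from `datatrain`. Besides TU17L4, TU17R4 and TU17L3, the variables TU17L5, TU17R5 and TU17L2 also change by at least 2.5 between these rows, so the "at least 3" condition is clearly met.

```r
# Values of the eight TU variables in rows 18 and 19 of datatrain
row18 <- c(TU17L4 = 79.5, TU17R4 = 78, TU17L5 = 1052, TU17R5 = 1051,
           TU17L2 = 30,   TU17R2 = 30, TU17L3 = 10,   TU17R3 = 10)
row19 <- c(TU17L4 = 73,   TU17R4 = 72, TU17L5 = 1038, TU17R5 = 1038,
           TU17L2 = 27,   TU17R2 = 28, TU17L3 = 7.5,  TU17R3 = 9)

# absolute change of each variable from row 18 to row 19
change <- abs(row19 - row18)

# which variables changed by at least 2.5
names(change)[change >= 2.5]

# flag = 1 when at least 3 variables changed by at least 2.5
flag <- as.integer(sum(change >= 2.5) >= 3)
flag
```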

This must be done separately for each EP_OBJECTID category. So first I run this part of the code:

library(dplyr)

datatrain %>%
  filter(!(EP_OBJECTID != lag(EP_OBJECTID) & DELTDMR == lag(DELTDMR))) %>%
  group_by(EP_OBJECTID) %>%
  mutate(DELT = seq(10000, length.out = n(), by = 10000))
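One likely reason the result shown further below starts at probeg 20000 involves the first row. dplyr's `lag()` returns `NA` for that row, so the filter condition is `NA` there, and `filter()` keeps only rows where the condition is `TRUE`. The first row is therefore silently dropped. The small data frame below is hypothetical and only demonstrates the effect. Passing a `default` to `lag()` is one possible way to keep the first row.

```r
library(dplyr)

# Hypothetical toy data: two IDs; row 1 has no previous row
toy <- data.frame(EP_OBJECTID = c(1L, 1L, 2L, 2L),
                  DELTDMR     = c(5L, 5L, 5L, 0L))

# As in the question: lag() gives NA on row 1, the condition is NA,
# and filter() drops that row
dropped <- toy %>%
  filter(!(EP_OBJECTID != lag(EP_OBJECTID) & DELTDMR == lag(DELTDMR)))

# With a default, row 1 is compared with itself (no change) and is kept
kept <- toy %>%
  filter(!(EP_OBJECTID != lag(EP_OBJECTID, default = first(EP_OBJECTID)) &
           DELTDMR == lag(DELTDMR, default = first(DELTDMR))))

nrow(dropped)
nrow(kept)
```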

Then, as the second step, I run this code:

threshold <- 3
flags <- dt %>%
  apply(., 2, diff) %>%
  apply(., 1, function(x)
    ifelse(length(x[abs(x) > threshold]) > 1, 1, 0))
dt$flag <- c(0, flags)
dt

But the result is wrong, i.e. it is not what I showed above. This is the result I get:

   X probeg EP_OBJECTID TU17L4 DELTTBL TU17R4 DELTTBR TU17L5
1   1  20000    88679804   80.5    12.5   79.5    12.5   1054
2   2  30000    88679804   80.5     0.0   79.5     0.0   1054
3   3  40000    88679804   80.5     0.0   79.5     0.0   1054
4   4  50000    88679804   80.5     0.0   79.5     0.0   1054
5   5  60000    88679804   80.5     0.0   79.0     0.5   1054
6   6  70000    88679804   80.5     0.0   78.5     0.5   1054
7   7  80000    88679804   80.5     0.0   78.5     0.0   1054
8   8  90000    88679804   80.0     0.0   78.5     0.0   1053
9   9 100000    88679804   80.0     0.0   78.5     0.5   1053
10 10 110000    88679804   80.0     0.0   78.5     0.5   1053
   DELTDML TU17R5 DELTDMR TU17L2 DELTTGRL TU17R2 DELTTGRR TU17L3
1       33   1054      31   27.0      0.0   28.5        0      8
2        0   1054       0   27.0      3.5   28.5        2      8
3        0   1054       0   30.0      3.5   30.0        2     10
4        0   1054       0   26.5      3.5   28.0        2      8
5        0   1053       1   26.5      3.5   28.0        2      8
6        0   1052       1   26.0      0.0   28.0        0      8
7        0   1052       0   26.0      0.0   28.0        0      8
8        0   1052       0   26.5      0.0   28.0        0      7
9        0   1052       1   26.0      0.0   28.0        0      7
10       0   1052       1   25.0      2.0   27.0        1      7
   DELTCRGRL TU17R3 DELTCRGRR  DELT flag
1          0    8.0       0.0 1e+04    0
2          2    8.0       2.0 2e+04    1
3          2   10.0       2.0 3e+04    1
4          2    8.0       2.0 4e+04    1
5          2    8.0       2.0 5e+04    1
6          0    8.0       0.0 6e+04    1
7          1    8.0       0.0 7e+04    1
8          0    8.0       0.0 8e+04    1
9          1    8.0       0.0 9e+04    1
10         3    7.5       2.5 1e+05    1

This is wrong. How can I get the result I showed above?
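In the second step, flag comes out as 1 on almost every row. The cause is that `apply(., 2, diff)` runs over every column of `dt`, including `probeg` and `DELT`, and both change by 10000 on every row. At least two columns therefore always exceed `threshold <- 3`. The code also tests "more than 1 column with a change greater than 3", while the rule described above is "at least 3 columns with a change of at least 2.5". The sketch below uses rows 17-20 of `datatrain`. Its `DELT` values are illustrative stand-ins for the column built in step 1. It ignores EP_OBJECTID groups, which the data.table answer below handles.

```r
# Rows 17-20 of datatrain: probeg, a DELT column and the eight TU columns
# (DELT values are illustrative, standing in for the column built in step 1)
dt <- data.frame(
  probeg = c(170000L, 180000L, 190000L, 200000L),
  DELT   = c(170000, 180000, 190000, 200000),
  TU17L4 = c(79.5, 79.5, 73, 73),     TU17R4 = c(78, 78, 72, 72),
  TU17L5 = c(1052, 1052, 1038, 1038), TU17R5 = c(1051, 1051, 1038, 1038),
  TU17L2 = c(26, 30, 27, 26),         TU17R2 = c(27.5, 30, 28, 27),
  TU17L3 = c(9, 10, 7.5, 9),          TU17R3 = c(9, 10, 9, 9)
)

# Question's rule (equivalent to its ifelse): more than 1 column with
# |diff| > 3, over ALL columns; probeg and DELT alone always satisfy it
old_flag <- c(0, apply(apply(dt, 2, diff), 1,
                       function(x) as.integer(sum(abs(x) > 3) > 1)))
old_flag

# Rule as described: at least 3 TU columns with |diff| >= 2.5
tu_cols  <- grep("^TU", names(dt), value = TRUE)
changes  <- abs(apply(as.matrix(dt[tu_cols]), 2, diff))
new_flag <- c(0, as.integer(rowSums(changes >= 2.5) >= 3))
new_flag
```

Here `old_flag` marks every row after the first. `new_flag` marks only the row where TU17L4, TU17R4 and the other variables jump (row 19 of the full data).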

I think this data.table approach should work. Let me know if it does what you need:

library( data.table )
setDT( datatrain )

#first, we split by EP_OBJECTID
# loop over the split list, 
L <- lapply( split( datatrain, by = "EP_OBJECTID" ), function(dt) {
  # count, per row, how many TU columns changed by >= 2.5 versus the previous row;
  # shift(..., type = "lag") moves that result down one row, so probeg restarts
  # at 10000 on the row AFTER the one where 3 or more columns changed
  dt[ shift( rowSums( dt[, lapply(.SD, function(x) abs( x - shift(x, type = "lag") ) ), 
                         .SDcols = patterns("^TU") ] >= 2.5 ) >=3, type = "lag" ),
             probeg := 10000L ]
  #first row of a group is always 10000
  dt[ 1, probeg := 10000L ]
  #set new value of probeg for all rows: restart the count at each 10000
  dt[, probeg := seq_len(.N) * 10000L, by = .(cumsum( probeg == 10000 ) ) ]
  return(dt)
})
#bind the split list back together to a single data.table
ans <- rbindlist( L, use.names = TRUE )
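For readers who prefer to stay in dplyr, the same idea can be sketched as follows. It is an illustration under assumptions, not a tested drop-in replacement:

- `restart_probeg()` is a hypothetical helper.
- It assumes dplyr 1.0 or later, for `across()`.
- It treats the first row of each EP_OBJECTID as "no change".

Unlike the data.table version, it keeps the computed `flag` column and starts a new segment from that flag, rather than from probeg values equal to 10000.

```r
library(dplyr)

# Hypothetical helper: flag rows where at least `min_vars` TU columns changed
# by >= `threshold` versus the previous row, then restart probeg at `step`
# on the row after each flag, separately for each EP_OBJECTID
restart_probeg <- function(df, threshold = 2.5, min_vars = 3, step = 10000L) {
  tu_cols <- grep("^TU", names(df), value = TRUE)
  df %>%
    group_by(EP_OBJECTID) %>%
    mutate(
      # lag() works within each group, so the first row of a group gives NA
      # differences; na.rm = TRUE counts those as "no change"
      n_big = rowSums(
        as.matrix(across(all_of(tu_cols), ~ abs(.x - lag(.x)))) >= threshold,
        na.rm = TRUE
      ),
      flag = as.integer(n_big >= min_vars),
      # a new segment begins on the row after a flagged row
      segment = cumsum(lag(flag, default = 0L))
    ) %>%
    group_by(EP_OBJECTID, segment) %>%
    mutate(probeg = row_number() * step) %>%
    ungroup() %>%
    select(-n_big, -segment)
}

# Usage on a small excerpt: rows 17-20 of EP_OBJECTID 88679804, plus two
# identical rows labelled EP_OBJECTID 9000 (illustrative only)
toy <- data.frame(
  probeg      = c(170000L, 180000L, 190000L, 200000L, 490000L, 500000L),
  EP_OBJECTID = c(rep(88679804L, 4), 9000L, 9000L),
  TU17L4 = c(79.5, 79.5, 73, 73, 73, 73),
  TU17R4 = c(78, 78, 72, 72, 72, 72),
  TU17L5 = c(1052, 1052, 1038, 1038, 1038, 1038),
  TU17R5 = c(1051, 1051, 1038, 1038, 1038, 1038),
  TU17L2 = c(26, 30, 27, 26, 26, 26),
  TU17R2 = c(27.5, 30, 28, 27, 27, 27),
  TU17L3 = c(9, 10, 7.5, 9, 9, 9),
  TU17R3 = c(9, 10, 9, 9, 9, 9)
)
out <- restart_probeg(toy)
out$probeg
out$flag
```

On this excerpt, the first group counts 10000, 20000, 30000 and restarts at 10000 on its fourth row, the row after the jump. The second group counts 10000, 20000 independently. On the full data, `restart_probeg(datatrain)` would be the call to try.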