Flag variable is incorrectly assigned during data restructuring in R
This is the data I am using:
datatrain=structure(list(probeg = c(10000L, 20000L, 30000L, 40000L, 50000L,
60000L, 70000L, 80000L, 90000L, 100000L, 110000L, 120000L, 130000L,
140000L, 150000L, 160000L, 170000L, 180000L, 190000L, 200000L,
210000L, 220000L, 230000L, 240000L, 250000L, 260000L, 270000L,
280000L, 290000L, 300000L, 310000L, 320000L, 330000L, 340000L,
350000L, 360000L, 370000L, 380000L, 390000L, 400000L, 410000L,
420000L, 430000L, 440000L, 450000L, 460000L, 470000L, 480000L,
490000L, 500000L, 510000L, 520000L, 530000L, 540000L, 550000L,
560000L, 570000L, 580000L, 590000L, 600000L, 610000L, 620000L,
630000L, 640000L, 650000L, 660000L, 670000L, 680000L), EP_OBJECTID = c(88679804L,
88679804L, 88679804L, 88679804L, 88679804L, 88679804L, 88679804L,
88679804L, 88679804L, 88679804L, 88679804L, 88679804L, 88679804L,
88679804L, 88679804L, 88679804L, 88679804L, 88679804L, 88679804L,
88679804L, 88679804L, 88679804L, 88679804L, 88679804L, 88679804L,
88679804L, 88679804L, 88679804L, 88679804L, 88679804L, 88679804L,
88679804L, 88679804L, 88679804L, 88679804L, 88679804L, 88679804L,
88679804L, 88679804L, 88679804L, 88679804L, 88679804L, 88679804L,
88679804L, 88679804L, 88679804L, 88679804L, 88679804L, 9000L,
9000L, 9000L, 9000L, 9000L, 9000L, 9000L, 9000L, 9000L, 9000L,
9000L, 9000L, 9000L, 9000L, 9000L, 9000L, 9000L, 9000L, 9000L,
9000L), TU17L4 = c(80, 80.5, 80.5, 80.5, 80.5, 80.5, 80.5, 80.5,
80, 80, 80, 80, 79.5, 79.5, 79.5, 79.5, 79.5, 79.5, 73, 73, 73,
73, 73, 72, 72, 72, 70.5, 70.5, 70.5, 70.5, 70.5, 70, 70.5, 67,
67, 67, 67, 61.5, 61.5, 61.5, 61.5, 61.5, 61.5, 61.5, 61.5, 61.5,
61.5, 57.5, 56, 57.5, 57.5, 56.5, 56.5, 56.5, 56.5, 56.5, 56.5,
56.5, 56.5, 56.5, 56.5, 56.5, 56.5, 56.5, 56.5, 56.5, 56.5, 56.5
), DELTTBL = c(12.5, 12.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 11.5, 1, 1, 1, 0, 1.5, 1.5, 0, 0, 0, 0, 0.5,
0.5, 0, 0, 0, 0, 0, 0, 0, 11.5, 0, 0, 0, 0, 4, 4, 4, 1.5, 4,
1.5, 1.5, 1.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
TU17R4 = c(80, 79.5, 79.5, 79.5, 79.5, 79, 78.5, 78.5, 78.5,
78.5, 78.5, 78, 78, 78, 78, 78, 78, 78, 72, 72, 72, 72, 72,
71, 71, 71, 69.5, 69.5, 69.5, 69.5, 69.5, 69.5, 69.5, 66,
66, 66, 66, 61, 61, 61, 61, 60.5, 60.5, 60.5, 60.5, 60.5,
60.5, 57, 56, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57,
57, 57, 57, 56.5, 56.5, 56.5, 56.5, 56.5), DELTTBR = c(12.5,
12.5, 0, 0, 0, 0.5, 0.5, 0, 0, 0.5, 0.5, 0.5, 0, 0, 0, 0,
0, 0, 0, 0, 11, 1, 1, 1, 0, 1.5, 1.5, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 11, 0.5, 0.5, 0.5, 0, 3.5, 3.5, 3.5, 1,
3.5, 1, 1, 1, 0, 0, 0, 0, 0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
0.5, 0.5, 0.5, 0.5), TU17L5 = c(1060L, 1054L, 1054L, 1054L,
1054L, 1054L, 1054L, 1054L, 1053L, 1053L, 1053L, 1053L, 1052L,
1052L, 1052L, 1052L, 1052L, 1052L, 1038L, 1038L, 1038L, 1038L,
1038L, 1036L, 1036L, 1036L, 1033L, 1033L, 1033L, 1041L, 1033L,
1032L, 1033L, 1026L, 1026L, 1026L, 1026L, 1017L, 1017L, 1017L,
1017L, 1017L, 1017L, 1017L, 1017L, 1017L, 1017L, 1009L, 1028L,
1009L, 1009L, 1007L, 1007L, 1007L, 1014L, 1007L, 1007L, 1007L,
1014L, 1007L, 1007L, 1007L, 1007L, 1007L, 1007L, 1007L, 1007L,
1007L), DELTDML = c(33L, 33L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 21L, 2L, 2L,
2L, 0L, 3L, 3L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 21L, 0L, 0L, 0L, 0L, 8L, 8L, 8L, 21L, 8L, 21L, 21L,
21L, 0L, 7L, 7L, 7L, 7L, 7L, 0L, 0L, 0L, 0L, 0L, 0L, 7L,
7L, 7L), TU17R5 = c(1060L, 1054L, 1054L, 1054L, 1054L, 1053L,
1052L, 1052L, 1052L, 1052L, 1052L, 1051L, 1051L, 1051L, 1051L,
1051L, 1051L, 1051L, 1038L, 1038L, 1038L, 1038L, 1038L, 1036L,
1036L, 1036L, 1033L, 1033L, 1033L, 1040L, 1033L, 1033L, 1033L,
1026L, 1026L, 1026L, 1026L, 1017L, 1017L, 1017L, 1017L, 1016L,
1016L, 1016L, 1016L, 1016L, 1016L, 1009L, 1028L, 1009L, 1009L,
1009L, 1009L, 1009L, 1014L, 1009L, 1009L, 1009L, 1014L, 1009L,
1009L, 1009L, 1009L, 1008L, 1008L, 1008L, 1008L, 1008L),
DELTDMR = c(31L, 31L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 21L, 2L, 2L, 2L, 0L,
3L, 3L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
21L, 1L, 1L, 1L, 0L, 7L, 7L, 7L, 19L, 7L, 19L, 19L, 19L,
0L, 5L, 5L, 5L, 5L, 6L, 1L, 1L, 1L, 1L, 1L, 1L, 6L, 6L, 6L
), TU17L2 = c(27, 27, 27, 30, 26.5, 26.5, 26, 26, 26.5, 26,
25, 27, 27, 26.5, 26.5, 26.5, 26, 30, 27, 26, 26.5, 26.5,
26.5, 26.5, 26, 31, 31, 30, 30, 30, 29, 30, 30, 29, 30, 28,
28.5, 28.5, 28.5, 28.5, 28.5, 28, 30, 30, 30, 30, 28.5, 29,
28, 27, 27, 27, 26.5, 28, 28, 28, 28.5, 27, 28.5, 27, 27,
27, 27, 27, 26, 27, 26, 26), DELTTGRL = c(0, 0, 3.5, 3.5,
3.5, 3.5, 0, 0, 0, 0, 2, 2, 2, 2, 2, 0, 4, 4, 4, 0.5, 0,
0.5, 0, 5, 5, 5, 5, 5, 5, 5, 1, 1, 1, 0, 2, 2, 0, 0, 0, 0,
0, 2, 2, 2, 2, 1.5, 1.5, 0, 2, 2, 0.5, 0.5, 0.5, 0, 0, 0.5,
0.5, 1.5, 1.5, 0, 0, 0, 0, 1, 1, 1, 1, 2), TU17R2 = c(29,
28.5, 28.5, 30, 28, 28, 28, 28, 28, 28, 27, 28, 28, 27, 27,
27, 27.5, 30, 28, 27, 26, 27, 26, 25, 27, 30, 30, 30, 30,
30, 29, 30, 30, 30, 30, 29.5, 30.5, 30.5, 30.5, 30.5, 30.5,
29.5, 28, 28, 28, 28, 26.5, 29, 28, 28, 28, 28, 27, 28, 28,
28, 27.5, 28, 27.5, 27, 27, 26, 26, 26, 26.5, 26, 26, 26),
DELTTGRR = c(0, 0, 2, 2, 2, 2, 0, 0, 0, 0, 1, 1, 1, 1, 1,
0, 2.5, 2.5, 2.5, 0, 0, 0, 0, 5, 5, 5, 5, 5, 5, 5, 1, 1,
1, 0, 0.5, 0.5, 0, 0, 0, 0, 0, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5,
0, 2, 2, 1, 1, 1, 0, 0, 0.5, 0.5, 0.5, 0.5, 0, 1, 1, 1, 0.5,
0.5, 0, 0, 2), TU17L3 = c(8, 8, 8, 10, 8, 8, 8, 8, 7, 7,
7, 10, 10, 9, 9, 9, 9, 10, 7.5, 9, 8.5, 9, 8, 8, 8, 10, 10,
10, 10, 10, 10, 10, 10, 10.5, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 8, 10, 10, 8, 9, 9, 8, 9, 8, 8, 8, 8.5, 8,
8.5, 8, 8, 7, 7, 7, 8, 7, 7, 7), DELTCRGRL = c(0, 0, 2, 2,
2, 2, 0, 1, 0, 1, 3, 3, 3, 3, 3, 0, 2.5, 2.5, 2.5, 0, 0,
0, 0, 2, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 2, 2, 0, 2, 2, 1, 1, 1, 0, 0, 0.5, 0.5, 0.5,
0.5, 0, 1, 1, 1, 1, 1, 0, 0, 2), TU17R3 = c(8, 8, 8, 10,
8, 8, 8, 8, 8, 8, 7.5, 10, 10, 9, 9, 9, 9, 10, 9, 9, 7, 9,
7, 7, 8, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 8, 9, 10, 8, 8, 8, 8, 8, 8, 8,
8, 8, 8, 8, 8, 8, 7, 7, 7, 8, 7, 7, 7), DELTCRGRR = c(0,
0, 2, 2, 2, 2, 0, 0, 0, 0, 2.5, 2.5, 2.5, 2.5, 2.5, 0, 1,
1, 1, 0, 0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1)), class = "data.frame", row.names = c(NA,
-68L))
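I only need to work with these columns:
TU17L4; TU17R4; TU17L5; TU17R5; TU17L2; TU17R2; TU17L3; TU17R3
If at least 3 of these variables change by at least 2.5 at the same time, flag should be set to 1. And if flag = 1, the next count in the probeg variable should start again from 10000 km.
For example, going into row 19 (compared with row 18), TU17L4 changes by 79.5 - 73 = 6.5, TU17R4 by 78 - 72 = 6 and TU17L3 by 10 - 7.5 = 2.5, so from the following row (row 20) the probeg count must restart from 10000.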
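To make the rule concrete, here is a minimal sketch on rows 17-20 of four of the TU columns, copied from the data above. It assumes that "changes" means the absolute difference from the previous row, and that the first row, which has no predecessor, gets flag 0. The names tu, min_change and min_cols are made up for this illustration.

```r
# Rows 17-20 of four TU columns, copied from datatrain above
tu <- data.frame(
  TU17L4 = c(79.5, 79.5, 73, 73),
  TU17R4 = c(78, 78, 72, 72),
  TU17L2 = c(26, 30, 27, 26),
  TU17L3 = c(9, 10, 7.5, 9)
)

min_change <- 2.5  # a column counts as changed if it moved by at least this much
min_cols   <- 3    # at least this many columns must change in the same row

# Absolute change from the previous row, one row per step, one column per variable
step <- abs(diff(as.matrix(tu)))

# Number of columns that moved by at least min_change in each step
n_changed <- rowSums(step >= min_change)

# Assumption: the first row has no previous row, so its flag is 0
flag <- c(0L, as.integer(n_changed >= min_cols))
flag
```

With these four rows only the third one (row 19 of the full data) is flagged, which is where the restart in the expected output is triggered.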
Example:
probeg EP_OBJECTID TU17L4 DELTTBL TU17R4 DELTTBR TU17L5 DELTDML
1 10000 88679804 80.0 12.5 80.0 12.5 1060 33
2 20000 88679804 80.5 12.5 79.5 12.5 1054 33
3 30000 88679804 80.5 0.0 79.5 0.0 1054 0
4 40000 88679804 80.5 0.0 79.5 0.0 1054 0
5 50000 88679804 80.5 0.0 79.5 0.0 1054 0
6 60000 88679804 80.5 0.0 79.0 0.5 1054 0
7 70000 88679804 80.5 0.0 78.5 0.5 1054 0
8 80000 88679804 80.5 0.0 78.5 0.0 1054 0
9 90000 88679804 80.0 0.0 78.5 0.0 1053 0
10 100000 88679804 80.0 0.0 78.5 0.5 1053 0
11 110000 88679804 80.0 0.0 78.5 0.5 1053 0
12 120000 88679804 80.0 0.0 78.0 0.5 1053 0
13 130000 88679804 79.5 0.0 78.0 0.0 1052 0
14 140000 88679804 79.5 0.0 78.0 0.0 1052 0
15 150000 88679804 79.5 0.0 78.0 0.0 1052 0
16 160000 88679804 79.5 0.0 78.0 0.0 1052 0
17 170000 88679804 79.5 0.0 78.0 0.0 1052 0
18 180000 88679804 79.5 0.0 78.0 0.0 1052 0
19 190000 88679804 73.0 0.0 72.0 0.0 1038 0
**20 10000 88679804 73.0 0.0 72.0 0.0 1038 0
21 20000 88679804 73.0 11.5 72.0 11.0 1038 21
22 30000 88679804 73.0 1.0 72.0 1.0 1038 2**
TU17R5 DELTDMR TU17L2 DELTTGRL TU17R2 DELTTGRR TU17L3 DELTCRGRL
1 1060 31 27.0 0.0 29.0 0.0 8.0 0.0
2 1054 31 27.0 0.0 28.5 0.0 8.0 0.0
3 1054 0 27.0 3.5 28.5 2.0 8.0 2.0
4 1054 0 30.0 3.5 30.0 2.0 10.0 2.0
5 1054 0 26.5 3.5 28.0 2.0 8.0 2.0
6 1053 1 26.5 3.5 28.0 2.0 8.0 2.0
7 1052 1 26.0 0.0 28.0 0.0 8.0 0.0
8 1052 0 26.0 0.0 28.0 0.0 8.0 1.0
9 1052 0 26.5 0.0 28.0 0.0 7.0 0.0
10 1052 1 26.0 0.0 28.0 0.0 7.0 1.0
11 1052 1 25.0 2.0 27.0 1.0 7.0 3.0
12 1051 1 27.0 2.0 28.0 1.0 10.0 3.0
13 1051 0 27.0 2.0 28.0 1.0 10.0 3.0
14 1051 0 26.5 2.0 27.0 1.0 9.0 3.0
15 1051 0 26.5 2.0 27.0 1.0 9.0 3.0
16 1051 0 26.5 0.0 27.0 0.0 9.0 0.0
17 1051 0 26.0 4.0 27.5 2.5 9.0 2.5
18 1051 0 30.0 4.0 30.0 2.5 10.0 2.5
19 1038 0 27.0 4.0 28.0 2.5 7.5 2.5
20 1038 0 26.0 0.5 27.0 0.0 9.0 0.0
21 1038 21 26.5 0.0 26.0 0.0 8.5 0.0
22 1038 2 26.5 0.5 27.0 0.0 9.0 0.0
TU17R3 DELTCRGRR
1 8.0 0.0
2 8.0 0.0
3 8.0 2.0
4 10.0 2.0
5 8.0 2.0
6 8.0 2.0
7 8.0 0.0
8 8.0 0.0
9 8.0 0.0
10 8.0 0.0
11 7.5 2.5
12 10.0 2.5
13 10.0 2.5
14 9.0 2.5
15 9.0 2.5
16 9.0 0.0
17 9.0 1.0
18 10.0 1.0
19 9.0 1.0
20 9.0 0.0
21 7.0 0.0
22 9.0 0.0
This must be done separately for each EP_OBJECTID category. So this is what I do.
First I run this part of the code:
library(dplyr)
datatrain %>%
filter(!(EP_OBJECTID != lag(EP_OBJECTID) & DELTDMR == lag(DELTDMR))) %>%
group_by(EP_OBJECTID) %>%
mutate(DELT = seq(10000, length.out = n(), by = 10000))
Then I run the second step, this code:
threshold <- 3
flags <- dt %>%
apply(., 2, diff) %>%
apply(., 1,
function(x)
ifelse(length(x[abs(x) > threshold]) > 1,
1,
0))
dt$flag <- c(0, flags)
dt
But the result is wrong, i.e. it is not like the one I showed above.
I see this result:
X probeg EP_OBJECTID TU17L4 DELTTBL TU17R4 DELTTBR TU17L5
1 1 20000 88679804 80.5 12.5 79.5 12.5 1054
2 2 30000 88679804 80.5 0.0 79.5 0.0 1054
3 3 40000 88679804 80.5 0.0 79.5 0.0 1054
4 4 50000 88679804 80.5 0.0 79.5 0.0 1054
5 5 60000 88679804 80.5 0.0 79.0 0.5 1054
6 6 70000 88679804 80.5 0.0 78.5 0.5 1054
7 7 80000 88679804 80.5 0.0 78.5 0.0 1054
8 8 90000 88679804 80.0 0.0 78.5 0.0 1053
9 9 100000 88679804 80.0 0.0 78.5 0.5 1053
10 10 110000 88679804 80.0 0.0 78.5 0.5 1053
DELTDML TU17R5 DELTDMR TU17L2 DELTTGRL TU17R2 DELTTGRR TU17L3
1 33 1054 31 27.0 0.0 28.5 0 8
2 0 1054 0 27.0 3.5 28.5 2 8
3 0 1054 0 30.0 3.5 30.0 2 10
4 0 1054 0 26.5 3.5 28.0 2 8
5 0 1053 1 26.5 3.5 28.0 2 8
6 0 1052 1 26.0 0.0 28.0 0 8
7 0 1052 0 26.0 0.0 28.0 0 8
8 0 1052 0 26.5 0.0 28.0 0 7
9 0 1052 1 26.0 0.0 28.0 0 7
10 0 1052 1 25.0 2.0 27.0 1 7
DELTCRGRL TU17R3 DELTCRGRR DELT flag
1 0 8.0 0.0 1e+04 0
2 2 8.0 2.0 2e+04 1
3 2 10.0 2.0 3e+04 1
4 2 8.0 2.0 4e+04 1
5 2 8.0 2.0 5e+04 1
6 0 8.0 0.0 6e+04 1
7 1 8.0 0.0 7e+04 1
8 0 8.0 0.0 8e+04 1
9 1 8.0 0.0 9e+04 1
10 3 7.5 2.5 1e+05 1
This is wrong. How can I get the result I showed above?
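A likely reason the flag comes out as 1 on almost every row: apply(., 2, diff) takes the differences of every column, including probeg and DELT, which both step by 10000 per row, so two columns always exceed the threshold. The check also uses "> 3 in more than one column" rather than ">= 2.5 in at least three columns". A small sketch using rows 2-5 of the posted data, with only a few columns copied (the names dt_small, flag_all and flag_tu are made up for this illustration):

```r
# Rows 2-5 of the posted data (a few columns only), plus the DELT counter
dt_small <- data.frame(
  probeg = c(20000, 30000, 40000, 50000),
  TU17L4 = c(80.5, 80.5, 80.5, 80.5),
  TU17R4 = c(79.5, 79.5, 79.5, 79.5),
  TU17L5 = c(1054, 1054, 1054, 1054),
  DELT   = c(10000, 20000, 30000, 40000)
)

# The check from the question: all columns, "> 3", more than one column
steps_all <- apply(dt_small, 2, diff)
flag_all <- c(0, apply(steps_all, 1, function(x)
  ifelse(length(x[abs(x) > 3]) > 1, 1, 0)))

# Same rows, restricted to the TU columns, with the rule as stated in the question
tu_cols  <- grep("^TU", names(dt_small), value = TRUE)
steps_tu <- abs(diff(as.matrix(dt_small[tu_cols])))
flag_tu  <- c(0, as.numeric(rowSums(steps_tu >= 2.5) >= 3))

flag_all  # probeg and DELT alone trigger the flag on every step
flag_tu   # none of the TU columns moves in these rows
```

Restricting the differences to the TU columns is the main change; the threshold and the column count then follow the rule from the question.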
I think this data.table approach should work. Please let me know.
library( data.table )
setDT( datatrain )
#first, we split by EP_OBJECTID
# loop over the split list,
L <- lapply( split( datatrain, by = "EP_OBJECTID" ), function(dt) {
# find rows where the diff of TU columns is >=2.5 in 3 columns or more
dt[ shift( rowSums( dt[, lapply(.SD, function(x) abs( x - shift(x, type = "lag") ) ),
.SDcols = patterns("^TU") ] >= 2.5 ) >=3, type = "lag" ),
probeg := 10000 ]
#first row of a group is always 10000
dt[ 1, probeg := 10000 ]
#set new value of probeg for all rows
dt[, probeg := seq_len(.N) * 10000, by = .(cumsum( probeg == 10000 ) ) ]
return(dt)
})
#bind the split list back together to a single data.table
ans <- rbindlist( L, use.names = TRUE )
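The last step inside the loop relies on cumsum() to turn the "restart here" rows into segment ids, and then numbers the rows within each segment. A minimal base R sketch of that idea on its own, assuming a hypothetical logical vector restart that is TRUE on the first row of each new count (the names restart, segment and probeg_new are made up for this illustration):

```r
# Hypothetical restart markers: TRUE where the count should begin again
restart <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)

# Each TRUE opens a new segment; cumsum() gives every row its segment id
segment <- cumsum(restart)

# Number the rows within each segment and scale to steps of 10000
probeg_new <- ave(seq_along(restart), segment, FUN = seq_along) * 10000
probeg_new
```

This is why the first row of every EP_OBJECTID group has to be marked as a restart as well: without it, the first segment of a group would have no starting point.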