Update two columns that are interdependent row-wise using data.table
I want to create a data.table with the arrival and departure times at successive bus stops. This is the format of my data.table (reproducible dataset below):
trip_id stop_sequence arrival_time departure_time travel_time
1: a 1 07:00:00 07:00:00 00:00:00
2: a 2 00:00:00 00:00:00 00:02:41
3: a 3 00:00:00 00:00:00 00:01:36
4: a 4 00:00:00 00:00:00 00:02:39
5: a 5 00:00:00 00:00:00 00:02:28
6: b 1 07:00:00 07:00:00 00:00:00
7: b 2 00:00:00 00:00:00 00:00:45
8: b 3 00:00:00 00:00:00 00:01:36
9: b 4 00:00:00 00:00:00 00:00:37
10: b 5 00:00:00 00:00:00 00:03:00
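Here is how it should work. The idea is that the vehicle travels following the stop sequence. For example, in trip a it takes 00:02:41 for the vehicle to travel from stop 1 to stop 2. Given a fixed time of 40 seconds for passengers to enter/leave the vehicle at each stop, the bus will depart from stop 2 at "07:03:21".
The problem is that this is a row-wise iterative process between two columns. Intuitively, I would do this with a loop, but I can't get my head around it. Any help?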
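For reference, the row-by-row rule described above can be written as an explicit loop. This is only a sketch on trip a, with times held as plain integer seconds since midnight rather than chron times objects; the names start, travel, dwell and fmt are made up for the illustration.

```r
# Illustrative only: trip "a" with times as integer seconds since midnight.
start  <- 7L * 3600L                     # 07:00:00 at the first stop
travel <- c(0L, 161L, 96L, 159L, 148L)   # travel_time to reach each stop, in seconds
dwell  <- 40L                            # fixed boarding time at every later stop

n <- length(travel)
arrival   <- integer(n)
departure <- integer(n)
arrival[1]   <- start
departure[1] <- start

# Each row depends on the previous row's departure.
for (i in seq_len(n)[-1]) {
  arrival[i]   <- departure[i - 1L] + travel[i]
  departure[i] <- arrival[i] + dwell
}

# Helper (hypothetical) to print seconds as hh:mm:ss.
fmt <- function(s) sprintf("%02d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
data.frame(stop_sequence = seq_len(n), arrival = fmt(arrival), departure = fmt(departure))
```

For stop 2 this gives an arrival of 07:02:41 and a departure of 07:03:21, matching the example in the text.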
Reproducible dataset:
library(data.table)
library(chron)
dt <- structure(list(trip_id = c("a", "a", "a", "a", "a", "b", "b",
"b", "b", "b"), stop_sequence = c(1L, 2L, 3L, 4L, 5L, 1L, 2L,
3L, 4L, 5L), arrival_time = structure(c(0.291666666666667, 0,
0, 0, 0, 0.291666666666667, 0, 0, 0, 0), format = "h:m:s", class = "times"),
departure_time = structure(c(0.291666666666667, 0, 0, 0,
0, 0.291666666666667, 0, 0, 0, 0), format = "h:m:s", class = "times"),
travel_time = structure(c(0, 0.00186598685444013, 0.00110857958406301,
0.00183749407361369, 0.00171664297781446, 0, 0.000522388450578203,
0.00111473367541453, 0.000427755975518318, 0.00207918951573377
), format = "h:m:s", class = "times")), .Names = c("trip_id",
"stop_sequence", "arrival_time", "departure_time", "travel_time"
), class = c("data.table", "data.frame"), row.names = c(NA, -10L
))
Expected output (first four rows):
trip_id stop_sequence arrival_time departure_time travel_time
1: a 1 07:00:00 07:00:00 00:00:00
2: a 2 07:02:41 07:03:21 00:02:41
3: a 3 07:04:57 07:05:37 00:01:36
4: a 4 07:08:16 07:08:56 00:02:39
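I think this can be done without a loop. Compute departure_time first; once you have it, arrival_time is just departure_time - 40 seconds (except at the first stop of each trip, where the two are equal):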
dt2 <- copy(dt)
dt2[,c("arrival_time", "departure_time") := .(cumsum(arrival_time + ifelse(travel_time==0, 0, travel_time + times("00:00:40"))) - ifelse(travel_time == 0 , 0, times("00:00:40")),
cumsum(arrival_time + ifelse(travel_time==0, 0, travel_time + times("00:00:40")))),
by = trip_id]
dt2
# trip_id stop_sequence arrival_time departure_time travel_time
#1: a 1 07:00:00 07:00:00 00:00:00
#2: a 2 07:02:41 07:03:21 00:02:41
#3: a 3 07:04:57 07:05:37 00:01:36
#4: a 4 07:08:16 07:08:56 00:02:39
#5: a 5 07:11:24 07:12:04 00:02:28
#6: b 1 07:00:00 07:00:00 00:00:00
#7: b 2 07:00:45 07:01:25 00:00:45
#8: b 3 07:03:01 07:03:41 00:01:36
#9: b 4 07:04:18 07:04:58 00:00:37
#10: b 5 07:07:58 07:08:38 00:03:00
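Why the cumsum works: within each trip only the first row has a nonzero arrival_time, so it acts as a seed, and every later row adds its travel_time plus the 40-second stop. A minimal sketch of that idea on plain integer seconds (not chron times; seed, step and the other names are assumptions for illustration):

```r
# Illustrative only: trip "a" as integer seconds since midnight.
start  <- 7L * 3600L
travel <- c(0L, 161L, 96L, 159L, 148L)
dwell  <- 40L

# Mimics the arrival_time column before the update: start time, then zeros.
seed <- c(start, rep(0L, length(travel) - 1L))

# Increment per row: nothing at the first stop, travel + dwell afterwards.
step <- ifelse(travel == 0L, 0L, travel + dwell)

departure <- cumsum(seed + step)
arrival   <- departure - ifelse(travel == 0L, 0L, dwell)

departure - start  # seconds after 07:00:00: 0 201 337 536 724
arrival - start    # seconds after 07:00:00: 0 161 297 496 684
```

The running sum replaces the loop because each departure is the previous departure plus a fixed, already-known increment.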
Alternatively, so that you don't have to repeat the long cumsum for departure_time in order to get arrival_time, you can do it in two steps:
dt2 <- copy(dt)  # start again from the original data, not from an already-updated dt2
dt2[,departure_time := cumsum(arrival_time + ifelse(travel_time==0, 0, travel_time + times("00:00:40"))), by = trip_id]
dt2[, arrival_time := departure_time - ifelse(travel_time == 0 , 0, times("00:00:40"))]
A third option, posted by @eddi:
dt[, departure_time := arrival_time[1] + cumsum(travel_time) + (0:(.N-1))*times('00:00:40'), by = trip_id]
dt[, arrival_time := c(arrival_time[1], tail(departure_time, -1) - times('00:00:40')), by = trip_id]
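One thing to be aware of: the third option adds the 40 seconds at every stop after the first, whereas the ifelse versions skip it wherever travel_time is 0. In the reproducible dataset only the first stop of each trip has a zero travel_time, so the results agree there. The sketch below compares the two rules on plain integer seconds; dep_ifelse, dep_closed and the made_up trip are hypothetical names and data, not part of the original answers.

```r
start <- 7L * 3600L
dwell <- 40L

# Rule used by the ifelse/cumsum versions.
dep_ifelse <- function(travel) {
  seed <- c(start, rep(0L, length(travel) - 1L))
  cumsum(seed + ifelse(travel == 0L, 0L, travel + dwell))
}

# Rule used by the third option: start + cumulative travel + (k - 1) dwells.
dep_closed <- function(travel) {
  start + cumsum(travel) + (seq_along(travel) - 1L) * dwell
}

trip_a <- c(0L, 161L, 96L, 159L, 148L)
identical(dep_ifelse(trip_a), dep_closed(trip_a))  # TRUE

# Hypothetical trip with a zero travel_time at a later stop.
made_up <- c(0L, 0L, 96L)
dep_ifelse(made_up) - start  # 0 0 136
dep_closed(made_up) - start  # 0 40 176
```

Which behaviour is right depends on whether a stop reached in zero travel time should still incur the 40-second boarding time.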