How to calculate the time difference between two columns with a lag
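I am currently working with a data set of taxi trips by New York drivers. For each trip I have the driver ID, the pickup date and time, and the dropoff date and time. I now want to calculate the waiting time between the dropoff of the previous trip and the pickup of the next trip. So I have to calculate the time difference between two columns with a lag of one (the dropoff time belongs to the previous trip, the pickup time to the next trip, i.e. the next row), grouped by driver ID (to make sure I do not calculate the time difference between trips of two different drivers).
A possible data set looks like this: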
hack_license = c("303F79923DA5DA7A10DF15E2D91CDCF7","697ABFCDF7E7C77A01183C857132F2A4","697ABFCDF7E7C77A01183C857132F2A4","697ABFCDF7E7C77A01183C857132F2A4","ABE23CA71E2DE84972281BA1C70B6EBB","ABE23CA71E2DE84972281BA1C70B6EBB","BA83D7C383EAA4F9D78A1A8B83CB3E92","BA83D7C383EAA4F9D78A1A8B83CB3E92","D476A1872F1F6594BD638C274483ED06","D476A1872F1F6594BD638C274483ED06")
pickup_datetime = c("2013-12-31 23:01:07","2013-12-31 23:04:00","2013-12-31 23:31:00","2013-12-31 23:40:00","2013-12-31 23:16:39","2013-12-31 23:24:05","2013-12-31 23:09:10","2013-12-31 23:26:26","2013-12-31 23:13:00","2013-12-31 23:22:00")
dropoff_datetime = c("2013-12-31 23:20:33","2013-12-31 23:28:00","2013-12-31 23:33:00","2013-12-31 23:48:00","2013-12-31 23:22:29","2013-12-31 23:28:37","2013-12-31 23:21:24","2013-12-31 23:36:54","2013-12-31 23:20:00","2013-12-31 23:27:00")
data <- data.frame(hack_license,pickup_datetime,dropoff_datetime)
I tried it like this using dplyr and lubridate, but it does not work.
data %>%
  group_by(data$hack_license) %>%
  group_by(hack_license) %>%
  mutate(waiting_time_in_secs = difftime(pickup_datetime,
                                         lag(dropoff_datetime), units = 'secs'))
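Maybe some of you can help me. That would be great!
You can convert the pickup and dropoff columns to date-time columns and then, for each hack_license, calculate the time difference between the current pickup time and the previous dropoff time.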
library(dplyr)
library(lubridate)
data <- data %>%
  # parse the character columns into date-times
  mutate(pickup_datetime = ymd_hms(pickup_datetime),
         dropoff_datetime = ymd_hms(dropoff_datetime)) %>%
  # lag() must only look at the previous trip of the same driver
  group_by(hack_license) %>%
  mutate(waiting_time_in_secs = as.numeric(difftime(pickup_datetime,
                                                    lag(dropoff_datetime), units = 'secs')))
data
# hack_license pickup_datetime dropoff_datetime waiting_time_in_secs
# <chr> <dttm> <dttm> <dbl>
# 1 303F79923DA5DA7A10DF15E2D91CDCF7 2013-12-31 23:01:07 2013-12-31 23:20:33 NA
# 2 697ABFCDF7E7C77A01183C857132F2A4 2013-12-31 23:04:00 2013-12-31 23:28:00 NA
# 3 697ABFCDF7E7C77A01183C857132F2A4 2013-12-31 23:31:00 2013-12-31 23:33:00 180
# 4 697ABFCDF7E7C77A01183C857132F2A4 2013-12-31 23:40:00 2013-12-31 23:48:00 420
# 5 ABE23CA71E2DE84972281BA1C70B6EBB 2013-12-31 23:16:39 2013-12-31 23:22:29 NA
# 6 ABE23CA71E2DE84972281BA1C70B6EBB 2013-12-31 23:24:05 2013-12-31 23:28:37 96
# 7 BA83D7C383EAA4F9D78A1A8B83CB3E92 2013-12-31 23:09:10 2013-12-31 23:21:24 NA
# 8 BA83D7C383EAA4F9D78A1A8B83CB3E92 2013-12-31 23:26:26 2013-12-31 23:36:54 302
# 9 D476A1872F1F6594BD638C274483ED06 2013-12-31 23:13:00 2013-12-31 23:20:00 NA
#10 D476A1872F1F6594BD638C274483ED06 2013-12-31 23:22:00 2013-12-31 23:27:00 120
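One thing to keep in mind: lag() simply takes the value from the previous row within each group, so the result is only a waiting time if each driver's trips are in chronological order. The sample data already is, but a real extract may not be. Below is a minimal sketch of the same computation with an explicit sort added. The toy data frame `trips`, the driver IDs "A" and "B", and the result name `waits` are made up for illustration and are not part of the question.

```r
library(dplyr)

# Hypothetical toy data: two drivers, rows deliberately out of order
trips <- data.frame(
  hack_license = c("A", "B", "A", "B"),
  pickup_datetime = as.POSIXct(c("2013-12-31 23:31:00", "2013-12-31 23:24:05",
                                 "2013-12-31 23:04:00", "2013-12-31 23:16:39"),
                               tz = "UTC"),
  dropoff_datetime = as.POSIXct(c("2013-12-31 23:33:00", "2013-12-31 23:28:37",
                                  "2013-12-31 23:28:00", "2013-12-31 23:22:29"),
                                tz = "UTC")
)

waits <- trips %>%
  # put each driver's trips in chronological order before lagging
  arrange(hack_license, pickup_datetime) %>%
  group_by(hack_license) %>%
  mutate(waiting_time_in_secs = as.numeric(difftime(pickup_datetime,
                                                    lag(dropoff_datetime),
                                                    units = "secs"))) %>%
  ungroup()

waits$waiting_time_in_secs
# NA 180 NA 96
```

The first trip of each driver has no previous dropoff, so its waiting time is NA, just as in the output above.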