How to calculate the time difference between dates by group
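I have a data frame containing date-times and locations. I want to calculate the difference in minutes between each record and the previous record (ordered by date) within its group, and store it in a new column.
I have figured out how to do this with a loop, but it runs across all groups (locations) together, and I'm not sure how to do it by group.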
library(dplyr)  # provides %>%, arrange() and mutate()

# fake data set for example:
df <- data.frame(
  location = c(
    1,1,3,4,4,5,6,5,4,4,3,2,2,1,1,2,3,4,4,2
  ),
  date.time = c(
    "2017-10-22 04:49:23", "2017-10-23 01:02:06",
    "2017-10-23 01:09:17", "2017-10-23 18:32:46",
    "2017-10-24 18:50:19", "2017-11-01 03:07:24",
    "2017-11-01 19:05:58", "2017-11-02 01:56:48",
    "2017-11-02 01:58:16", "2017-11-02 02:00:38",
    "2017-11-06 19:53:56", "2017-11-09 13:08:39",
    "2017-09-18 01:25:27", "2017-09-19 05:19:43",
    "2017-09-21 21:42:33", "2017-09-22 00:49:16",
    "2017-09-22 03:48:05", "2017-09-22 20:56:57",
    "2017-09-23 19:09:48", "2017-09-24 05:52:35"
  ),
  time.diff.mins = NA
) %>%
  # ISO-formatted strings sort chronologically, so this orders by time
  arrange(date.time) %>%
  mutate(
    date.time = as.POSIXct(
      date.time,
      format = "%Y-%m-%d %H:%M:%S"
    )
  )
This gives:
  location           date.time time.diff.mins
1        2 2017-09-18 01:25:27             NA
2        1 2017-09-19 05:19:43             NA
3        1 2017-09-21 21:42:33             NA
4        2 2017-09-22 00:49:16             NA
5        3 2017-09-22 03:48:05             NA
...
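So, for example, I would like row 4 of the time.diff.mins column to hold the difference in minutes between row 4 and row 1 (both location 2), and row 3 to hold the difference between row 3 and row 2 (both location 1). The calculation would then continue in the same way, comparing each record with the previous record in its location group.
This loop works on the whole data set, but I don't know how to integrate it with dplyr::group_by or some other method: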
# difference between row i and the row before it, ignoring location
for (i in 2:nrow(df)) {
  df[i, 3] <-
    difftime(
      time1 = as.POSIXct(
        df[i, 2],
        format = "%Y-%m-%d %H:%M:%S"
      ),
      time2 = as.POSIXct(
        df[i - 1, 2],
        format = "%Y-%m-%d %H:%M:%S"
      ),
      units = "mins"
    )
}
This produces, for example:
  location           date.time time.diff.mins
1        2 2017-09-18 01:25:27             NA
2        1 2017-09-19 05:19:43    1674.266667
3        1 2017-09-21 21:42:33    3862.833333
4        2 2017-09-22 00:49:16     186.716667
5        3 2017-09-22 03:48:05     178.816667
...
Any suggestions or guidance would be greatly appreciated!
If we need to group by 'location':
library(dplyr)

df %>%
  group_by(location) %>%
  # lag() returns the previous date.time within each location group
  mutate(time.diff.mins = difftime(date.time, lag(date.time), units = "mins"))
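As a rough, self-contained sketch of the same idea, the snippet below runs the grouped lag on a four-row subset of the question's data. The names `visits` and `result` are made up for illustration, and `tz = "UTC"` is an assumption added so the result does not depend on the local time zone. It also sorts inside the pipeline and converts the difftime to a plain number, which the code above does not do.

```r
library(dplyr)

# Hypothetical four-row subset of the question's data (two locations)
visits <- data.frame(
  location = c(2, 1, 1, 2),
  date.time = as.POSIXct(
    c("2017-09-18 01:25:27", "2017-09-19 05:19:43",
      "2017-09-21 21:42:33", "2017-09-22 00:49:16"),
    format = "%Y-%m-%d %H:%M:%S",
    tz = "UTC"  # assumption: fixed time zone for reproducible differences
  )
)

result <- visits %>%
  # sort so that lag() sees the chronologically previous record
  arrange(location, date.time) %>%
  group_by(location) %>%
  mutate(
    # minutes since the previous record at the same location;
    # the first record of each location has no predecessor and gets NA
    time.diff.mins = as.numeric(
      difftime(date.time, lag(date.time), units = "mins")
    )
  ) %>%
  ungroup()

print(result)
```

Sorting inside the pipeline means the result does not rely on the data frame having been ordered earlier, and `as.numeric()` leaves an ordinary numeric column rather than a difftime one, which may be easier to summarise or plot.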