Group and subset time
I have time data in a data frame, as shown below:
date day time phone lat lon acc update
6 12/08/2014 Tue 07:25:35PM 9052780809 17.41653 78.40537 3.9 1.406988e+12
44 12/08/2014 Tue 07:26:35PM 9052780809 17.41823 78.40344 3.9 1.406988e+12
114 12/08/2014 Tue 07:28:32PM 9052780809 17.41810 78.39846 3.9 1.406988e+12
152 12/08/2014 Tue 07:29:30PM 9052780809 17.41760 78.39512 3.9 1.406988e+12
188 12/08/2014 Tue 07:30:31PM 9052780809 17.41517 78.39426 3.9 1.406988e+12
223 12/08/2014 Tue 07:31:30PM 9052780809 17.41467 78.39434 3.9 1.406988e+12
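Most readings are 1-2 minutes apart, but in some cases they are more than 10 minutes apart, for example between the third and fourth readings in the sample below. When the difference is more than 10 minutes, consecutive readings may also fall on different dates. I want to insert a break after every reading that is followed by a gap of more than 10 minutes, and put the resulting pieces into separate data frames for further processing.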
date day time phone lat lon acc update
145315 16/08/2014 Sat 11:54:57AM 9052780809 17.41377 78.45923 3.9 1.406988e+12
145371 16/08/2014 Sat 11:55:56AM 9052780809 17.41626 78.45750 3.9 1.406988e+12
145426 16/08/2014 Sat 11:56:55AM 9052780809 17.41746 78.45547 4.0 1.406988e+12
162349 16/08/2014 Sat 05:02:51PM 9052780809 17.41562 78.44446 3.9 1.406988e+12
162404 16/08/2014 Sat 05:03:55PM 9052780809 17.41577 78.44113 3.9 1.406988e+12
162452 16/08/2014 Sat 05:04:51PM 9052780809 17.41638 78.43815 3.9 1.406988e+12
The original data has 8 columns and more than 700,000 rows.
Just pasting this from the comments so that the question gets an answer. You can use split (suggested by @docendo discimus) and difftime (from @Laurik) to get the expected dataset.
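Assuming "time1" is the "time" column in the dataset ("dat"), convert "time1" to the "POSIXlt" class with strptime, then use difftime to get the difference in "minutes" between consecutive elements. Here I drop the last element and the first element so that we can find the difference between the current element dt1[-length(dt1)] and the next element dt1[-1], apply the condition > 10, take the cumsum of the resulting logical index, and split the dataset on that index to get a list of data.frames (lst). Working with the list may be better than creating individual data.frame objects.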
dt1 <- strptime(dat$time1, format='%I:%M:%OS%p')
# current minus next is negative for forward gaps, so compare the absolute value
lst <- split(dat, cumsum(c(FALSE, abs(difftime(dt1[-length(dt1)],
       dt1[-1], units='mins')) > 10)))
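To make the cumsum step concrete, here is a minimal sketch on a handful of made-up timestamps (the vector tm and its values are illustrative assumptions, not part of the original data). It shows how the logical "gap > 10 minutes" flags turn into one group id per run of readings.

```r
# Hypothetical timestamps for illustration only
tm <- as.POSIXct(c("2014-08-16 11:54:57", "2014-08-16 11:55:56",
                   "2014-08-16 11:56:55", "2014-08-16 17:02:51",
                   "2014-08-16 17:03:55"), tz = "UTC")

# Gap in minutes between each reading and the one before it
gap_min <- as.numeric(difftime(tm[-1], tm[-length(tm)], units = "mins"))

# TRUE wherever a new run starts; the first reading never starts a new run
new_run <- c(FALSE, gap_min > 10)

# cumsum() counts the TRUEs seen so far, which gives one id per run
grp <- cumsum(new_run)
grp
# 0 0 0 1 1
```

Passing grp as the second argument to split would then return one data.frame per run.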
Update
Using the new dataset dat:
dt1 <- with(dat, strptime(paste(date, time),
format='%d/%m/%Y %I:%M:%OS%p'))
indx <- cumsum(c(FALSE, abs(difftime(dt1[-length(dt1)], dt1[-1],
unit='min')) >10))
split(dat, indx)
#$`0`
# date day time phone lat lon acc update
#6 12/08/2014 Tue 07:25:35PM 9052780809 17.41653 78.40537 3.9 1.406988e+12
#44 12/08/2014 Tue 07:26:35PM 9052780809 17.41823 78.40344 3.9 1.406988e+12
#114 12/08/2014 Tue 07:28:32PM 9052780809 17.41810 78.39846 3.9 1.406988e+12
#152 12/08/2014 Tue 07:29:30PM 9052780809 17.41760 78.39512 3.9 1.406988e+12
#188 12/08/2014 Tue 07:30:31PM 9052780809 17.41517 78.39426 3.9 1.406988e+12
#223 12/08/2014 Tue 07:31:30PM 9052780809 17.41467 78.39434 3.9 1.406988e+12
#$`1`
# date day time phone lat lon acc update
#145315 16/08/2014 Sat 11:54:57AM 9052780809 17.41377 78.45923 3.9 1.406988e+12
#145371 16/08/2014 Sat 11:55:56AM 9052780809 17.41626 78.45750 3.9 1.406988e+12
#145426 16/08/2014 Sat 11:56:55AM 9052780809 17.41746 78.45547 4.0 1.406988e+12
#$`2`
# date day time phone lat lon acc update
#162349 16/08/2014 Sat 05:02:51PM 9052780809 17.41562 78.44446 3.9 1.406988e+12
#162404 16/08/2014 Sat 05:03:55PM 9052780809 17.41577 78.44113 3.9 1.406988e+12
#162452 16/08/2014 Sat 05:04:51PM 9052780809 17.41638 78.43815 3.9 1.406988e+12
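If a single data frame is easier to process further than a list, one possible variation is to keep the segment id as a column and summarise per segment. The sketch below is only an illustration under assumptions: demo is a small stand-in with the same date/time layout as the question, segment and seg_summary are names introduced here, and parsing %p assumes a locale where AM/PM are spelled that way.

```r
# Small stand-in for the real data (same date/time layout as in the question)
demo <- data.frame(
  date = c("12/08/2014", "12/08/2014", "16/08/2014", "16/08/2014", "16/08/2014"),
  time = c("07:25:35PM", "07:26:35PM", "11:54:57AM", "05:02:51PM", "05:03:55PM"),
  stringsAsFactors = FALSE
)

# POSIXct is numeric seconds underneath, so diff() can be applied directly
dt <- as.POSIXct(paste(demo$date, demo$time),
                 format = "%d/%m/%Y %I:%M:%S%p", tz = "UTC")

# Start a new segment whenever the gap to the previous reading exceeds 10 minutes
demo$segment <- cumsum(c(FALSE, diff(as.numeric(dt)) / 60 > 10))

# One row per segment: number of readings and time span in minutes
rows_by_seg <- split(seq_len(nrow(demo)), demo$segment)
seg_summary <- do.call(rbind, lapply(rows_by_seg, function(i) {
  data.frame(segment = demo$segment[i[1]],
             n = length(i),
             minutes = as.numeric(difftime(max(dt[i]), min(dt[i]), units = "mins")))
}))
seg_summary
```

Unlike the abs() version above, diff() here only flags forward gaps, so it assumes the rows are already in chronological order.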
Data
dat <- structure(list(date = c("12/08/2014", "12/08/2014", "12/08/2014",
"12/08/2014", "12/08/2014", "12/08/2014", "16/08/2014", "16/08/2014",
"16/08/2014", "16/08/2014", "16/08/2014", "16/08/2014"), day = c("Tue",
"Tue", "Tue", "Tue", "Tue", "Tue", "Sat", "Sat", "Sat", "Sat",
"Sat", "Sat"), time = c("07:25:35PM", "07:26:35PM", "07:28:32PM",
"07:29:30PM", "07:30:31PM", "07:31:30PM", "11:54:57AM", "11:55:56AM",
"11:56:55AM", "05:02:51PM", "05:03:55PM", "05:04:51PM"), phone = c(9052780809,
9052780809, 9052780809, 9052780809, 9052780809, 9052780809, 9052780809,
9052780809, 9052780809, 9052780809, 9052780809, 9052780809),
lat = c(17.41653, 17.41823, 17.4181, 17.4176, 17.41517, 17.41467,
17.41377, 17.41626, 17.41746, 17.41562, 17.41577, 17.41638
), lon = c(78.40537, 78.40344, 78.39846, 78.39512, 78.39426,
78.39434, 78.45923, 78.4575, 78.45547, 78.44446, 78.44113,
78.43815), acc = c(3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9, 3.9,
4, 3.9, 3.9, 3.9), update = c(1.406988e+12, 1.406988e+12,
1.406988e+12, 1.406988e+12, 1.406988e+12, 1.406988e+12, 1.406988e+12,
1.406988e+12, 1.406988e+12, 1.406988e+12, 1.406988e+12, 1.406988e+12
)), .Names = c("date", "day", "time", "phone", "lat", "lon",
"acc", "update"), class = "data.frame", row.names = c("6", "44",
"114", "152", "188", "223", "145315", "145371", "145426", "162349",
"162404", "162452"))