Assigning POSIX times to unequal intervals in R
I have a list of incoming call times:
dim(incoming)
[1] 50357 1
head(incoming, n = 50)
[1] "2015-10-08 02:14:46 EST" "2015-10-08 16:18:04 EST" "2015-10-08 01:32:42 EST" "2015-10-08 18:48:40 EST"
[5] "2015-10-08 16:53:33 EST" "2015-10-08 12:23:37 EST" "2015-10-08 06:38:34 EST" "2015-10-08 17:15:41 EST"
[9] "2015-10-08 19:43:00 EST" "2015-10-08 18:19:44 EST" "2015-10-08 01:10:39 EST" "2015-10-08 19:45:04 EST"
[13] "2015-10-08 18:29:57 EST" "2015-10-08 10:11:58 EST" "2015-10-08 08:44:10 EST" "2015-10-08 09:32:25 EST"
[17] "2015-10-08 08:23:32 EST" "2015-10-08 14:11:49 EST" "2015-10-08 06:27:45 EST" "2015-10-08 00:54:38 EST"
[21] "2015-10-08 08:56:34 EST" "2015-10-08 07:12:52 EST" "2015-10-08 18:28:40 EST" "2015-10-08 09:35:34 EST"
[25] "2015-10-08 09:51:06 EST" "2015-10-08 08:53:54 EST" "2015-10-08 00:42:43 EST" "2015-10-08 10:25:04 EST"
[29] "2015-10-08 07:13:28 EST" "2015-10-08 08:09:18 EST" "2015-10-08 16:32:59 EST" "2015-10-08 07:37:25 EST"
[33] "2015-10-08 07:46:52 EST" "2015-10-08 08:25:11 EST" "2015-10-08 11:51:10 EST" "2015-10-08 02:02:02 EST"
[37] "2015-10-08 09:23:24 EST" "2015-10-08 12:03:03 EST" "2015-10-08 07:36:34 EST" "2015-10-08 08:27:38 EST"
[41] "2015-10-08 02:16:47 EST" "2015-10-08 08:11:54 EST" "2015-10-08 07:46:22 EST" "2015-10-08 08:34:52 EST"
[45] "2015-10-08 00:00:37 EST" "2015-10-08 08:37:26 EST" "2015-10-08 01:33:00 EST" "2015-10-08 17:16:15 EST"
[49] "2015-10-08 09:10:07 EST" "2015-10-08 08:07:43 EST"
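My objective is to assign each of these times to the interval it belongs to; however, the intervals are not equally spaced. For example, the first 25 intervals are: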
head(data, n = 25)
interval
1 2015-10-08 00:05:00
2 2015-10-08 00:12:00
3 2015-10-08 00:34:00
4 2015-10-08 00:40:00
5 2015-10-08 01:32:00
6 2015-10-08 01:52:00
7 2015-10-08 02:52:00
8 2015-10-08 02:58:00
9 2015-10-08 04:13:00
10 2015-10-08 04:30:00
11 2015-10-08 05:58:00
12 2015-10-08 06:16:00
13 2015-10-08 06:41:00
14 2015-10-08 06:54:00
15 2015-10-08 07:07:00
16 2015-10-08 07:25:00
17 2015-10-08 07:38:00
18 2015-10-08 07:52:00
19 2015-10-08 08:05:00
20 2015-10-08 08:18:00
21 2015-10-08 08:31:00
22 2015-10-08 08:44:00
23 2015-10-08 08:57:00
24 2015-10-08 09:10:00
25 2015-10-08 09:22:00
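For example, the first time, incoming[1,], should be assigned to the 02:52:00 interval because it falls between 01:52:00 and 02:52:00. The third time, incoming[3,], should be assigned to the 01:52:00 interval because it falls between 01:32:00 and 01:52:00, and so on.
My ultimate objective is to count how many incoming times fall within each interval. I was able to round the incoming times down to an evenly spaced sequence, for example 10-minute intervals:
interval <- incoming - minutes(minute(incoming) %% 10) - seconds(second(incoming))
This assigns each time to a 10-minute interval, but I am not sure how to do the same with uneven intervals.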
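For reference, the same 10-minute rounding can be sketched in base R, without the lubridate helpers, by flooring the underlying epoch seconds. This is only an illustration: the vector name `x` is hypothetical, its two values are copied from the dput() output of incoming, and the arithmetic assumes a time zone whose UTC offset is a multiple of 10 minutes (true for EST).

```r
# Two sample times copied from the dput() of `incoming` (02:14:46 and 01:32:42 EST)
x <- as.POSIXct(c(1444288486, 1444285962), origin = "1970-01-01", tz = "EST")

# POSIXct values are seconds since the epoch, so flooring to a multiple
# of 600 seconds rounds each time down to the start of its 10-minute bin
floored <- as.POSIXct(floor(as.numeric(x) / 600) * 600,
                      origin = "1970-01-01", tz = "EST")

as.numeric(floored)
# 1444288200 1444285800  (02:10:00 and 01:30:00 EST)
```

This only works because the bins are evenly spaced; uneven bins need an explicit vector of breakpoints.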
> dput(data)
structure(list(interval = structure(c(1444280700, 1444281120,
1444282440, 1444282800, 1444285920, 1444287120, 1444290720, 1444291080,
1444295580, 1444296600, 1444301880, 1444302960, 1444304460, 1444305240,
1444306020, 1444307100, 1444307880, 1444308720, 1444309500, 1444310280,
1444311060, 1444311840, 1444312620, 1444313400, 1444314120, 1444314900,
1444315680, 1444316400, 1444317120, 1444317840, 1444318620, 1444319340,
1444320180, 1444321080, 1444321980, 1444322880, 1444323720, 1444324620,
1444325520, 1444326420, 1444327140, 1444327920, 1444328640, 1444329420,
1444330140, 1444330920, 1444331700, 1444332480, 1444333200, 1444333980,
1444334820, 1444335600, 1444336380, 1444337160, 1444337940, 1444338780,
1444339560, 1444340340, 1444341120, 1444341960, 1444342740, 1444343520,
1444344780, 1444345920, 1444346700, 1444347480, 1444348260, 1444349040,
1444349820, 1444350600, 1444351380, 1444352100, 1444352880, 1444353660,
1444354740, 1444355580, 1444356180, 1444357080, 1444357740, 1444358460,
1444359180, 1444359840, 1444360560, 1444361220, 1444361880, 1444362960,
1444363440, 1444364160, 1444364640, 1444365300, 1444366200, 1444366560
), class = c("POSIXct", "POSIXt"), tzone = "EST")), .Names = "interval", row.names = c(NA,
92L), class = "data.frame")
> dput(head(incoming, n = 100))
structure(list(incoming = structure(c(1444288486, 1444339084,
1444285962, 1444348120, 1444341213, 1444325017, 1444304314, 1444342541,
1444351380, 1444346384, 1444284639, 1444351504, 1444346997, 1444317118,
1444311850, 1444314745, 1444310612, 1444331509, 1444303665, 1444283678,
1444312594, 1444306372, 1444346920, 1444314934, 1444315866, 1444312434,
1444282963, 1444317904, 1444306408, 1444309758, 1444339979, 1444307845,
1444308412, 1444310711, 1444323070, 1444287722, 1444314204, 1444323783,
1444307794, 1444310858, 1444288607, 1444309914, 1444308382, 1444311292,
1444280437, 1444311446, 1444285980, 1444342575, 1444313407, 1444309663,
1444313328, 1444313004, 1444312594, 1444311171, 1444312992, 1444305160,
1444305558, 1444310477, 1444301756, 1444308008, 1444310435, 1444311397,
1444305549, 1444281371, 1444281799, 1444282338, 1444281573, 1444280541,
1444281215, 1444280953, 1444281107, 1444281161, 1444280640, 1444280639,
1444281847, 1444327017, 1444281855, 1444281842, 1444280998, 1444280620,
1444280466, 1444280579, 1444280881, 1444280534, 1444280879, 1444280535,
1444280610, 1444280449, 1444280413, 1444280574, 1444280482, 1444280543,
1444280536, 1444280527, 1444280889, 1444281854, 1444280954, 1444280444,
1444281531, 1444281033), class = c("POSIXct", "POSIXt"), tzone = "EST")), .Names = "incoming", row.names = c(NA,
100L), class = "data.frame")
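I read these times in with scan (which resulted in all of them being in my time zone) and then checked whether findInterval was the right choice. It appears to handle POSIXct objects without complaint (since they are stored as mode numeric):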
> table( findInterval( incoming$incoming, data$interval) ) # redone with new dput output
0 1 2 4 5 6 10 12 13 14 15 16 17 19 20 21 22 23 24 25 26 27 28 30 36 37 38 40 46 56 57 59 60 64 65 66 71
18 8 11 3 2 3 1 2 1 2 2 2 3 3 5 4 4 3 1 2 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 2
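The zero interval is the one before the first "cut" in the intervals vector. If you want the indices shifted by one, flank the cut-points vector (the second argument) with -Inf and Inf.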
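A minimal sketch of how this could be applied to the question, using only the first seven breakpoints and three of the incoming times from the dput() output. The names `breaks`, `times`, `idx`, `shifted`, `upper`, and `counts` are made up for illustration. Labelling each time by the upper endpoint of its interval and counting per interval (including empty ones) is one possible reading of what the question asks for.

```r
# First seven breakpoints and three incoming times, copied from the dput() output
breaks <- as.POSIXct(c(1444280700, 1444281120, 1444282440, 1444282800,
                       1444285920, 1444287120, 1444290720),
                     origin = "1970-01-01", tz = "EST")
times <- as.POSIXct(c(1444288486, 1444285962, 1444280437),
                    origin = "1970-01-01", tz = "EST")

# findInterval() returns i such that breaks[i] <= x < breaks[i + 1];
# 0 means the time falls before the first breakpoint
idx <- findInterval(times, breaks)
idx
# 6 5 0

# Flanking the cut points with -Inf and Inf shifts every index up by one
shifted <- findInterval(times, c(-Inf, as.numeric(breaks), Inf))

# The shifted index points at the upper endpoint of each interval,
# which is the label the question asks for (02:52:00 for 02:14:46, etc.);
# times after the last breakpoint would get NA here
upper <- breaks[shifted]

# Count the times per interval, keeping intervals with zero hits
counts <- table(factor(shifted, levels = seq_along(breaks),
                       labels = format(breaks, "%H:%M:%S")))
counts
```

By default a time that lands exactly on a breakpoint is counted in the interval that starts there. If it should instead belong to the interval that ends there, findInterval() has a left.open = TRUE argument in recent versions of R.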