Subset a date-time series into 1-hour intervals
I have a data frame with POSIXct-class values (dim: 589):
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = " id time par_surface
1 2014-07-19 07:10:00 907.6
2 2014-07-19 07:11:00 956.2
3 2014-07-19 07:12:00 963.4
4 2014-07-19 07:14:00 957.6
5 2014-07-19 07:15:00 876.8
6 2014-07-19 07:16:00 883.6
7 2014-07-19 07:18:00 903.8
8 2014-07-19 07:18:59 817.4
9 2014-07-19 07:19:59 880.8
10 2014-07-19 07:21:59 877.6
11 2014-07-19 07:22:59 960.0
12 2014-07-19 07:24:00 977.8
13 2014-07-19 07:26:00 964.0
14 2014-07-19 07:27:00 995.0
15 2014-07-19 07:28:00 1053.8
16 2014-07-19 07:29:59 1024.4
17 2014-07-19 07:30:59 916.0
18 2014-07-19 07:31:59 1042.6
19 2014-07-19 07:34:00 1047.4
20 2014-07-19 07:35:00 1022.8
21 2014-07-19 07:36:00 1023.8
22 2014-07-19 07:38:00 993.2
23 2014-07-19 07:39:00 1009.4
24 2014-07-19 07:39:59 950.0
25 2014-07-19 07:42:00 986.2
26 2014-07-19 07:43:00 971.0
27 2014-07-19 07:44:00 879.6
28 2014-07-19 07:46:00 841.6
29 2014-07-19 07:47:00 928.8
30 2014-07-19 07:47:59 1000.8
31 2014-07-19 07:50:00 1027.8
32 2014-07-19 07:51:00 977.2
33 2014-07-19 07:51:59 1040.4
34 2014-07-19 07:54:00 1049.4
35 2014-07-19 07:54:59 1131.6
36 2014-07-19 07:55:59 1186.2
37 2014-07-19 07:58:00 1171.0
38 2014-07-19 07:58:59 1168.8
39 2014-07-19 08:00:00 1093.8
40 2014-07-19 08:02:00 1204.8
41 2014-07-19 08:03:00 1214.8
42 2014-07-19 08:03:59 1224.2
43 2014-07-19 08:05:59 1217.2
44 2014-07-19 08:06:59 1239.2
45 2014-07-19 08:08:00 1196.2
46 2014-07-19 08:10:00 1203.8
47 2014-07-19 08:10:59 1211.8
48 2014-07-19 08:12:00 1167.2
49 2014-07-19 08:13:59 1163.2
50 2014-07-19 08:15:00 1179.6
51 2014-07-19 08:16:00 1218.2
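52 2014-07-19 08:18:00 1245.4")
# The header names 3 columns but each row has 4 fields, so read.table() used the
# first field as row names: "id" now holds the date and "time" the clock time.
# Rebuild a proper POSIXct column (UTC assumed) and restore the numeric id.
df$time <- as.POSIXct(paste(df$id, df$time), tz = "UTC")
df$id <- as.integer(rownames(df))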
Now I need to subset it into hourly intervals. It is important to keep the first value, so the result should look like this:
time par_surface
1 2014-07-19 07:10:00 907.6
2 2014-07-19 08:10:00 1203.8
...
I tried to do this with split(), cut() and plyr::ddply(), but it didn't work.
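For reference, `cut()` and `split()` can be combined to get one data frame per hour. A minimal sketch on a few rows of the question's data, assuming `time` is already POSIXct (UTC assumed). Note that `cut(x, "hour")` bins by clock hour (07:00, 08:00, ...), so the first row of the second group is 08:00:00 rather than the 08:10:00 the question asks for:

```r
# A few rows from the question's data
df <- data.frame(
  time = as.POSIXct(c("2014-07-19 07:10:00", "2014-07-19 07:58:59",
                      "2014-07-19 08:00:00", "2014-07-19 08:10:00"), tz = "UTC"),
  par_surface = c(907.6, 1168.8, 1093.8, 1203.8)
)

# cut() assigns each timestamp to the clock hour it falls in
hour_bin <- cut(df$time, breaks = "hour")

# split() returns a named list with one data frame per hour bin
by_hour <- split(df, hour_bin)

# First reading within each clock hour
first_by_clock_hour <- do.call(rbind, lapply(by_hour, function(d) d[1, ]))
print(first_by_clock_hour)
```

This explains why plain `cut()` does not reproduce the expected output: its bins start on the hour, not at the first reading.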
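Here is a one-liner that should do it:
# keep only the rows whose time falls exactly on an hourly step starting at the first timestamp
df[df$time %in% seq.POSIXt(from = min(df$time), by = "hour", to = max(df$time)), ]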
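The exact-match one-liner only returns a row when a reading falls exactly on an hourly step; given the seconds jitter visible in the data (07:18:59, 07:19:59, ...), that may not always hold. A minimal sketch of an alternative, assuming `time` is POSIXct: number the one-hour windows from the first timestamp and keep the first reading in each window. The helper names `elapsed` and `win_idx` are illustrative, and only a few rows of the question's data are used:

```r
# A few rows from the question's data
df <- data.frame(
  time = as.POSIXct(c("2014-07-19 07:10:00", "2014-07-19 07:11:00",
                      "2014-07-19 07:58:59", "2014-07-19 08:00:00",
                      "2014-07-19 08:10:00", "2014-07-19 08:10:59",
                      "2014-07-19 08:18:00"), tz = "UTC"),
  par_surface = c(907.6, 956.2, 1168.8, 1093.8, 1203.8, 1211.8, 1245.4)
)

df <- df[order(df$time), ]  # !duplicated() below relies on time order

# Hours elapsed since the first reading; floor() gives the window index:
# 0 = [07:10, 08:10), 1 = [08:10, 09:10), ...
elapsed <- as.numeric(difftime(df$time, min(df$time), units = "hours"))
win_idx <- floor(elapsed)

# Keep the first reading of each window
first_per_hour <- df[!duplicated(win_idx), ]
print(first_per_hour)
```

On these rows it keeps 07:10:00 (907.6) and 08:10:00 (1203.8), matching the expected output, and it would still pick a row if the second window happened to start with, say, an 08:10:59 reading.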
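I assume you want the mean of the values within each hour:
# cut(..., "hour") bins the timestamps by clock hour; aggregate() averages par_surface per bin
df_hourly <- aggregate(list(par_surface = df$par_surface), by = list(time = cut(as.POSIXct(df$time), "hour")), FUN = mean)
Here par_surface is the value column and time the date-time column of the question's data; substitute your own column names.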
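`cut(..., "hour")` groups by clock hour (07:00 to 08:00, and so on), while the question anchors its windows at the first reading (07:10). If hourly means are wanted for windows anchored that way, one sketch is to pass explicit POSIXct breaks to `cut()` before `aggregate()`. The variable names are illustrative, and only a few rows of the question's data are used:

```r
# A few rows from the question's data
df <- data.frame(
  time = as.POSIXct(c("2014-07-19 07:10:00", "2014-07-19 07:11:00",
                      "2014-07-19 08:10:00", "2014-07-19 08:18:00"), tz = "UTC"),
  par_surface = c(907.6, 956.2, 1203.8, 1245.4)
)

# Window edges every hour from the first reading, enough to cover the last one
span_h <- as.numeric(difftime(max(df$time), min(df$time), units = "hours"))
edges <- seq(from = min(df$time), by = "hour", length.out = floor(span_h) + 2)

# right = FALSE makes each window [start, start + 1 h)
win <- cut(df$time, breaks = edges, right = FALSE)

hourly_mean <- aggregate(list(par_surface = df$par_surface),
                         by = list(window_start = win), FUN = mean)
print(hourly_mean)
```

Here the two windows start at 07:10 and 08:10, and the means are 931.9 and 1224.6.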