Compute column average based on date and time in R
I have a matrix that looks something like this:
Date Time Data
15000 04/09/2014 05:45:00 0.908
15001 04/09/2014 06:00:00 0.888
15002 04/09/2014 06:15:00 0.976
15003 04/09/2014 06:30:00 1.632
15004 04/09/2014 06:45:00 1.648
15005 04/09/2014 07:00:00 1.164
15006 04/09/2014 07:15:00 0.568
15007 04/09/2014 07:30:00 1.020
15008 04/09/2014 07:45:00 1.052
15009 04/09/2014 08:00:00 0.920
15010 04/09/2014 08:15:00 0.656
15011 04/09/2014 08:30:00 1.172
15012 04/09/2014 08:45:00 1.000
15013 04/09/2014 09:00:00 1.420
15014 04/09/2014 09:15:00 0.936
15015 04/09/2014 09:30:00 0.996
15016 04/09/2014 09:45:00 1.100
15017 04/09/2014 10:00:00 0.492
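It contains a year of data, with 96 rows per day (15-minute intervals from 00:00 to 23:45). My problem is that I want to average the Data column for each day over a time range that I specify. For example, if I wanted the average over 06:00 - 08:00 for each day, then for the data above I should get 1.0964 for the date 04/09/2014.
I can't figure out how to use the Date and Time columns as filters to do this, and was wondering if anyone could help?
To make things more complicated, I would also like to compute 45-minute rolling averages for each day within a different time period, say 04:00 - 09:00. Again, since this is per day, the result would ideally be a matrix in which each row is a particular date and the columns hold the rolling averages for 04:00 - 04:45 and each subsequent window...
Any ideas?!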
Check the following code and let me know if anything is unclear.
data = read.table(header = TRUE, stringsAsFactors = FALSE, text = "Index Date Time Data
15000 04/09/2014 05:45:00 0.908
15001 04/09/2014 06:00:00 0.888
15002 04/09/2014 06:15:00 0.976
15003 04/09/2014 06:30:00 1.632
15004 04/09/2014 06:45:00 1.648
15005 04/09/2014 07:00:00 1.164
15006 04/09/2014 07:15:00 0.568
15007 04/09/2014 07:30:00 1.020
15008 04/09/2014 07:45:00 1.052
15009 04/09/2014 08:00:00 0.920
15010 04/09/2014 08:15:00 0.656
15011 04/09/2014 08:30:00 1.172
15012 04/09/2014 08:45:00 1.000
15013 04/09/2014 09:00:00 1.420
15014 04/09/2014 09:15:00 0.936
15015 04/09/2014 09:30:00 0.996
15016 04/09/2014 09:45:00 1.100
15017 04/09/2014 10:00:00 0.492")
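library("magrittr")
# Combine date and time, then parse to POSIXct (strptime alone returns POSIXlt, which is awkward as a data frame column)
data$parsed.timestamp = paste(data$Date, data$Time) %>% as.POSIXct(format = "%d/%m/%Y %H:%M:%S")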
# Hourly Average
desiredGroupingUnit = cut(data$parsed.timestamp, breaks = "hour") #You can use substr for that also
aggregate(data$Data, by = list(desiredGroupingUnit), FUN = mean )
# Group.1 x
# 1 2014-09-04 05:00:00 0.908
# 2 2014-09-04 06:00:00 1.286
# 3 2014-09-04 07:00:00 0.951
# 4 2014-09-04 08:00:00 0.937
# 5 2014-09-04 09:00:00 1.113
# 6 2014-09-04 10:00:00 0.492
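The hourly grouping above uses fixed clock hours. For the range asked about in the question (06:00 to 08:00 on each day), one possible sketch is to filter on the time-of-day string first and then aggregate by date. The helper name `dailyRangeMean` and its arguments are made up for illustration, and the sketch assumes the `Time` column is always zero-padded `HH:MM:SS`, so that comparing the strings matches chronological order. Both endpoints are included.

```r
# Rows copied from the question (a single day, so the result has one row)
sample_data <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "Index Date Time Data
15000 04/09/2014 05:45:00 0.908
15001 04/09/2014 06:00:00 0.888
15002 04/09/2014 06:15:00 0.976
15003 04/09/2014 06:30:00 1.632
15004 04/09/2014 06:45:00 1.648
15005 04/09/2014 07:00:00 1.164
15006 04/09/2014 07:15:00 0.568
15007 04/09/2014 07:30:00 1.020
15008 04/09/2014 07:45:00 1.052
15009 04/09/2014 08:00:00 0.920
15010 04/09/2014 08:15:00 0.656")

# Hypothetical helper: per-day mean of Data for readings with from <= Time <= to
dailyRangeMean <- function(df, from, to) {
  inRange <- df$Time >= from & df$Time <= to
  # Group on a real Date so that days come out in chronological order
  df$Day <- as.Date(df$Date, format = "%d/%m/%Y")
  aggregate(Data ~ Day, data = df[inRange, ], FUN = mean)
}

result <- dailyRangeMean(sample_data, "06:00:00", "08:00:00")
print(result)
# The mean for 2014-09-04 is about 1.0964, the value expected in the question
```

With a full year of data the same call should return one row per day, since the filter only looks at the time of day and the grouping only looks at the date.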
# Moving average
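# Mean of Data for all rows whose timestamp lies in [startTime, endTime], both ends included
getAvgBetweenTwoTimeStamps = function(data, startTime, endTime) {
  averageTheseIndices = which(data$parsed.timestamp >= startTime & data$parsed.timestamp <= endTime)
  return(mean(data$Data[averageTheseIndices]))
}
movingAvgWindow = 45*60 # 45 minutes, expressed in seconds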
movingAvgTimestamps = data.frame(from = data$parsed.timestamp, to = data$parsed.timestamp + movingAvgWindow)
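# Loop over row indices rather than using apply(), which would coerce the timestamps to character
movingAvgTimestamps$movingAvg =
  sapply(seq_len(nrow(movingAvgTimestamps)),
         FUN = function(i) getAvgBetweenTwoTimeStamps(data = data, startTime = movingAvgTimestamps$from[i], endTime = movingAvgTimestamps$to[i]))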
print(movingAvgTimestamps)
# from to movingAvg
# 1 2014-09-04 05:45:00 2014-09-04 06:30:00 1.1010000
# 2 2014-09-04 06:00:00 2014-09-04 06:45:00 1.2860000
# 3 2014-09-04 06:15:00 2014-09-04 07:00:00 1.3550000
# 4 2014-09-04 06:30:00 2014-09-04 07:15:00 1.2530000
# 5 2014-09-04 06:45:00 2014-09-04 07:30:00 1.1000000
# 6 2014-09-04 07:00:00 2014-09-04 07:45:00 0.9510000
# 7 2014-09-04 07:15:00 2014-09-04 08:00:00 0.8900000
# 8 2014-09-04 07:30:00 2014-09-04 08:15:00 0.9120000
# 9 2014-09-04 07:45:00 2014-09-04 08:30:00 0.9500000
# 10 2014-09-04 08:00:00 2014-09-04 08:45:00 0.9370000
# 11 2014-09-04 08:15:00 2014-09-04 09:00:00 1.0620000
# 12 2014-09-04 08:30:00 2014-09-04 09:15:00 1.1320000
# 13 2014-09-04 08:45:00 2014-09-04 09:30:00 1.0880000
# 14 2014-09-04 09:00:00 2014-09-04 09:45:00 1.1130000
# 15 2014-09-04 09:15:00 2014-09-04 10:00:00 0.8810000
# 16 2014-09-04 09:30:00 2014-09-04 10:15:00 0.8626667
# 17 2014-09-04 09:45:00 2014-09-04 10:30:00 0.7960000
# 18 2014-09-04 10:00:00 2014-09-04 10:45:00 0.4920000
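The table above has one row per window start. To get the layout the question asks for (one row per day, one column per 45-minute window inside a chosen period), a possible sketch follows. It assumes every day has a complete, regular 15-minute grid, so that a 45-minute window with both endpoints included spans 4 consecutive readings, as in the table above. The name `rollingMatrix` and its arguments are hypothetical. The moving average itself comes from `stats::filter` in base R. Because the sample only starts at 05:45, the example uses 06:00 - 09:00 rather than the 04:00 - 09:00 mentioned in the question.

```r
# Rows copied from the question (a single day)
sample_data <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "Index Date Time Data
15000 04/09/2014 05:45:00 0.908
15001 04/09/2014 06:00:00 0.888
15002 04/09/2014 06:15:00 0.976
15003 04/09/2014 06:30:00 1.632
15004 04/09/2014 06:45:00 1.648
15005 04/09/2014 07:00:00 1.164
15006 04/09/2014 07:15:00 0.568
15007 04/09/2014 07:30:00 1.020
15008 04/09/2014 07:45:00 1.052
15009 04/09/2014 08:00:00 0.920
15010 04/09/2014 08:15:00 0.656
15011 04/09/2014 08:30:00 1.172
15012 04/09/2014 08:45:00 1.000
15013 04/09/2014 09:00:00 1.420
15014 04/09/2014 09:15:00 0.936
15015 04/09/2014 09:30:00 0.996
15016 04/09/2014 09:45:00 1.100
15017 04/09/2014 10:00:00 0.492")

# Hypothetical helper: matrix of rolling means, one row per day, one column per window.
# Assumes zero-padded HH:MM:SS times and no missing readings within [from, to].
rollingMatrix <- function(df, from, to, pointsPerWindow = 4) {
  df <- df[df$Time >= from & df$Time <= to, ]
  perDay <- split(df, as.Date(df$Date, format = "%d/%m/%Y"))
  rows <- lapply(perDay, function(d) {
    d <- d[order(d$Time), ]
    # sides = 1: each value is the mean of the current reading and the ones before it
    avg <- as.numeric(stats::filter(d$Data, rep(1 / pointsPerWindow, pointsPerWindow), sides = 1))
    keep <- !is.na(avg)                      # the first few positions have no full window
    starts <- d$Time[seq_len(sum(keep))]     # first reading of each window
    ends <- d$Time[keep]                     # last reading of each window
    setNames(avg[keep], paste(substr(starts, 1, 5), substr(ends, 1, 5), sep = "-"))
  })
  do.call(rbind, rows)
}

m <- rollingMatrix(sample_data, "06:00:00", "09:00:00")
print(round(m, 3))
```

For this sample the first column, `06:00-06:45`, is 1.286 and the last, `08:15-09:00`, is 1.062, in line with the corresponding rows of the table above. Only windows that lie entirely inside the chosen period are kept.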