How to get hourly average for a time series in R for a specified date range?
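I have hourly data for multiple zones A, B, and C over 2 years. I would like to get the hourly average for each zone over a specified date range. Apologies, I read through:
How to make a great R reproducible example
but I am not sure how to use dput() to properly represent my data. Below is sample data copied from the output of dput(mydata):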
structure(list(time = structure(c(1451606400, 1451610000, 1451613600,
1451617200, 1451620800, 1451624400, 1451628000, 1451631600, 1451635200,
1451638800), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
A = c(0.0173731172095063, 0.0175417882503753, 0.0175839560105925,
0.017499620490158, 0.0173309494492891, 0.017668291531027,
0.017836962571896, 0.017836962571896, 0.0182586401740685,
0.0234452746807901), B = c(0.0173567013800694, 0.0173567013800694,
0.0170744785934016, 0.0172155899867355, 0.0170744785934016,
0.0172155899867355, 0.0172155899867355, 0.0172861456834025,
0.0173567013800694, 0.0198261507634126), C = c(0.00791114205246669,
0.00806936489351603, 0.00806936489351603, 0.00806936489351603,
0.00806936489351603, 0.00822758773456536, 0.00854403341666403,
0.00854403341666403, 0.00854403341666403, 0.012341381601848
)), class = "data.frame", row.names = c(NA, 10L))
Basically, I used the time data I have to create separate columns for year, month, day, and hour.
structure(list(Year = c("2016", "2016", "2016", "2016", "2016",
"2016", "2016", "2016", "2016", "2016"), Month = c("01", "01",
"01", "01", "01", "01", "01", "01", "01", "01"), Day = c("01",
"01", "01", "01", "01", "01", "01", "01", "01", "01"), hour = c("00",
"01", "02", "03", "04", "05", "06", "07", "08", "09"), timedata = structure(c(1451606400,
1451610000, 1451613600, 1451617200, 1451620800, 1451624400, 1451628000,
1451631600, 1451635200, 1451638800), class = c("POSIXct", "POSIXt"
), tzone = "UTC")), class = "data.frame", row.names = c(NA, 10L
))
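I would like to get averages, based on a specified date range, filled in for each of the 24 hours as shown below. The reason I split the time into separate year, month, day, and hour columns was to do something like group_by(), but I have a few problems.
I would like the average over a specified date range (for example, January through March, excluding weekends).
My desired final output should be a 25 x 4 matrix. The value x for hour 0:00 below would be the average for hour 0:00 on weekdays from January through March for zone A.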
time A B C
0:00 x
1:00
2:00
3:00
Thanks.
You can try this:
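library(dplyr)
library(lubridate)

# df is the data frame shown in the question (the dput(mydata) output)
df %>%
  mutate(month = month(time),
         hour = hour(time)) %>%
  # keep weekdays (%u: 1 = Monday ... 7 = Sunday) in January through March
  filter(format(time, '%u') %in% 1:5, month %in% 1:3) %>%
  group_by(hour) %>%
  # mean of each zone column per hour of day
  summarise(across(A:C, ~ mean(.x, na.rm = TRUE)))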
format(time, '%u') %in% 1:5 keeps only weekdays (Monday through Friday), and month %in% 1:3 keeps only January through March.
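Because the data covers 2 years, filtering on the month number keeps January through March of both years. If the range should be pinned to explicit dates instead, something like the following might work. This is only a sketch: hourly_weekday_mean() and the toy data frame are made up for illustration, and it assumes the timestamps are in UTC, as in the sample data.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper (not part of the original answer): per-hour means of
# zones A to C for weekdays between two dates, both inclusive.
hourly_weekday_mean <- function(data, from, to) {
  from <- as.Date(from)
  to <- as.Date(to)
  data %>%
    mutate(date = as.Date(time, tz = "UTC"),   # assumes UTC timestamps
           hour = hour(time)) %>%
    filter(date >= from, date <= to,
           wday(time, week_start = 1) <= 5) %>% # 1 = Monday ... 5 = Friday
    group_by(hour) %>%
    summarise(across(A:C, ~ mean(.x, na.rm = TRUE)), .groups = "drop")
}

# Made-up toy data: 96 hourly rows, Friday 2016-01-01 to Monday 2016-01-04
toy <- data.frame(
  time = as.POSIXct("2016-01-01 00:00:00", tz = "UTC") + 3600 * (0:95),
  A = 1:96,
  B = 2 * (1:96),
  C = rep(1, 96)
)

res <- hourly_weekday_mean(toy, "2016-01-01", "2016-03-31")
print(res)
```

With the toy data only the Friday and the Monday pass the weekday filter, so each of the 24 rows averages two values. On the real data, pass mydata and whatever start and end dates are needed.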