lubridate - group value by hour and calculate mean
I have the following data.frame:
df = read.csv(text = 'date, no, no2, nox,
2015-10-16 00:00:00, 1.10979, 14.50249, 16.20413,
2015-10-16 01:00:00, 1.73032, 13.60122, 16.25434,
2015-10-17 00:00:00, 1.30592, 11.20056, 13.20294,
2015-10-17 01:00:00, 2.05711, 11.34973, 14.50392,
2015-10-18 00:00:00, 4.14603, 16.79844, 23.15559,
2015-10-18 01:00:00, 7.73731, 24.74488, 36.60860')
df = df[,-c(5)]
For every variable, I need the mean for each hour of the day across the three days.
I tried this, but it didn't work:
data_0 = df[hours(df$date) %in% 0,]
data_1 = df[hours(df$date) %in% 1,]
.....
Any suggestions?
The output should be a data frame that gives, for each variable, the mean for each hour over the three-day period.
> class(df$date)
[1] "POSIXlt" "POSIXt"
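The attempt most likely fails because lubridate's hours() constructs a Period object (a span of time) rather than reading the hour out of a date-time; the extractor is hour(). A minimal base R sketch of the same subsetting idea, assuming the timestamps are interpreted as UTC (tz = "UTC" and strip.white = TRUE are additions, not part of the question):

```r
# Rebuild the question's data; the trailing commas add an empty 5th column
df <- read.csv(text = 'date, no, no2, nox,
2015-10-16 00:00:00, 1.10979, 14.50249, 16.20413,
2015-10-16 01:00:00, 1.73032, 13.60122, 16.25434,
2015-10-17 00:00:00, 1.30592, 11.20056, 13.20294,
2015-10-17 01:00:00, 2.05711, 11.34973, 14.50392,
2015-10-18 00:00:00, 4.14603, 16.79844, 23.15559,
2015-10-18 01:00:00, 7.73731, 24.74488, 36.60860', strip.white = TRUE)
df <- df[, -5]

# Hour of day (0-23); parsing as UTC avoids local DST rules shifting a timestamp.
# lubridate::hour() would return the same values.
hr <- as.POSIXlt(df$date, tz = "UTC")$hour

# The question's per-hour subsetting, now driven by a real hour-of-day vector
data_0 <- df[hr %in% 0, ]
data_1 <- df[hr %in% 1, ]

# Mean of every measured variable within each hourly subset
colMeans(data_0[, c("no", "no2", "nox")])
colMeans(data_1[, c("no", "no2", "nox")])
```

This still needs one subset per hour, which is why the answers below group instead.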
Since your dataset isn't provided in a reproducible format, I'm using the mydata dataset from the openair package.
library(data.table)
data(mydata, package = "openair")
melt(setDT(mydata), id.vars = "date")[, .(
  avg = mean(value, na.rm = TRUE)
), by = .(hour(date), variable)]
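The same melt-then-group idea applied to the question's own df, as a sketch: it assumes the data.table package is installed, parses date as UTC (an assumption), and adds an optional dcast() step that spreads the result into one column per variable, which matches the requested output shape. The names dt, long, res and wide are illustrative.

```r
library(data.table)

df <- read.csv(text = 'date, no, no2, nox,
2015-10-16 00:00:00, 1.10979, 14.50249, 16.20413,
2015-10-16 01:00:00, 1.73032, 13.60122, 16.25434,
2015-10-17 00:00:00, 1.30592, 11.20056, 13.20294,
2015-10-17 01:00:00, 2.05711, 11.34973, 14.50392,
2015-10-18 00:00:00, 4.14603, 16.79844, 23.15559,
2015-10-18 01:00:00, 7.73731, 24.74488, 36.60860', strip.white = TRUE)
df <- df[, -5]  # drop the empty column created by the trailing commas

dt <- as.data.table(df)
dt[, date := as.POSIXct(date, tz = "UTC")]

# Long format: one row per (date, variable, value)
long <- melt(dt, id.vars = "date")

# Mean per hour of day and per variable
res <- long[, .(avg = mean(value, na.rm = TRUE)),
            by = .(hour = hour(date), variable)]

# Optional: back to wide format, one column per variable
wide <- dcast(res, hour ~ variable, value.var = "avg")
wide
```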
#1 create column with hour
df$hour <- as.POSIXlt(df$date)$hour
#2 calculate no (col 2) mean for each group of hours
data_no = aggregate(df$no, by=list(hour=df$hour), FUN=mean)
#3 rename cols
colnames(data_no) = c('hour', 'mean')
Repeat steps 2 and 3 for every variable of interest.
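Steps 2 and 3 do not have to be repeated per variable: aggregate() also accepts a whole data frame of columns and returns one mean column per variable, already named. A minimal base R sketch on the question's data, assuming UTC parsing (tz = "UTC" and strip.white = TRUE are additions, and means is an illustrative name):

```r
df <- read.csv(text = 'date, no, no2, nox,
2015-10-16 00:00:00, 1.10979, 14.50249, 16.20413,
2015-10-16 01:00:00, 1.73032, 13.60122, 16.25434,
2015-10-17 00:00:00, 1.30592, 11.20056, 13.20294,
2015-10-17 01:00:00, 2.05711, 11.34973, 14.50392,
2015-10-18 00:00:00, 4.14603, 16.79844, 23.15559,
2015-10-18 01:00:00, 7.73731, 24.74488, 36.60860', strip.white = TRUE)
df <- df[, -5]  # drop the empty column created by the trailing commas

# Step 1: hour of day, parsed as UTC so DST cannot shift it
df$hour <- as.POSIXlt(df$date, tz = "UTC")$hour

# Steps 2-3 for all variables at once: FUN is applied to each selected
# column within each hour group, and the column names are kept
means <- aggregate(df[c("no", "no2", "nox")], by = list(hour = df$hour), FUN = mean)
means
```

For data without missing values, the formula interface aggregate(cbind(no, no2, nox) ~ hour, data = df, FUN = mean) should give the same table; note that it drops rows containing NA by default.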
Here is a tidyverse example that should work. This approach keeps repetition to a minimum.
library(lubridate)
library(tidyverse)
df = read.csv(text = 'date, no, no2, nox,
2015-10-16 00:00:00, 1.10979, 14.50249, 16.20413,
2015-10-16 01:00:00, 1.73032, 13.60122, 16.25434,
2015-10-17 00:00:00, 1.30592, 11.20056, 13.20294,
2015-10-17 01:00:00, 2.05711, 11.34973, 14.50392,
2015-10-18 00:00:00, 4.14603, 16.79844, 23.15559,
2015-10-18 01:00:00, 7.73731, 24.74488, 36.60860')
df = df[,-c(5)]
df %>%
  mutate(date = ymd_hms(date),
         hour = hour(date)) %>%
  group_by(hour) %>%
  summarise(mean_no = mean(no),
            mean_no2 = mean(no2),
            mean_nox = mean(nox))
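Writing one mean per column by hand grows with the number of variables. A sketch that covers every measured column in a single call with across(), assuming dplyr 1.0 or later; the .names pattern reproduces the mean_ prefix used above, and res is an illustrative name:

```r
library(dplyr)
library(lubridate)

df <- read.csv(text = 'date, no, no2, nox,
2015-10-16 00:00:00, 1.10979, 14.50249, 16.20413,
2015-10-16 01:00:00, 1.73032, 13.60122, 16.25434,
2015-10-17 00:00:00, 1.30592, 11.20056, 13.20294,
2015-10-17 01:00:00, 2.05711, 11.34973, 14.50392,
2015-10-18 00:00:00, 4.14603, 16.79844, 23.15559,
2015-10-18 01:00:00, 7.73731, 24.74488, 36.60860', strip.white = TRUE)
df <- df[, -5]  # drop the empty column created by the trailing commas

res <- df %>%
  mutate(date = ymd_hms(date),   # parsed as UTC by default
         hour = hour(date)) %>%
  group_by(hour) %>%
  # apply mean() to each listed column; names become mean_no, mean_no2, mean_nox
  summarise(across(c(no, no2, nox), mean, .names = "mean_{.col}"))
res
```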