Adding a standard error column to my data set so error bars can be plotted
Data <- data.frame(id, consumption, Day, Hour)
# The data is a large time series data set with thousands of values per household id,
# e.g.
consumption <- c(99, 119, 130, 110, 109, 118)  # etc.
# Hour and Day were calculated from the date-time column of the data set.
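I used ggplot2 to create two separate line plots: one showing the overall mean energy consumption for a set of households, and one showing the mean consumption between 4 pm and 8 pm. I would like to add value-specific (not constant) error bars corresponding to the standard error of each plotted value. I am not sure how to add a standard error column to the data set that corresponds to each individual value. A solution that works in a pipe would be ideal!
I have looked online for different ways to calculate the individual standard errors and add them as a column, but nothing has worked. This may be because I am not plotting the raw data but aggregated data (sums and means). Plots 1) and 2) will have different error bars for the same day. I have attached an image at the end showing what the plot should look like.
These are my plots:
1) Overall daily mean consumption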
library(dplyr)
library(ggplot2)

Data %>%
  # id is the household identifier
  group_by(id, Day) %>%
  # Sum to get total daily consumption per household
  summarise(DailyCons = sum(consumption)) %>%
  group_by(Day) %>%
  # Mean daily consumption across all households
  summarise(MeanDailyCons = mean(DailyCons)) %>%
  ggplot() +
  geom_line(aes(x = Day, y = MeanDailyCons))
2) Daily mean between 16:00 and 20:00
Data %>%
  # Mark hours outside the range 16-20 as NA so they can be dropped
  mutate(TimeInt = ifelse(Hour %in% c(16, 17, 18, 19, 20), Hour, NA)) %>%
  group_by(id, TimeInt, Day) %>%
  na.omit(TimeInt) %>%
  # Total consumption for each hour in the interval, per household
  summarise(sumPeakCons = sum(consumption)) %>%
  group_by(id, Day) %>%
  # Total daily consumption in the interval, per household
  summarise(PeakCons = sum(sumPeakCons)) %>%
  group_by(Day) %>%
  # Daily mean consumption across all households
  summarise(DailyPeakCons = mean(PeakCons)) %>%
  ggplot() +
  geom_line(aes(x = Day, y = DailyPeakCons))
An image is included to show the desired result.
https://i.stack.imgur.com/WDT8Z.png
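You are right that the standard error cannot be added after the data have been aggregated by day. Any function you try at that point receives only one mean and one date-time per day, which is not enough to estimate an error. The standard error has to be computed while summarising from the per-household data.
Add one more column in your summarise statement:
summarise(DailyPeakCons = mean(PeakCons), DailyPeakConsErr = sd(PeakCons)) %>%
This gives the standard deviation of peak consumption across households for each day. To get the standard error of the mean, divide that by the square root of the number of households in the day, e.g. sd(PeakCons) / sqrt(n()).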
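As a rough end-to-end sketch of the suggestion above, the following runs the peak-hours pipeline on made-up data and draws the error bars with geom_errorbar(). The toy data frame, the column names DailyPeakConsSD, DailyPeakConsSE and n_households, and the bar width are assumptions for illustration only; the real data set and column names may differ. It also uses filter() in place of the mutate()/na.omit() step, and assumes dplyr 1.0.0 or later for the .groups argument.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for the real data set:
# 4 households, 3 days, 24 hourly readings per household per day.
set.seed(1)
Data <- expand.grid(id = 1:4, Day = 1:3, Hour = 0:23)
Data$consumption <- round(runif(nrow(Data), min = 90, max = 130))

peak_summary <- Data %>%
  # Keep only readings whose Hour is 16 through 20
  filter(Hour %in% 16:20) %>%
  # Total peak-window consumption per household and day
  group_by(id, Day) %>%
  summarise(PeakCons = sum(consumption), .groups = "drop") %>%
  # Mean, standard deviation and standard error across households, per day
  group_by(Day) %>%
  summarise(
    DailyPeakCons   = mean(PeakCons),
    DailyPeakConsSD = sd(PeakCons),
    n_households    = n(),
    DailyPeakConsSE = DailyPeakConsSD / sqrt(n_households),
    .groups = "drop"
  )

# Line plot with one error bar per day, sized by that day's standard error
p <- ggplot(peak_summary, aes(x = Day, y = DailyPeakCons)) +
  geom_line() +
  geom_errorbar(
    aes(ymin = DailyPeakCons - DailyPeakConsSE,
        ymax = DailyPeakCons + DailyPeakConsSE),
    width = 0.2
  )
print(p)
```

The same pattern should carry over to plot 1): compute sd(DailyCons) / sqrt(n()) in the final summarise() alongside mean(DailyCons), then map the result to ymin and ymax.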