How to Calculate Summary Statistics (Standard Error, and Upper and Lower Confidence Intervals) Using the data.table Package in R
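Question
I have a data frame called FID (see below), and I am trying to summarise it with the data.table package. I would like the summary to contain the following:
Desired summary data frame
- Month
- Total frequency of FID per month over the 3 years
- Mean frequency of FID per month over the 3 years
- Standard deviation of FID per month over the 3 years
- Standard error of FID per month over the 3 years
- Lower confidence limit per month over the 3 years
- Upper confidence limit per month over the 3 years
I can carry out some of these steps individually (see below), but I would like to combine everything in the list above into a single table.
I have read widely on Stack Overflow and in other data.table tutorials, but I cannot find anything on how to calculate the standard error and the upper and lower confidence limits with data.table. Does anyone know how to do this?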
## Summary statistics table of FID per month over 3 years
library(data.table)
## Produce a data.table object
FID.Table <- data.table(FID)
## R code: the object is FID.Table (the original call referred to FID_Table, which does not exist)
Mean.FID <- FID.Table[, .(FID.Freq = sum(FID),
                          mean = mean(FID),
                          sd = sd(FID),
                          median = median(FID)),
                      by = .(Month)]
###Summary Statistics table
Month FID.Freq mean sd median
1: January 165 55.000000 10.535654 56
2: February 182 60.666667 29.737743 65
3: March 179 59.666667 33.291641 43
4: April 104 34.666667 16.862186 27
5: May 124 41.333333 49.571497 20
6: June 10 3.333333 5.773503 0
7: July 15 5.000000 4.358899 7
8: August 133 44.333333 21.007935 45
9: September 97 32.333333 21.548395 34
10: October 82 27.333333 13.051181 26
11: November 75 25.000000 19.000000 25
12: December 102 34.000000 4.582576 33
Data frame: FID
structure(list(Year = c(2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2016L, 2016L, 2016L,
2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L,
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L,
2017L, 2017L, 2017L), Month = structure(c(5L, 4L, 8L, 1L, 9L,
7L, 6L, 2L, 12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L, 9L, 7L, 6L, 2L,
12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L, 9L, 7L, 6L, 2L, 12L, 11L,
10L, 3L), .Label = c("April", "August", "December", "February",
"January", "July", "June", "March", "May", "November", "October",
"September"), class = "factor"), FID = c(65L, 88L, 43L, 54L,
98L, 0L, 0L, 23L, 10L, 15L, 6L, 33L, 56L, 29L, 98L, 23L, 6L,
10L, 7L, 65L, 53L, 41L, 25L, 30L, 44L, 65L, 38L, 27L, 20L, 0L,
8L, 45L, 34L, 26L, 44L, 39L)), class = "data.frame", row.names = c(NA,
-36L))
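Assuming you want the number of rows per month (i.e. .N) as the sample size in the denominator of the standard error, you can then use the standard error to build 95% confidence intervals (i.e. * 1.96). Alternatively, if you have missing data, you may want to use sum(!is.na(FID)) instead of .N, so that only non-missing values are counted. In short, just compute the standard error for each month, then add the confidence limits as columns afterwards: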
library(data.table)
setDT(FID)
Mean.FID = FID[, .(FID.Freq=sum(FID),
mean = mean(FID),
sd=sd(FID),
se=sd(FID) / sqrt(.N),
median=median(FID)), by = Month]
Mean.FID[, `:=`(lo_ci = mean - se * 1.96, up_ci = mean + se * 1.96)]
Mean.FID
Month FID.Freq mean sd se median lo_ci up_ci
1: January 165 55.000000 10.535654 6.082763 56 43.0777854 66.922215
2: February 182 60.666667 29.737743 17.169094 65 27.0152431 94.318090
3: March 179 59.666667 33.291641 19.220938 43 21.9936289 97.339704
4: April 104 34.666667 16.862186 9.735388 27 15.5853064 53.748027
5: May 124 41.333333 49.571497 28.620117 20 -14.7620965 97.428763
6: June 10 3.333333 5.773503 3.333333 0 -3.2000000 9.866667
7: July 15 5.000000 4.358899 2.516611 7 0.0674415 9.932558
8: August 133 44.333333 21.007935 12.128937 45 20.5606169 68.106050
9: September 97 32.333333 21.548395 12.440972 34 7.9490287 56.717638
10: October 82 27.333333 13.051181 7.535103 26 12.5645314 42.102135
11: November 75 25.000000 19.000000 10.969655 25 3.4994760 46.500524
12: December 102 34.000000 4.582576 2.645751 33 28.8143274 39.185673
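The answer above multiplies the standard error by 1.96, the normal-approximation multiplier. With only three observations per month, a t-based multiplier may be more appropriate, and the note about missing data can be folded into the same call. The following is a minimal sketch under those assumptions, not the answerer's method: the names conf_level, month_order and Summary.FID are made up for illustration, the data are the question's values rebuilt compactly (with calendar-ordered factor levels rather than the alphabetical ones in the dput output), and qt() from base R supplies the critical value.

```r
library(data.table)

# Same values as the FID data frame in the question, rebuilt compactly.
# Assumption: factor levels are set to calendar order for readability.
month_order <- c("January", "February", "March", "April", "May", "June",
                 "July", "August", "September", "October", "November",
                 "December")
FID <- data.table(
  Year  = rep(2015:2017, each = 12),
  Month = factor(rep(month_order, times = 3), levels = month_order),
  FID   = c(65L, 88L, 43L, 54L, 98L, 0L, 0L, 23L, 10L, 15L, 6L, 33L,
            56L, 29L, 98L, 23L, 6L, 10L, 7L, 65L, 53L, 41L, 25L, 30L,
            44L, 65L, 38L, 27L, 20L, 0L, 8L, 45L, 34L, 26L, 44L, 39L)
)

conf_level <- 0.95  # hypothetical parameter: the desired confidence level

Summary.FID <- FID[, {
  n     <- sum(!is.na(FID))              # non-missing observations this month
  m     <- mean(FID, na.rm = TRUE)
  s     <- sd(FID, na.rm = TRUE)
  se    <- s / sqrt(n)                   # standard error of the mean
  tcrit <- qt(1 - (1 - conf_level) / 2, df = n - 1)  # t critical value
  list(n = n,
       FID.Freq = sum(FID, na.rm = TRUE),
       mean = m, sd = s, se = se,
       lo_ci = m - tcrit * se,
       up_ci = m + tcrit * se)
}, by = Month]

Summary.FID
```

With three values per month the critical value is qt(0.975, 2), roughly 4.30, so these intervals are considerably wider than the ones built with 1.96. Which multiplier is appropriate depends on what you are willing to assume about the data.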