Use R to calculate median without replicating elements
I have a large frequency distribution. I want to calculate the median and quartiles, but R complains. Here is what works for small numbers:
> TABLE <- data.frame(DATA = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19), F = c(48,0,192,1152,5664,23040,77952,214272,423984,558720,267840,0,0,0,0,0,0,0,0))
> summary(rep(TABLE$DAT,TABLE$F))
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 9.000 10.000 9.397 10.000 11.000
This is what I get with the large numbers:
> TABLE <- data.frame(DATA = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19), F = c(240,0,1200,9600,69600,470400,2992800,17859840,98312880,489292800,2164619760,8325820800,26865302400,68711068800,128967422400,153763315200,96770419200,26824089600,2395008000))
> summary(rep(TABLE$DAT,TABLE$F))
Error in rep(TABLE$DAT, TABLE$F) : invalid 'times' argument
In addition: Warning message:
In summary(rep(TABLE$DAT, TABLE$F)) :
NAs introduced by coercion to integer range
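This error doesn't surprise me, since with "rep" I'm trying to create a huge vector. But I don't know how to avoid that and still calculate the median and quartiles.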
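A quick check on the counts shows why `rep()` gives up. This is a sketch assuming, as the warning above suggests, that `times` was coerced to integer (how `rep()` treats very large double counts may depend on the R version). `F_big`, `too_big` and `n_total` are illustrative names:

```r
# Frequencies from the large table in the question
F_big <- c(240, 0, 1200, 9600, 69600, 470400, 2992800, 17859840, 98312880,
           489292800, 2164619760, 8325820800, 26865302400, 68711068800,
           128967422400, 153763315200, 96770419200, 26824089600, 2395008000)

# Counts above the largest R integer become NA when coerced to integer,
# which matches the "NAs introduced by coercion to integer range" warning
too_big <- which(F_big > .Machine$integer.max)
too_big              # positions 11 through 19

# Even without that limit, the expanded vector would hold sum(F_big)
# doubles at 8 bytes each; this converts that size to TiB
n_total <- sum(F_big)
n_total * 8 / 2^40   # 3.75
```

So the fix is not a bigger `rep()`, but computing the statistics from the counts directly.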
Rather than trying to replicate that monster just to use summary(), you're better off computing "weighted quantiles".
This post has a formula.
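The idea behind a weighted (frequency-table) quantile can be sketched in base R with cumulative counts. `freq_quantile()` is a hypothetical helper, not the formula from the linked post: it returns the smallest value whose cumulative count reaches `p` times the total count, which is the inverse-ECDF rule of `quantile(type = 1)`. It may differ from `summary()` and `Hmisc::wtd.quantile()` when a quartile falls between two distinct values, since those, as far as I know, interpolate in the style of `quantile(type = 7)` by default.

```r
# Hypothetical helper (not from any package): quantiles of a frequency
# table computed from cumulative counts, so nothing is replicated.
# Rule: smallest value whose cumulative count reaches p * total
# (the inverse empirical CDF, as in quantile(type = 1)).
freq_quantile <- function(x, freq, probs = c(0.25, 0.5, 0.75)) {
  keep  <- freq > 0                  # values that never occur can't be quantiles
  x     <- as.numeric(x[keep])
  freq  <- as.numeric(freq[keep])    # doubles are exact up to 2^53, no int overflow
  ord   <- order(x)                  # sort values and their counts together
  x     <- x[ord]
  cum   <- cumsum(freq[ord])         # cumulative count up to each value
  total <- cum[length(cum)]
  vapply(probs, function(p) x[which(cum >= p * total)[1]], numeric(1))
}

# Large table from the question
TABLE <- data.frame(DATA = 1:19,
                    F = c(240, 0, 1200, 9600, 69600, 470400, 2992800,
                          17859840, 98312880, 489292800, 2164619760,
                          8325820800, 26865302400, 68711068800,
                          128967422400, 153763315200, 96770419200,
                          26824089600, 2395008000))

freq_quantile(TABLE$DATA, TABLE$F)   # 15 16 16
```

On the small table from the question the same helper gives 9, 10, 10, the quartiles `summary(rep(...))` reported there.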
But as with most things, once you know the right term, you can find a package that has already done the work!
#install.packages("Hmisc")
TABLE <- data.frame(DATA = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19), F = c(240,0,1200,9600,69600,470400,2992800,17859840,98312880,489292800,2164619760,8325820800,26865302400,68711068800,128967422400,153763315200,96770419200,26824089600,2395008000))
Hmisc::wtd.quantile(TABLE$DATA, probs = c(0.25, 0.5, 0.75), weights = TABLE$F)
#> 25% 50% 75%
#> 15 16 16
Created on 2018-04-06 by the reprex package (v0.2.0).
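summary() also reports the minimum, mean and maximum. A minimal base-R sketch computes those straight from the table, using `weighted.mean()` with the counts as weights. It is run on the small table so the numbers can be compared with the `summary(rep(...))` output in the question (1, 9.397, 11); because nothing is replicated, the same code works unchanged on the large table. `present` and `stats` are illustrative names:

```r
# Small frequency table from the question
TABLE <- data.frame(DATA = 1:19,
                    F = c(48, 0, 192, 1152, 5664, 23040, 77952, 214272,
                          423984, 558720, 267840, rep(0, 8)))

present <- TABLE$F > 0   # only values that actually occur count for min/max

stats <- c(
  Min. = min(TABLE$DATA[present]),
  Mean = weighted.mean(TABLE$DATA, TABLE$F),   # counts act as weights
  Max. = max(TABLE$DATA[present])
)
stats
```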