plyr + simpleboot: NA in probability vector
I am using the simpleboot package (https://cran.r-project.org/web/packages/simpleboot/index.html) to obtain confidence intervals.
This is my function:
lb_weighted_median_dplyr <- function(x, v) {
  set.seed(1234)
  b <- one.boot(x, weights = v,
                FUN = function(x, w) matrixStats::weightedMedian(x, w = v, na.rm = TRUE),
                R = 100, student = FALSE)
  round(perc(b, 0.025), 0)
}
The function computes the lower bound of the confidence interval when I run
ddply(wage_by_gender_2015, .(sex,region), summarise, FUN = lb_weighted_median_dplyr(wage, exp_region))
where wage is a numeric column and exp_region is another numeric column containing the weights.
I have no data for some regions, so the function fails for those regions and returns
Error in eval(substitute(expr), envir, enclos) : NA in probability vector
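How can I get around this error and obtain NA as the lower bound for the regions with no data?
An equivalent dplyr approach, which also returns "NA in probability vector", is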
grouped <- group_by(wage_by_gender_2015, sex, region)
dplyr::summarise(grouped, FUN = lb_weighted_median_dplyr(wage, exp_region))
A sample of the relevant data is here: http://users.dcc.uchile.cl/~mvargas/casen/wage_by_gender_2015.RData
wage_by_gender_2015 <- data.frame(
  sex = rep(c("male", "female"), 100),
  region = rep(c("north", "south", "east", "west"), 50),
  exp_region = abs(rnorm(100)),
  wage = abs(rnorm(100))
)
wage_by_gender_2015$exp_region[10] <- NA
ddply(wage_by_gender_2015, .(sex,region), summarise, FUN = lb_weighted_median_dplyr(wage, exp_region))
Error in sample.int(length(x), replace = TRUE, ...) : NA in probability vector
# impute
wage_by_gender_2015$exp_region <- RRF::na.roughfix(wage_by_gender_2015$exp_region)
ddply(wage_by_gender_2015, .(sex,region), summarise, FUN = lb_weighted_median_dplyr(wage, exp_region))
     sex region FUN
1 female  south   0
2 female   west   0
3   male   east   1
4   male  north   0
As mentioned in the comments, I would have used your sample data, but it is missing sex.
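Imputing fills in the missing weights, so every group still gets a number. If the goal is instead to get NA for groups with no usable data, as the question asks, one option is to guard the call. The following is only a minimal sketch: na_safe is a hypothetical helper (it is not part of plyr, dplyr or simpleboot), and the threshold of two usable observations is an assumption.

```r
# Hypothetical helper: wraps a function of (values, weights) so that
# incomplete pairs are dropped first and NA is returned when a group
# has too little data left to resample.
na_safe <- function(fun) {
  function(x, v) {
    # keep only pairs where both the value and its weight are present
    keep <- !is.na(x) & !is.na(v)
    # assumption: fewer than two usable rows, or zero total weight,
    # means there is nothing meaningful to bootstrap
    if (sum(keep) < 2 || sum(v[keep]) <= 0) {
      return(NA_real_)
    }
    fun(x[keep], v[keep])
  }
}

# Usage with the function from the question (needs plyr, simpleboot, matrixStats):
# lb_safe <- na_safe(lb_weighted_median_dplyr)
# ddply(wage_by_gender_2015, .(sex, region), summarise,
#       FUN = lb_safe(wage, exp_region))
```

Because the NA weights are removed before one.boot sees them, sample.int should no longer receive an NA in its probability vector, and groups with nothing left come back as NA instead of stopping the whole ddply call.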