Missing levels when weighting (raking) data using ANESRake
I have a survey dataset and some quotas. The population quotas are:
(1 = up to 29 years 0,00%)
2 = 30 to 39 years 18,10%
3 = 40 to 49 years 28,77%
4 = 50 to 59 years 33,11%
5 = 60 and more years 20,01%
In the dataset I want to weight, category 5 is missing. These are the statistics for the variable in the dataset:
2 = 32,33%
3 = 36,56%
4 = 31,12%
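To see at a glance which quota categories have no respondents, you can tabulate the sample against the full set of target codes. This is a base R sketch only. `rec_age` is a stand-in for `d$Rec_Age`, built from the 50-row excerpt posted further down, so its shares will differ from the full-dataset figures above. `comparison` and `missing_in_sample` are illustrative names and are not part of anesrake.

```r
# Population quotas from the question (category code -> share)
targets <- c(`1` = 0, `2` = 0.181, `3` = 0.2877, `4` = 0.3311, `5` = 0.2001)

# Stand-in for d$Rec_Age: the 50-row excerpt posted in the question
rec_age <- c(4, 2, 4, 3, 4, 4, 4, 3, 2, 2, 3, 2,
             3, 4, 4, 2, 4, 4, 2, 3, 2, 2, 2, 3, 3, 2, 2, 2, 2, 3, 2,
             3, 2, 3, 4, 3, 4, 3, 2, 3, 3, 3, 4, 4, 4, 2, 2, 3, 4, 3)

# Using the target codes as factor levels keeps empty categories in the table
sample_share <- prop.table(table(factor(rec_age, levels = names(targets))))

comparison <- data.frame(
  code   = names(targets),
  target = unname(targets),
  sample = as.numeric(sample_share),
  stringsAsFactors = FALSE
)
print(comparison)

# Categories with a positive population share but no respondents cannot be raked
missing_in_sample <- comparison$code[comparison$target > 0 & comparison$sample == 0]
print(missing_in_sample)
```

In the excerpt, codes 1 and 5 are both empty. Code 1 also has a 0% target, so only code 5 is flagged.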
If I run the raking, I get the following error:
library(anesrake)
r = anesrake(list_weights,
d,
verbose = FALSE,
caseid = d$RESPID,
maxit = 1500,
cap = 5,
choosemethod = "max",
type = "nolim")
Error in rakeonvar.default(mat[, i], inputter[[i]], weightvec) : variables must be coded continuously from 1 to n with no missing values
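The message suggests that, for a numeric raking variable, anesrake expects the codes to run 1, 2, ..., n, where n is the number of target categories. I have not checked the package internals, so the helper `codes_are_1_to_n()` below is a hypothetical approximation of that condition and not anesrake's own check. The sketch shows why codes 2 to 4 fail against three targets, and gives two base R ways to recode them.

```r
# Targets for the three categories present in the sample
target <- c(`2` = 0.181, `3` = 0.2877, `4` = 0.3311)
rec_age <- c(4, 2, 4, 3, 2, 3)  # a few values in the style of d$Rec_Age

# Hypothetical approximation of the condition named in the error message:
# no missing values and every code within 1..n
codes_are_1_to_n <- function(x, n) !anyNA(x) && all(x %in% seq_len(n))

codes_are_1_to_n(rec_age, length(target))      # FALSE: code 4 lies outside 1..3

# Option 1: a factor whose levels are exactly the target names
rec_age_f <- factor(rec_age, levels = names(target))
levels(rec_age_f)

# Option 2: recode to consecutive integers in the order of the target vector
rec_age_num <- match(as.character(rec_age), names(target))
codes_are_1_to_n(rec_age_num, length(target))  # TRUE: codes are now 1..3
```

Option 1 corresponds to what the answer further down does with `as.factor()`.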
Any idea how to handle the missing levels in the data?
Here is the output of the quota list:
list(Rec_Age = c(`2` = 0.181, `3` = 0.2877, `4` = 0.3311))
and a small sample of the data:
structure(list(RESPID = structure(c(459, 311, 223, 60, 613, 495,
300, 273, 78, 170, 217, 61, 175, 619, 270, 218, 453, 492, 23,
65, 33, 113, 532, 26, 119, 49, 208, 102, 200, 165, 435, 298,
593, 220, 111, 53, 494, 271, 305, 420, 323, 607, 105, 19, 426,
171, 330, 201, 332, 277), label = "RESPID - Respondent ID", format.spss = "F10.0", display_width = 0L),
Rec_Age = structure(c(4, 2, 4, 3, 4, 4, 4, 3, 2, 2, 3, 2,
3, 4, 4, 2, 4, 4, 2, 3, 2, 2, 2, 3, 3, 2, 2, 2, 2, 3, 2,
3, 2, 3, 4, 3, 4, 3, 2, 3, 3, 3, 4, 4, 4, 2, 2, 3, 4, 3), label = "Rec_Age - Recode Age")), row.names = c(NA,
-50L), class = "data.frame")
@Yuriy Saraykin
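You're right, there is no error now. But with your code, all weights are 1 after raking, so something must be wrong.
I don't understand why. If I use a list with all levels, as you did, I get this error (I had tried that before):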
Error in rakeonvar.default(mat[, i], inputter[[i]], weightvec) : you cannot rake any variable category to 0 or a negative number
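This message refers to the 0% target for code 1. Code 5 cannot be weighted either, because no respondent has it, and weighting cannot create respondents. The sketch below prepares a list of targets in base R. It assumes you accept two steps: dropping those categories and rescaling the remaining targets to sum to 1. `prepare_targets()` is a hypothetical helper. There is a trade-off: after this step the weights say nothing about the 60+ group, because that group is not in the sample.

```r
# Hypothetical helper: keep only categories that have a positive target and
# at least one respondent, then rescale the remaining targets to sum to 1
prepare_targets <- function(target, x) {
  present <- names(target) %in% as.character(unique(x))
  keep <- target > 0 & present
  dropped <- names(target)[!keep]
  if (length(dropped) > 0) {
    message("Dropping categories: ", paste(dropped, collapse = ", "))
  }
  kept <- target[keep]
  kept / sum(kept)
}

full_target <- c(`1` = 0, `2` = 0.181, `3` = 0.2877, `4` = 0.3311, `5` = 0.2001)
rec_age <- c(4, 2, 4, 3, 2, 3, 3, 2, 4, 2)  # illustrative codes: only 2, 3 and 4 occur

list_weights <- list(Rec_Age = prepare_targets(full_target, rec_age))
print(list_weights)
```

The resulting proportions (roughly 0.226, 0.360 and 0.414) should be the same values that `wpct(Rec_Age, fraction)` produces in the answer further down.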
What is the difference between your list and mine (even though your code doesn't give the desired result)?
Your list:
your_list
[[1]]
1 2 3 4 5
0.0000000 0.1810181 0.2877288 0.3311331 0.2001200
dput(your_list)
list(Rec_Age = c(`1` = 0, `2` = 0.181, `3` = 0.2877, `4` = 0.3311,
`5` = 0.2001))
My list:
my_list
$Rec_Age
1 2 3 4 5
0.0000 0.1810 0.2877 0.3311 0.2001
dput(my_list)
list(Rec_Age = c(`1` = 0, `2` = 0.181, `3` = 0.2877, `4` = 0.3311, `5` =
0.2001))
My list was generated like this:
REC_age = c(0, 0.181, 0.2877, 0.3311, 0.2001)
names(REC_age) = c(1, 2, 3, 4, 5)
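The printed values of `your_list` look like `my_list` divided by its total, because the five shares add up to 0.9999 rather than 1. Here is a quick base R check of that reading. It does not explain the all-ones weights. Both lists also still contain the 0 for code 1 that triggers the error above.

```r
REC_age <- c(0, 0.181, 0.2877, 0.3311, 0.2001)
names(REC_age) <- c(1, 2, 3, 4, 5)

total <- sum(REC_age)           # 0.9999, not exactly 1
normalised <- REC_age / total   # rescale so the shares sum to 1

print(total)
print(round(normalised, 7))     # matches the values printed for your_list
```

If I remember the interface correctly, anesrake can also rescale targets itself through its `force1` argument. So this normalisation on its own is unlikely to change the outcome.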
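Try it like this. As I see it, the population information you include has to be limited to the categories that actually occur in your sample.
Here is a good article on the topic: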
https://www.r-bloggers.com/survey-raking-an-illustration/
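library(anesrake)
library(weights)    # wpct()
library(tidyverse)  # %>% and mutate()
# Rake on a factor so the target names ("2", "3", "4") are matched to factor levels
d <- d %>% mutate(Rec_Age = as.factor(Rec_Age))
# Population shares for the categories that occur in the sample
population <- data.frame(Rec_Age = c("2", "3", "4"),
fraction = c(0.181, 0.2877, 0.3311))
# wpct() turns the fractions into proportions that sum to 1
list_weights <- with(population,
list(Rec_Age = wpct(Rec_Age, fraction)))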
r <- anesrake(list_weights,
d,
caseid = d$RESPID,
maxit = 1500,
cap = 5,
choosemethod = "max",
type = "nolim")
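With a single raking variable, raking should reduce to simple cell weighting: each respondent in category k gets target share / sample share. The weights therefore come out as 1 only when the sample already matches the targets. Comparing the returned weight vector (`r$weightvec`, if I recall the object's structure correctly) against that expectation is a quick way to check the "all weights are 1" result from the comment. The sketch below is base R only. It uses the posted 50-row excerpt, so the numbers are illustrative and not those of the full dataset. `expected_w` is a hypothetical name.

```r
# Population targets for the categories present in the sample, rescaled to sum to 1
population <- data.frame(Rec_Age = c("2", "3", "4"),
                         fraction = c(0.181, 0.2877, 0.3311),
                         stringsAsFactors = FALSE)
target <- setNames(population$fraction / sum(population$fraction), population$Rec_Age)

# Rec_Age codes from the 50-row excerpt in the question
rec_age <- c(4, 2, 4, 3, 4, 4, 4, 3, 2, 2, 3, 2,
             3, 4, 4, 2, 4, 4, 2, 3, 2, 2, 2, 3, 3, 2, 2, 2, 2, 3, 2,
             3, 2, 3, 4, 3, 4, 3, 2, 3, 3, 3, 4, 4, 4, 2, 2, 3, 4, 3)
sample_share <- prop.table(table(factor(rec_age, levels = names(target))))

# One-variable raking = cell weighting: target share / sample share per category
cell_weight <- target / as.numeric(sample_share)
print(round(cell_weight, 3))

# Weight for each respondent; these average to 1 across the sample
expected_w <- unname(cell_weight[as.character(rec_age)])
print(mean(expected_w))
```

On the excerpt, none of the cell weights is close to 1. Suppose anesrake still returns all-ones weights for data like this. Then it is worth checking, for example with `summary(r)`, whether `Rec_Age` was actually raked on.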