SCF data issue from lodown package
I ran into a strange problem while analyzing the SCF with the lodown package. There must be something wrong with the data for Black respondents under 35 with a college degree: the share/mean for this group comes out far too high.
I crossed the three factors race, age, and education to see what share of total net worth each group holds.
# input data
library(survey)    # svrepdesign() and replicate-weight analysis
library(mitools)   # imputationList() bundles the five implicates

scf_imp <- readRDS( file.path( path.expand( "~" ) , "SCF" , "scf 2016.rds" ) )
scf_rw <- readRDS( file.path( path.expand( "~" ) , "SCF" , "scf 2016 rw.rds" ) )

scf_design <-
    svrepdesign(
        weights = ~wgt ,
        repweights = scf_rw[ , -1 ] ,
        data = imputationList( scf_imp ) ,
        scale = 1 ,
        rscales = rep( 1 / 998 , 999 ) ,
        mse = FALSE ,
        type = "other" ,
        combined.weights = TRUE
    )
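(The two .rds extracts are assumed to be the standard lodown downloads; if they are not on disk yet, a one-time call along these lines should create them. The scf_MIcombine() helper used below is likewise assumed to be the one defined in the lodown/asdfree SCF analysis example.)

library(lodown)
lodown( "scf" , output_dir = file.path( path.expand( "~" ) , "SCF" ) )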
# variable recoding: the SCF stores these as integer codes 1..k, so relabel them as factors
scf_design <-
    update(
        scf_design ,
        racecl4 = factor( racecl4 , labels = c( "White" , "Black" , "Hispanic/Latino" , "Other" ) ) ,
        edcl = factor( edcl , labels = c( "less than high school" , "high school or GED" , "some college" , "college degree" ) ) ,
        agecl = factor( agecl , labels = c( "less than 35" , "35-44" , "45-54" , "55-64" , "65-74" , "75 or more" ) )
    )
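As a quick sanity check that the recodes landed correctly, an unweighted cross-tab on the first implicate can help (a sketch that assumes the usual mitools structure, where the svyimputationList keeps its component designs in $designs and each design keeps its data in $variables):

# unweighted cross-tab of race by education on implicate 1
with( scf_design$designs[[1]]$variables , table( racecl4 , edcl ) )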
# calculation
library(dplyr)     # %>% and mutate()
library(stringr)   # str_detect()
library(tibble)    # rownames_to_column()

# total net worth in every race x education x age cell, combined across implicates
trible <- scf_MIcombine( with( scf_design ,
    svyby( ~ networth , ~ interaction( racecl4 , edcl , agecl ) , svytotal )
) )

# pull out the Black cells and express each as a share of total Black net worth;
# interaction() varies racecl4 fastest, then edcl, then agecl, so filling a
# 4-row matrix puts the four education levels in rows and the six age groups in columns
sum_black <- trible[[1]][ str_detect( names( trible[[1]] ) , "Black" ) ] %>% sum()
black <- trible[[1]][ str_detect( names( trible[[1]] ) , "Black" ) ] %>% matrix( nrow = 4 )
black <- as.data.frame( black / sum_black )
colnames( black ) <- c( "less than 35" , "35-44" , "45-54" , "55-64" , "65-74" , "75 or more" )

# append row and column totals, then format everything as percentages
black <- black %>% mutate( total = rowSums( black ) )
black <- rbind( black , total = colSums( black ) )
black <- sapply( black , scales::percent ) %>% as.data.frame()
rownames( black ) <- c( "less than high school" , "high school or GED" , "some college" , "college degree" , "total" )
black <- rownames_to_column( black , "share for black" )
I computed the means the same way. The result: for Black households younger than 35 with a college degree, the share/mean comes out extremely high, which should not happen. Is there a problem with the data, or with my method?
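("The same way" here means the identical pipeline with svymean() substituted for svytotal(); the object name means is just for illustration:)

means <- scf_MIcombine( with( scf_design ,
    svyby( ~ networth , ~ interaction( racecl4 , edcl , agecl ) , svymean )
) )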
[two screenshots of the resulting share and mean tables; images hosted at sinaimg.cn]
The Survey of Consumer Finances has roughly 6,000 unweighted records, and you are splitting the results into nearly 100 groups (4 races × 4 education levels × 6 age brackets = 96 cells), so the average cell holds only about N = 60. Take a look at this to see just how small each one is.
counts <- scf_MIcombine( with( scf_design ,
    svyby( ~ networth , ~ interaction( racecl4 , edcl , agecl ) , unwtd.count )
) )
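For example (assuming coef() on the combined result returns the per-cell unweighted counts):

# the ten smallest cells; single-digit sample sizes are what break the estimates
sort( coef( counts ) )[ 1:10 ]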
It is not a hard-and-fast rule, but when a standard error exceeds 30% of the statistic itself, that statistic is generally considered unstable. Look at SE( trible ) / coef( trible ) > 0.3 and you will see that almost every statistic here is unstable.
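A compact way to tabulate that check (a sketch; abs() guards against any cells whose net-worth totals are negative):

# relative standard error of every cell total
rse <- SE( trible ) / abs( coef( trible ) )
# how many of the ~96 cells cross the 30% instability threshold
table( rse > 0.3 )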
The SCF is a wonderful dataset, but the sample size probably just isn't large enough to support a breakdown this fine-grained. Thanks!