Chi-square statistic across multiple columns of a dataframe using dplyr or reshape2

Question

I have a question about using dplyr and reshape2 to calculate chi-square statistics across multiple columns. Below is a small dataframe...

Sat <- c("Satisfied","Satisfied","Dissatisfied","Dissatisfied",
         "Neutral")

Gender <- c("Male","Male","Female","Male","Female")

Ethnicity <- c("Asian","White","White","Asian","White")

AgeGroup <- c("18-20","18-20","21-23","18-20","18-28")

Example <- data.frame(Sat,Gender,Ethnicity,AgeGroup)

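How would I use summarise_each or melt to test the Sat column against each of the other variables and produce chi-square residuals and p-values? I'm thinking there must be something like: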

Example %>% summarise_each(funs(chisq.test(...

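But I'm not sure how to finish it. Also, how would I melt the dataframe and use group_by or do() to get the chi-square statistics? I'm interested in seeing both methods. If there is a way to incorporate the broom package, that would be great too, or tidyr instead of reshape2.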

So to recap, I'd like to run a chi-square test like

chisq.test(Example$Sat, Example$Gender)

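But... I'd like to produce chi-square statistics for the Sat variable against Gender, Ethnicity, and AgeGroup. This is a small example, and I'm hoping the approaches above will let me create chi-square statistics across many columns quickly and efficiently. It would be a bonus if I could plot the residuals in a heatmap with ggplot2, which is why I'm interested in incorporating the broom package into this example.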

Answer 1

If we need to get the p-values:

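 library(dplyr)

 # Test every column except Sat against Sat and keep only the p-value.
 # Note: summarise_each() and funs() are deprecated in current dplyr releases.
 Example %>% 
    summarise_each(funs(chisq.test(., 
               Example$Sat)$p.value), -one_of("Sat"))
 #     Gender Ethnicity  AgeGroup
 #1 0.2326237 0.6592406 0.1545873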
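
For readers on a recent dplyr, the same p-values can probably be obtained with across(), which superseded summarise_each() and funs(). This is a minimal sketch, not part of the original answer: it assumes dplyr 1.0.0 or later, rebuilds the question's Example data frame so it runs on its own, and wraps the test in suppressWarnings() because chisq.test() warns about the very small expected counts in this toy data. The name p_values is only illustrative.

```r
library(dplyr)

# Rebuild the example data frame from the question.
Example <- data.frame(
  Sat       = c("Satisfied", "Satisfied", "Dissatisfied", "Dissatisfied", "Neutral"),
  Gender    = c("Male", "Male", "Female", "Male", "Female"),
  Ethnicity = c("Asian", "White", "White", "Asian", "White"),
  AgeGroup  = c("18-20", "18-20", "21-23", "18-20", "18-28")
)

# Test every column except Sat against Sat and keep only the p-value.
# suppressWarnings() hides the "approximation may be incorrect" warning
# that chisq.test() raises for such small expected counts.
p_values <- Example %>%
  summarise(across(-Sat, ~ suppressWarnings(chisq.test(.x, Example$Sat))$p.value))

print(p_values)
```

Swapping `$p.value` for `$statistic` in the lambda should give the test statistics instead.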

or to extract the statistic:

Example %>%
    summarise_each(funs(chisq.test(., 
           Example$Sat)$statistic), -one_of("Sat"))
#   Gender Ethnicity AgeGroup
#1 2.916667 0.8333333 6.666667
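
The question also asks for a melt-style approach with group_by and broom. One possible sketch, which is an assumption on my part rather than something from the original answer, reshapes the data to long format with tidyr::pivot_longer(), runs one test per variable with dplyr::group_modify() (used here in place of do()), and lets broom::tidy() turn each test into a one-row table holding both the statistic and the p-value. It assumes tidyr 1.0.0 or later and dplyr 0.8.1 or later; chisq_table is an illustrative name.

```r
library(dplyr)
library(tidyr)
library(broom)

# Rebuild the example data frame from the question.
Example <- data.frame(
  Sat       = c("Satisfied", "Satisfied", "Dissatisfied", "Dissatisfied", "Neutral"),
  Gender    = c("Male", "Male", "Female", "Male", "Female"),
  Ethnicity = c("Asian", "White", "White", "Asian", "White"),
  AgeGroup  = c("18-20", "18-20", "21-23", "18-20", "18-28")
)

chisq_table <- Example %>%
  # One row per (observation, variable) pair; Sat is repeated alongside.
  pivot_longer(-Sat, names_to = "variable", values_to = "level") %>%
  group_by(variable) %>%
  # Run the test within each variable and tidy the htest object.
  group_modify(~ tidy(suppressWarnings(chisq.test(.x$level, .x$Sat)))) %>%
  ungroup()

print(chisq_table)
```

The result has one row per tested variable, with columns such as statistic, p.value and parameter (the degrees of freedom).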

To get the residuals, it is easier to use base R:

 lapply(Example[setdiff(names(Example), "Sat")], 
       function(x) chisq.test(x, Example$Sat)$residuals)
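
Since the question mentions a residual heatmap, the list of residual tables could be flattened into one long data frame. The following is only a sketch under my own assumptions: the column names level, Sat, residual and variable, and the object name residuals_long, are illustrative and do not come from the original answer.

```r
# Rebuild the example data frame from the question.
Example <- data.frame(
  Sat       = c("Satisfied", "Satisfied", "Dissatisfied", "Dissatisfied", "Neutral"),
  Gender    = c("Male", "Male", "Female", "Male", "Female"),
  Ethnicity = c("Asian", "White", "White", "Asian", "White"),
  AgeGroup  = c("18-20", "18-20", "21-23", "18-20", "18-28")
)

predictors <- setdiff(names(Example), "Sat")

residual_list <- lapply(predictors, function(v) {
  # Pearson residuals, (observed - expected) / sqrt(expected), as a 2-way table.
  res <- suppressWarnings(chisq.test(Example[[v]], Example$Sat))$residuals
  # Flatten the table: one row per (level, Sat) cell.
  out <- as.data.frame(as.table(res), stringsAsFactors = FALSE)
  names(out) <- c("level", "Sat", "residual")
  out$variable <- v
  out
})

# Stack the per-variable tables into a single long data frame.
residuals_long <- do.call(rbind, residual_list)

print(residuals_long)
```

A data frame in this shape should be usable with ggplot2, for example geom_tile() with aes(Sat, level, fill = residual) and facet_wrap(~ variable, scales = "free_y"), although with only five observations the plot will not be very informative.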
