How do I summarize this data table with dplyr, then run a chisq.test (or similar) on the results and loop it all into one neat function?

Question

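This question was embedded in another question I asked here, but since it goes beyond the scope of that original inquiry, I thought it deserved a separate thread.

I have been trying to solve this based on the answers I received here and here, using dplyr and the functions written by Khashaa and Jaap.

Using the solutions provided to me (especially the one from Jaap), I have been able to summarize the raw data I received into a matrix-like data table: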

dput(SO_Example_v1)
structure(list(Type = structure(c(3L, 1L, 2L), .Label = c("Community", 
"Contaminant", "Healthcare"), class = "factor"), hosp1_WoundAssocType = c(464L, 
285L, 24L), hosp1_BloodAssocType = c(73L, 40L, 26L), hosp1_UrineAssocType = c(75L, 
37L, 18L), hosp1_RespAssocType = c(137L, 77L, 2L), hosp1_CathAssocType = c(80L, 
34L, 24L), hosp2_WoundAssocType = c(171L, 115L, 17L), hosp2_BloodAssocType = c(127L, 
62L, 12L), hosp2_UrineAssocType = c(50L, 29L, 6L), hosp2_RespAssocType = c(135L, 
142L, 6L), hosp2_CathAssocType = c(95L, 24L, 12L)), .Names = c("Type", 
"hosp1_WoundAssocType", "hosp1_BloodAssocType", "hosp1_UrineAssocType", 
"hosp1_RespAssocType", "hosp1_CathAssocType", "hosp2_WoundAssocType", 
"hosp2_BloodAssocType", "hosp2_UrineAssocType", "hosp2_RespAssocType", 
"hosp2_CathAssocType"), class = "data.frame", row.names = c(NA, 
-3L))

which looks as follows:

require(dplyr)
df <- tbl_df(SO_Example_v1)
head(df)
         Type hosp1_WoundAssocType hosp1_BloodAssocType hosp1_UrineAssocType
1  Healthcare                  464                   73                   75
2   Community                  285                   40                   37
3 Contaminant                   24                   26                   18
Variables not shown: hosp1_RespAssocType (int), hosp1_CathAssocType (int), hosp2_WoundAssocType
  (int), hosp2_BloodAssocType (int), hosp2_UrineAssocType (int), hosp2_RespAssocType (int),
  hosp2_CathAssocType (int)

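The Type column is the type of bacteria, and the remaining columns represent where they were cultured. The numbers are how many times the respective type of bacteria was detected.

I know what my final table should look like, but until now I have been building it step by step for each comparison and variable. There must be a way to do this by piping multiple functions in dplyr, but I have not found an answer on how to do so.

Example of the final table: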

                                                 Wound
Type                            n Hospital 1 (%)      n Hospital 2 (%)  p-val
Healthcare associated bacteria     464 (60.0)            171 (56.4)     0.28
Community associated bacteria      285 (36.9)            115 (38.0)     0.74
Contaminants                       24 (3.1)              17 (5.6)       0.05

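Here the first grouping variable "Wound" is subsequently replaced by "Urine", "Respiratory", ..., and there is a final column called "All/Total", which holds the total number of times each variable in the "Type" rows was found, summed within Hospital 1 and within Hospital 2 and then compared.

What I have done so far is the following. It is very tedious, because everything is calculated "by hand" and I then populate the table with all the results manually.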
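As a rough illustration of the "All/Total" column, the per-hospital totals can be obtained by summing each Type across the culture sites. This is only a sketch: it assumes "All/Total" means the sum over all five sites within each hospital, and the matrices `hosp1` and `hosp2` are hypothetical names holding the counts copied from the dput() output above.

```r
# Counts copied from the dput() output; rows are Types, columns are sites.
sites <- c("Wound", "Blood", "Urine", "Resp", "Cath")
hosp1 <- rbind(
  Healthcare  = c(464, 73, 75, 137, 80),
  Community   = c(285, 40, 37, 77, 34),
  Contaminant = c(24, 26, 18, 2, 24)
)
hosp2 <- rbind(
  Healthcare  = c(171, 127, 50, 135, 95),
  Community   = c(115, 62, 29, 142, 24),
  Contaminant = c(17, 12, 6, 6, 12)
)
colnames(hosp1) <- colnames(hosp2) <- sites

# Assumed meaning of "All/Total": each Type summed over all sites, per hospital.
total <- cbind(hosp1 = rowSums(hosp1), hosp2 = rowSums(hosp2))

# Percentage of each hospital's isolates that fall into each Type.
total_pct <- round(100 * prop.table(total, margin = 2), 1)
```

The `total` matrix can then be treated like any single site when building the comparison between the two hospitals.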

### Wound cultures & healthcare associated (extracted manually)
# hosp1 464 (yes), 309 (no), 773 wound isolates in total; (% = 464 / 773 * 100)
# hosp2 171 (yes), 132 (no), 303 wound isolates in total; (% = 171 / 303 * 100)

### Then the chisq.test of my contingency table
chisq.test(cbind(c(464,309),c(171,132)),correct=FALSE)
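The manual step above could be derived from the counts rather than typed in. The following is a minimal sketch for the wound column only; the vector names `hosp1_wound` and `hosp2_wound` are made up for illustration, and the values are copied from the dput() output.

```r
# Wound counts per Type (order: Healthcare, Community, Contaminant).
hosp1_wound <- c(Healthcare = 464, Community = 285, Contaminant = 24)
hosp2_wound <- c(Healthcare = 171, Community = 115, Contaminant = 17)

# 2 x 2 contingency table: healthcare associated (yes/no) by hospital.
wound_tab <- rbind(
  yes = c(hosp1 = hosp1_wound[["Healthcare"]],
          hosp2 = hosp2_wound[["Healthcare"]]),
  no  = c(hosp1 = sum(hosp1_wound) - hosp1_wound[["Healthcare"]],
          hosp2 = sum(hosp2_wound) - hosp2_wound[["Healthcare"]])
)

# Column percentages: share of each hospital's wound isolates.
wound_pct <- round(100 * prop.table(wound_tab, margin = 2), 1)

# Same test as the hand-built call, without the continuity correction.
wound_test <- chisq.test(wound_tab, correct = FALSE)
wound_test$p.value
```

Here `wound_tab` reproduces the 464/309 and 171/132 split, and the "yes" row of `wound_pct` gives the 60.0 and 56.4 shown in the example table.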

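I appreciate that if I run a dplyr pipe on the original data.frame, I won't get the exact formatting of the table I want. Still, there must be a way to at least automate all the steps here and put the results in a final table that I can export as a .csv file, leaving only some final column editing to do?

Any help is greatly appreciated.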

Answer 1

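It's ugly, but it works (Sam in the comments is right that this whole problem should be solved by getting the data into a clean format before the analysis, but anyway):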

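Map(
  function(x,y) {
    # x, y: counts per Type for the same culture site in hosp1 and hosp2
    out <- cbind(x,y)
    # row 1 = Healthcare; rows 2:3 (Community, Contaminant) are pooled as "no"
    final <- rbind(out[1,],colSums(out[2:3,]))
    chisq.test(final,correct=FALSE)
  },
  # relies on the hosp1 and hosp2 columns being in the same site order
  SO_Example_v1[grepl("^hosp1",names(SO_Example_v1))],
  SO_Example_v1[grepl("^hosp2",names(SO_Example_v1))]
)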

#$hosp1_WoundAssocType
#
#        Pearson's Chi-squared test
#
#data:  final
#X-squared = 1.16, df = 1, p-value = 0.2815
# etc etc...

This matches your intended result:

chisq.test(cbind(c(464,309),c(171,132)),correct=FALSE)
#
#        Pearson's Chi-squared test
# 
#data:  cbind(c(464, 309), c(171, 132))
#X-squared = 1.16, df = 1, p-value = 0.2815
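To get closer to the table in the question, the same idea can be extended so that every Type is compared against the rest at every site and the results are collected in one data frame. This is only a sketch under a few assumptions: `summarise_sites` is a hypothetical helper name, each Type is tested against the pooled remaining Types, and the hosp1 and hosp2 columns are assumed to appear in the same site order.

```r
# Same counts as the dput() output in the question, written compactly.
SO_Example_v1 <- data.frame(
  Type = c("Healthcare", "Community", "Contaminant"),
  hosp1_WoundAssocType = c(464L, 285L, 24L),
  hosp1_BloodAssocType = c(73L, 40L, 26L),
  hosp1_UrineAssocType = c(75L, 37L, 18L),
  hosp1_RespAssocType  = c(137L, 77L, 2L),
  hosp1_CathAssocType  = c(80L, 34L, 24L),
  hosp2_WoundAssocType = c(171L, 115L, 17L),
  hosp2_BloodAssocType = c(127L, 62L, 12L),
  hosp2_UrineAssocType = c(50L, 29L, 6L),
  hosp2_RespAssocType  = c(135L, 142L, 6L),
  hosp2_CathAssocType  = c(95L, 24L, 12L)
)

# Hypothetical helper: one row per (site, Type) with "n (%)" for each
# hospital and the p-value of a 2 x 2 chi-squared test (Type vs. the rest).
summarise_sites <- function(df) {
  h1 <- df[grepl("^hosp1", names(df))]
  h2 <- df[grepl("^hosp2", names(df))]
  sites <- sub("^hosp1_(.*)AssocType$", "\\1", names(h1))
  rows <- Map(function(x, y, site) {
    do.call(rbind, lapply(seq_along(x), function(i) {
      # Column 1 = hosp1, column 2 = hosp2; row 1 = this Type, row 2 = others.
      tab <- cbind(c(x[i], sum(x[-i])), c(y[i], sum(y[-i])))
      data.frame(
        Site  = site,
        Type  = as.character(df$Type[i]),
        hosp1 = sprintf("%d (%.1f)", x[i], 100 * x[i] / sum(x)),
        hosp2 = sprintf("%d (%.1f)", y[i], 100 * y[i] / sum(y)),
        p_val = chisq.test(tab, correct = FALSE)$p.value,
        stringsAsFactors = FALSE
      )
    }))
  }, h1, h2, sites)
  out <- do.call(rbind, unname(rows))
  rownames(out) <- NULL
  out
}

res <- summarise_sites(SO_Example_v1)
```

The result can be exported with `write.csv(res, "summary.csv", row.names = FALSE)` (the file name is just an example). Note that chisq.test may warn about small expected counts for some cells, such as the respiratory contaminants; in those cases something like fisher.test might be worth considering instead.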
