For loop using column names as objects in a tabyl pipeline
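I need to run chi-squared tests on multiple contingency tables and store the results in a data frame. I thought of using the tabyl and chisq.test functions. My original dataset contains patient symptom reports.
Made-up example:
Data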
df <- structure(list(Race = c("White", "Asian", "White", "Asian", "Black",
"Asian", "Black", "White"), Headache = c("No", "No", "Yes", "Yes", "No",
"No", "Yes", "Yes"), Paraesthesias = c("No", "Yes", "Yes", "Yes", "Yes", "No", "No", "No"
), Heartburn = c("Yes", "No", "No", "Yes", "No", "Yes", "Yes", "Yes")), row.names = c(NA,
-8L), class = "data.frame")
print(df)
Expected result, obtained by brute force
library(dplyr)    # provides the %>% pipe
library(janitor)  # provides tabyl()
headache_p <- chisq.test(df[c(1,2)] %>% tabyl(Race, Headache))$p.value
paraesthesias_p <- chisq.test(df[c(1,3)] %>% tabyl(Race, Paraesthesias))$p.value
heartburn_p <- chisq.test(df[c(1,4)] %>% tabyl(Race, Heartburn))$p.value
data.frame("Headache" = headache_p, "Paraesthesias" = paraesthesias_p, "Heartburn" = heartburn_p, row.names = "p.value")
Attempt to get the desired result with a loop
y <- list()
for (i in 2:4) {
z <- chisq.test(df[c(1, i)] %>% tabyl(Race, colnames(df[i]), show_na = FALSE))
y <- c(y, z)
}
setNames(data.frame(y, row.names = "p.value"), colnames(df)[-1])
Error message
Error: Can't extract columns that don't exist.
x Column `Headache` doesn't exist.
Run `rlang::last_error()` to see where the error occurred.
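Question
How can I write a for loop for this process? My original dataset has more than 60 symptoms, so a loop is necessary. I don't know how to pass a column name into the pipeline, because it is treated as a character string rather than as an object.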
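You can use map_dbl here to do the work of the for loop.
When you pass the column name as a character string, you can use the .data pronoun inside tabyl, as in .data[[.x]].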
library(purrr)
library(dplyr)
library(janitor)
cols <- names(df[-1])
map_dbl(cols, ~chisq.test(df %>% tabyl(Race, .data[[.x]]))$p.value) %>%
t %>%
as.data.frame() %>%
setNames(cols)
# Headache Paraesthesias Heartburn
#1 0.7165313 0.7165313 0.9149472
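Since the question asks specifically for a for loop, here is a minimal sketch of one. It swaps tabyl for base R's table() and looks each column up by its character name with [[ ]], so no pipe or tidy evaluation is involved. It assumes df is the made-up data frame from the question; the names symptoms, p_values and result are illustrative only.

```r
# Made-up example data from the question
df <- data.frame(
  Race          = c("White", "Asian", "White", "Asian", "Black", "Asian", "Black", "White"),
  Headache      = c("No", "No", "Yes", "Yes", "No", "No", "Yes", "Yes"),
  Paraesthesias = c("No", "Yes", "Yes", "Yes", "Yes", "No", "No", "No"),
  Heartburn     = c("Yes", "No", "No", "Yes", "No", "Yes", "Yes", "Yes")
)

# Every column except Race is treated as a symptom (assumption for this sketch)
symptoms <- setdiff(names(df), "Race")

# Pre-allocate a named numeric vector to hold one p-value per symptom
p_values <- setNames(numeric(length(symptoms)), symptoms)

for (col in symptoms) {
  # df[[col]] extracts a column by its character name
  tab <- table(df$Race, df[[col]])
  # chisq.test() warns that the approximation may be incorrect with counts this small
  p_values[col] <- chisq.test(tab)$p.value
}

# Reshape into a one-row data frame, one column per symptom
result <- as.data.frame(t(p_values))
rownames(result) <- "p.value"
print(result)
```

On the example data this should reproduce the p-values shown above. With 60 or more symptoms only the data changes, because symptoms is built from the column names.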