What would be a more efficient way to create multiple Chisq/t-tests in R? (using Titanic data)
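I have some very basic code that generates chi-squared tests for some of the variables in the titanic dataset. I would like a way to distinguish categorical variables from numeric/continuous ones, so that it runs a chi-squared test only on the categorical variables and a t-test on any numeric ones.
I'm interested in being able to compare multiple levels between the Survived and Not-Survived groups, like this:
Prop Survived Female vs Prop Not-Survived Female,
Prop Survived Class 1 vs Prop Not-Survived Class 1,
etc.
Table subset for the Survived/Not-Survived female comparison: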
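# The Titanic table ships with base R (datasets package), so no library() call is needed
data("Titanic")
titanic <- as.data.frame(Titanic)
# Skip the grouping variable itself and the count column
names <- setdiff(names(titanic), c("Survived", "Freq"))
for (var in names) {
  tabla <- table(titanic$Survived, titanic[[var]])
  tabla <- addmargins(tabla)
  print(tabla)
  # Column 3 is the row total only for two-level variables (Sex, Age)
  res <- prop.test(x = c(tabla[1,2], tabla[2,2]), n = c(tabla[1,3], tabla[2,3]), correct = FALSE)
  print(var)
  print(res)
}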
Thank you
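I would suggest using a function that detects the class of the variable. I have sketched one that you can modify if needed. It takes two arguments: the data frame and the name of the variable.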
# Data: the Titanic table is part of base R (datasets package)
data("Titanic")
titanic <- as.data.frame(Titanic)
# Function
mytest <- function(data, x)
{
  # Detect the type of the variable
  if (is.numeric(data[[x]]))
  {
    # Split the variable by survival group
    a <- data[[x]][data$Survived == 'No']
    b <- data[[x]][data$Survived == 'Yes']
    # Apply the (Welch) t-test
    Res <- t.test(a, b)
    print(Res)
  } else
  {
    # Cross-tabulate survival against the variable
    tab <- table(data$Survived, data[[x]])
    # Split into a list of vectors, one per level
    L1 <- lapply(seq_len(ncol(tab)), function(i) { tab[, i] })
    names(L1) <- dimnames(tab)[[2]]
    # Row totals for each survival group
    Margins <- rowSums(tab)
    # One proportion test per level
    L2 <- lapply(L1, function(z) { prop.test(x = z, n = Margins, correct = FALSE) })
    print(L2)
  }
}
Some examples:
#Apply the function
mytest(data = titanic, x = 'Sex')
mytest(data = titanic, x = 'Freq')
Output:
mytest(data = titanic, x = 'Sex')
$Male
2-sample test for equality of proportions without continuity correction
data: z out of Margins
X-squared = 0, df = 1, p-value = 1
alternative hypothesis: two.sided
95 percent confidence interval:
-0.346476 0.346476
sample estimates:
prop 1 prop 2
0.5 0.5
$Female
2-sample test for equality of proportions without continuity correction
data: z out of Margins
X-squared = 0, df = 1, p-value = 1
alternative hypothesis: two.sided
95 percent confidence interval:
-0.346476 0.346476
sample estimates:
prop 1 prop 2
0.5 0.5
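Both proportions come out as 0.5 because `as.data.frame(Titanic)` has one row per Class/Sex/Age/Survived combination, with the passenger count stored in `Freq`, so `table()` counts rows rather than passengers. A minimal sketch, assuming the goal is to compare actual passenger counts: build the table with `xtabs()` weighted by `Freq` and run the same tests on it. The names `tab` and `female_test` are only illustrative.

```r
# Aggregated Titanic counts from the built-in datasets package
titanic <- as.data.frame(Titanic)

# Weighted contingency table: rows = Survived, columns = Sex,
# cells = number of passengers (sum of Freq)
tab <- xtabs(Freq ~ Survived + Sex, data = titanic)
print(tab)

# Proportion of women among non-survivors vs. among survivors
female_test <- prop.test(x = tab[, "Female"], n = rowSums(tab), correct = FALSE)
print(female_test)

# Overall association between survival and sex
print(chisq.test(tab, correct = FALSE))
```

The same pattern should carry over to `Class` or `Age` by changing the right-hand side of the formula.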
Second output:
mytest(data = titanic, x = 'Freq')
Welch Two Sample t-test
data: a and b
t = 1.013, df = 17.768, p-value = 0.3246
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-52.38066 149.75566
sample estimates:
mean of x mean of y
93.1250 44.4375
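To run the tests for several variables in one go, as the question asks, the function can be applied over the column names with `lapply()`. The following is only a sketch that keeps the answer's logic unchanged; `test_by_survival` is a hypothetical variant of `mytest` that returns the test objects instead of printing them, so the results can be stored and inspected later.

```r
# Aggregated Titanic counts from the built-in datasets package
titanic <- as.data.frame(Titanic)

# Hypothetical helper: same logic as mytest(), but returns the result
test_by_survival <- function(data, x) {
  if (is.numeric(data[[x]])) {
    # Numeric column: Welch t-test between the two survival groups
    t.test(data[[x]][data$Survived == "No"],
           data[[x]][data$Survived == "Yes"])
  } else {
    # Categorical column: one prop.test per level
    tab <- table(data$Survived, data[[x]])
    margins <- rowSums(tab)
    lapply(setNames(colnames(tab), colnames(tab)), function(level) {
      prop.test(x = tab[, level], n = margins, correct = FALSE)
    })
  }
}

# Every column except the grouping variable itself
vars <- setdiff(names(titanic), "Survived")
results <- lapply(setNames(vars, vars), function(v) test_by_survival(titanic, v))

# Results are addressable by variable name, then by level
names(results)
results$Sex$Female$p.value
```

With counts this small, `prop.test()` may warn that the chi-squared approximation is unreliable; the calls still return.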