How to iterate through categorical variables in R
I fitted a logistic regression model on the titanic dataset, which consists of several categorical variables.
surv.glm <- glm(survived ~ class + age + sex, data = titanic, family = binomial)
Coefficients:
  (Intercept)  class2nd class  class3rd class  ageadults  sexman
        3.062          -1.011          -1.766     -1.056  -2.369
Sample of the data:
class      age     sex  survived
1st class  adults  man  yes
1st class  adults  man  yes
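The data contains three classes (1st, 2nd and 3rd). The class field also has a crew level, but it does not seem to appear in the data. So a zero for both the 2nd class and 3rd class dummy variables must indicate 1st class.
The question is: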
Find the probability of survival for all possible cases in the titanic incident. Sort them by the probability of survival. Automate the process as much as you can.
Based on the model's coefficients, I wrote this code:
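# Inverse logit of the linear predictor built from the fitted coefficients
predict_surv <- function(class_2nd, class_3rd, age_adult, sex_man) {
  surv <- 3.062 - 1.011 * class_2nd - 1.766 * class_3rd - 1.056 * age_adult - 2.369 * sex_man
  odd <- exp(surv)
  odd / (1 + odd)
}
pr <- numeric(0)  # pr must exist before it is filled inside the loop
i <- 1
for (class2nd in c(0, 1))
  for (class3rd in c(0, 1))
    for (adult in c(0, 1))
      for (sex in c(0, 1)) {
        # note: class2nd = 1 together with class3rd = 1 is not a real passenger class
        pr[[i]] <- predict_surv(class2nd, class3rd, adult, sex)
        i <- i + 1
      }
pr <- sort(pr, decreasing = TRUE)
print(pr)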
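It works, but I would like the output to show the actual labels of each combination of categorical variables. How can I do this in R in an efficient and standard way?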
I would proceed like this:
Load the data and keep only the complete rows:
library(titanic)
titanic_comp <- titanic_train[complete.cases(titanic_train),]
Create the model:
model <- glm(Survived ~ Pclass + Age + Sex,
data = titanic_comp,
family = binomial)
Create all possible combinations:
new.data <- expand.grid(Pclass = unique(titanic_train$Pclass),
Age = unique(titanic_train$Age),
Sex = unique(titanic_train$Sex))
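For the all-categorical data in the question, the same `expand.grid` call can be fed label vectors directly, so every row carries its labels. A minimal sketch follows; the labels "child" and "woman" are assumptions (the question only shows "adults" and "man"), and with a real data frame `levels()` of each factor column would replace the hand-typed vectors.

```r
# Hypothetical label vectors: "1st class", "adults" and "man" appear in the
# question; "child" and "woman" are placeholder guesses for the other levels.
combos <- expand.grid(
  class = c("1st class", "2nd class", "3rd class"),
  age   = c("child", "adults"),
  sex   = c("woman", "man")
)

# One row per combination: 3 classes * 2 ages * 2 sexes
nrow(combos)
head(combos, 3)
```

Because each class label appears in exactly one column value, the impossible "2nd and 3rd class at once" case from nested 0/1 loops never shows up.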
Predict for all possible combinations:
new.data$prob <- predict(model, newdata = new.data, type = "response")
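With `type = "response"`, `predict` returns values on the probability scale, that is, the inverse logit of the linear predictor. This is the same transformation the question computes by hand. A small check, using the coefficients quoted in the question for a 1st class adult man:

```r
# Linear predictor for 1st class (both class dummies 0), adult = 1, man = 1,
# using the coefficients printed in the question
eta <- 3.062 - 1.056 * 1 - 2.369 * 1

# Hand-written inverse logit, as in the question's predict_surv()
odds <- exp(eta)
p_manual <- odds / (1 + odds)

# Base R's logistic distribution function computes the same quantity
p_plogis <- plogis(eta)

all.equal(p_manual, p_plogis)
```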
Sort:
new.data[order(new.data$prob, decreasing = TRUE),]
head(new.data[order(new.data$prob, decreasing = TRUE),])
    Pclass  Age    Sex      prob
521      1 0.42 female 0.9770664
515      1 0.67 female 0.9768586
494      1 0.75 female 0.9767917
377      1 0.83 female 0.9767247
473      1 0.92 female 0.9766490
437      1 1.00 female 0.9765815
So if you were a wealthy baby girl, you could easily have survived the Titanic.
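The same steps carry over when every predictor is a factor, as in the question. The sketch below uses the `Titanic` table that ships with base R's datasets package as a stand-in; it is an assumption that the asker's data resembles it (it does have a Crew level, and its labels differ from the question's). The names `tit`, `fit`, `grid` and `ranked` are illustrative.

```r
# Built-in 4-way contingency table, flattened to one row per cell
tit <- as.data.frame(Titanic)  # columns: Class, Sex, Age, Survived, Freq

# Each row stands for Freq passengers, so Freq is passed as a case weight
fit <- glm(Survived ~ Class + Age + Sex,
           data = tit, weights = Freq, family = binomial)

# One row per combination of factor levels, with the labels kept
grid <- expand.grid(Class = levels(tit$Class),
                    Age   = levels(tit$Age),
                    Sex   = levels(tit$Sex))

# Predicted survival probability for each combination, sorted high to low
grid$prob <- predict(fit, newdata = grid, type = "response")
ranked <- grid[order(grid$prob, decreasing = TRUE), ]
print(ranked)
```

Taking the levels from the fitted data rather than typing them keeps the grid in sync with the model, and the sorted data frame prints the labels next to each probability.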