How to iterate through categorical variables in R

Question

I fitted a logistic regression model to the titanic dataset, which consists of several categorical variables.

surv.glm= glm(survived ~ class + age + sex, data=titanic, family=binomial)


Coefficients:
   (Intercept)  class2nd class  class3rd class       ageadults          sexman  
         3.062          -1.011          -1.766          -1.056          -2.369

Part of the data:

class       age     sex  survived
1st class   adults  man  yes
1st class   adults  man  yes

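There are three classes in the data (1st, 2nd and 3rd). The class factor also has a crew level, but it does not seem to appear in the data. So a zero for both the 2nd-class and the 3rd-class dummy variables must denote 1st class.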
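The reference-level reasoning can be checked directly in R. The following is a minimal sketch on a small made-up factor (illustrative values, not the asker's data) that declares a `crew` level without using it:

```r
# Hypothetical factor with an unused "crew" level (illustrative values only)
cls <- factor(c("1st class", "2nd class", "3rd class", "1st class"),
              levels = c("1st class", "2nd class", "3rd class", "crew"))

levels(cls)              # all four declared levels, including the unused one
levels(droplevels(cls))  # only the levels that actually occur

# Treatment (dummy) coding: the first level gets no column of its own,
# so a row with zeros in every class column is a 1st-class row
mm <- model.matrix(~ cls, data = data.frame(cls = droplevels(cls)))
colnames(mm)
mm[1, ]                  # first observation is "1st class": both dummies are 0
```

With R's default treatment contrasts, the first factor level acts as the baseline and is absorbed into the intercept, which matches the coefficient names in the model output.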

The task is:

Find the probability of survival for all possible cases in the titanic incident. Sort them by the probability of survival. Automate the process as much as you can.

Based on the model's coefficients, I wrote this code:

predict_surv = function(class_2nd, class_3th, age_adult,sex_man) {
  surv=3.062-1.011*class_2nd-1.766*class_3th-1.056*age_adult-2.369*sex_man 
  odd = exp(surv)
  p = odd / (1 + odd)
  return(p)
}

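pr <- numeric(0)  # initialise the result vector before filling it
i <- 1
# note: class2nd = 1 together with class3th = 1 is not a real passenger class
for (class2nd in c(0,1))
  for (class3th in c(0,1))
    for (adult in c(0,1))
      for (sex in c(0,1)) {
        pr[[i]] = predict_surv(class2nd,class3th,adult,sex)
        i <- i+1
      }
pr = sort(pr,decreasing = T)
print(pr)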

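It works, but I would like the output to show the actual labels for each combination of the categorical variables. What is an efficient and standard way to do this in R?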

Answer 1

I would proceed as follows:

Prepare the data:

library(titanic)
titanic_comp <- titanic_train[complete.cases(titanic_train),]

Build the model:

model <- glm(Survived ~ Pclass + Age + Sex,
             data = titanic_comp,
             family = binomial)

Create all possible combinations:

new.data <- expand.grid(Pclass = unique(titanic_train$Pclass),
                        Age = unique(titanic_train$Age),
                        Sex = unique(titanic_train$Sex))
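
Two details of `unique` and `expand.grid` matter for a grid built this way: `unique` keeps `NA` as a value of its own, and `expand.grid` returns one row for every combination of its arguments. A small sketch with made-up vectors (not the Titanic columns):

```r
# Illustrative vector with a duplicate and a missing value
x <- c(1, 2, NA, 2)
unique(x)                  # 1 2 NA: the NA survives as a distinct value

# One row per combination: 3 values of a times 2 values of b
g <- expand.grid(a = unique(x), b = c("female", "male"))
nrow(g)
sum(is.na(g$a))            # rows that carry the missing value
```

Since `Age` in `titanic_train` has missing values, the grid built from it should also contain rows with `Age = NA`. `predict` returns `NA` for those rows, and `order` places them last by default.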

Predict for all possible combinations:

new.data$prob <- predict(model, newdata = new.data, type = "response")
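
For a binomial `glm`, `type = "response"` returns probabilities, that is, the inverse logit of the linear predictor that `type = "link"` returns. This is the same transformation the question computes by hand with `exp(surv) / (1 + exp(surv))`. A minimal check, using the built-in `mtcars` data purely as a stand-in binary outcome (it is unrelated to the Titanic data):

```r
# mtcars ships with R; am (0/1) is used here only as a stand-in binary outcome
fit <- glm(am ~ wt, data = mtcars, family = binomial)
nd  <- data.frame(wt = c(2, 3, 4))

p_response <- predict(fit, newdata = nd, type = "response")  # probabilities
p_link     <- predict(fit, newdata = nd, type = "link")      # log-odds

# plogis() is the inverse logit, so both routes should agree
all.equal(unname(p_response), unname(plogis(p_link)))
```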

Sort:

new.data[order(new.data$prob, decreasing = TRUE),]

head(new.data[order(new.data$prob, decreasing = TRUE),])
    Pclass  Age    Sex      prob
521      1 0.42 female 0.9770664
515      1 0.67 female 0.9768586
494      1 0.75 female 0.9767917
377      1 0.83 female 0.9767247
473      1 0.92 female 0.9766490
437      1 1.00 female 0.9765815

So if you were a wealthy baby girl, you could easily have survived the Titanic.
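
The same `expand.grid` idea carries over to the asker's purely categorical model, where every case is a combination of labels. The sketch below reuses the coefficients printed in the question instead of refitting the model. The labels `child` and `woman` are assumed names for the reference levels, which the question does not show; with the fitted model object at hand, `predict(surv.glm, newdata, type = "response")` on a grid built from the factor levels would be the more direct route.

```r
# Coefficients copied from the model output in the question
b <- c(intercept = 3.062, class2 = -1.011, class3 = -1.766,
       adult = -1.056, man = -2.369)

# Every valid label combination; "child" and "woman" are assumed
# reference-level names, the other labels appear in the question
cases <- expand.grid(class = c("1st class", "2nd class", "3rd class"),
                     age   = c("child", "adults"),
                     sex   = c("woman", "man"),
                     stringsAsFactors = FALSE)

# Linear predictor from the dummy coding, then inverse logit via plogis()
eta <- b[["intercept"]] +
  b[["class2"]] * (cases$class == "2nd class") +
  b[["class3"]] * (cases$class == "3rd class") +
  b[["adult"]]  * (cases$age == "adults") +
  b[["man"]]    * (cases$sex == "man")
cases$prob <- plogis(eta)

# Sort by survival probability, labels included
ranked <- cases[order(cases$prob, decreasing = TRUE), ]
print(ranked)
```

Unlike the nested 0/1 loops, this grid has 12 rows rather than 16, because the impossible case of being in 2nd and 3rd class at once never arises.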
