How can I predict using an AFT model with the survival package in R?

Question

I am using an accelerated failure time (AFT) model with a Weibull distribution to predict data, using the survival package in R. I split my data into training and test sets, fit the model on the training set, and then try to predict the values for the test set. To do that, I pass the test set as the newdata parameter, as stated in the references. I get an error saying that newdata does not have the same size as the training data (obviously!). The function then seems to return predictions for the training set instead.

How can I predict values for new data?

# get data
library(KMsurv)
library(survival)
data("kidtran") 
n = nrow(kidtran)
kidtran <- kidtran[sample(n),] # shuffle row-wise
kidtran.train = kidtran[1:(n * 0.8),]
kidtran.test = kidtran[(n * 0.8):n,]

# create model 
aftmodel <- survreg(kidtransurv~kidtran.train$gender+kidtran.train$race+kidtran.train$age, dist = "weibull")
predicted <- predict(aftmodel, newdata = kidtran.test)

Edit: As Hack-R pointed out, this line of code was missing:

kidtransurv <- Surv(kidtran.train$time, kidtran.train$delta)

Answer 1

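The problem seems to be in your specification of the dependent variable.

The data and the code that defines the dependent variable are missing from your question, so I can't see what the specific error is, but it does not appear to be a proper Surv() survival object (see ?survreg).

This variant of your code fixes the problem, makes a few minor formatting improvements, and runs fine. It builds the response with Surv() inside the formula and refers to the columns by name with data = kidtran.train, so predict() can look up the same columns in newdata: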

library(KMsurv)    # provides the kidtran data set
library(survival)  # provides Surv() and survreg()
data("kidtran")

n       <- nrow(kidtran)
n.train <- floor(n * 0.8)

kidtran       <- kidtran[sample(n), ]        # shuffle row-wise
kidtran.train <- kidtran[1:n.train, ]
kidtran.test  <- kidtran[(n.train + 1):n, ]  # start after the training rows so the two sets do not overlap

# Whatever kidtransurv was supposed to be is missing from your question,
#   so I will replace it with something not-missing
#   and I will make it into a proper survival object with Surv()

aftmodel  <- survreg(Surv(time, delta) ~ gender + race + age, dist = "weibull", data = kidtran.train)
predicted <- predict(aftmodel, newdata = kidtran.test)


head(predicted)

       302        636        727        121         85        612 
 33190.413  79238.898 111401.546  16792.180   4601.363  17698.895
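
If it helps to see what those numbers are, here is a minimal sketch, not a definitive recipe. By default predict() on a survreg fit returns type = "response", which for a Weibull model is the prediction on the original time scale. The same method also accepts type = "lp" for the linear predictor and type = "quantile" with p for quantiles such as the median. The names train, test, fit and n_train and the seed value are my own placeholders, not part of your code.

```r
library(survival)
library(KMsurv)   # provides the kidtran data set used above
data("kidtran")

set.seed(1)       # assumption: arbitrary seed, only so the split is repeatable
n       <- nrow(kidtran)
idx     <- sample(n)
n_train <- floor(0.8 * n)
train   <- kidtran[idx[1:n_train], ]        # hypothetical name for the training rows
test    <- kidtran[idx[(n_train + 1):n], ]  # remaining rows, no overlap with train

# Bare column names plus data = let predict() find the same columns in newdata
fit <- survreg(Surv(time, delta) ~ gender + race + age,
               data = train, dist = "weibull")

resp <- predict(fit, newdata = test)                              # default: type = "response"
lp   <- predict(fit, newdata = test, type = "lp")                 # linear predictor (log-time scale)
med  <- predict(fit, newdata = test, type = "quantile", p = 0.5)  # predicted median survival time

length(resp) == nrow(test)                # one prediction per test row
all.equal(unname(resp), unname(exp(lp)))  # response is exp() of the linear predictor here
```

For a right-skewed time-to-event outcome the median from type = "quantile" is often easier to interpret than the default response, but which one you want depends on how you plan to score the predictions.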


Tags: r, prediction, survival-analysis, weibull