Predicting relative risk with predict.coxph, simPH and the formula
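There is a good post on interpreting the output of predict.coxph(). However, when I compare the output of predict.coxph, simPH and the relative risk formula, I keep getting different results. Since my hypothesis involves a quadratic effect, my example includes a polynomial of degree 2.
I use the example from this post.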
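library(survival)  # coxph(), Surv() and the lung data
library(simPH)     # coxsimPoly(), simGG()
library(ggplot2)   # ggplot(), geom_smooth()
data("lung")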
Predicting relative risk with predict()
# Defining the quadratic predictor
lung$meal.cal_q <- lung$meal.cal^2
# conduct a cox regression with the predictor meal.cal, its quadratic version and some covariates.
cox_mod <- coxph(Surv(time, status) ~
ph.karno + pat.karno + meal.cal + meal.cal_q,
data = lung)
# a vector of fitted values to predict for
meal.cal_new <- seq(min(lung$meal.cal, na.rm= TRUE), max(lung$meal.cal,
na.rm= TRUE), by= 1)
# a vector of fitted values to predict for, the quadratic effect
meal.cal_q_new <- meal.cal_new^2
# the length of the vector with the values to predict for
n <- length(meal.cal_new)
# a dataframe with all the values to predict for
lung_new <- data.frame(ph.karno= rep(mean(lung$ph.karno, na.rm= TRUE), n),
pat.karno= rep(mean(lung$pat.karno, na.rm= TRUE), n),
meal.cal= meal.cal_new,
meal.cal_q = meal.cal_q_new)
# predict the relative risk
lung_new$rel_risk <- predict(cox_mod, lung_new, type= "risk")
Predicting relative risk with the formula (see the post mentioned above)
# Defining the quadratic predictor
lung$meal.cal_q <- lung$meal.cal^2
# run a cox regression with the predictor meal.cal, its quadratic version and some covariates.
cox_mod <- coxph(Surv(time, status) ~
ph.karno + pat.karno + meal.cal + meal.cal_q,
data = lung)
# a vector of fitted values to predict for
meal.cal_new <- seq(min(lung$meal.cal, na.rm= TRUE), max(lung$meal.cal,
na.rm= TRUE), by= 1)
# a vector of fitted values to predict for, the quadratic effect
meal.cal_q_new <- meal.cal_new^2
# length of the vector to predict for
n <- length(meal.cal_new)
# A dataframe with the values to make the prediction for
lung_new2 <- data.frame(
ph.karno= rep(mean(lung$ph.karno, na.rm= TRUE), n),
pat.karno= rep(mean(lung$pat.karno, na.rm= TRUE), n),
meal.cal= meal.cal_new,
meal.cal_q = meal.cal_q_new)
# A dataframe with the values to compare the prediction with
lung_new_mean <- data.frame(
ph.karno= rep(mean(lung$ph.karno, na.rm= TRUE), n),
pat.karno= rep(mean(lung$pat.karno, na.rm= TRUE), n),
meal.cal= rep(mean(lung$meal.cal, na.rm= TRUE), n),
meal.cal_q = rep(mean(lung$meal.cal_q, na.rm= TRUE), n))
# extract the coefficients
coefCPH <- coef(cox_mod)
# make the prediction for the values of interest
cox_risk <-
exp(coefCPH["ph.karno"] * lung_new2[ , "ph.karno"] +
coefCPH["pat.karno"] * lung_new2[ , "pat.karno"] +
coefCPH["meal.cal"] * lung_new2[ , "meal.cal"] +
coefCPH["meal.cal_q"] * lung_new2[ , "meal.cal_q"])
# make the predictions for the values to compare with
cox_risk_mean <-
exp(coefCPH["ph.karno"] * lung_new_mean[ , "ph.karno"] +
coefCPH["pat.karno"] * lung_new_mean[ , "pat.karno"] +
coefCPH["meal.cal"] * lung_new_mean[ , "meal.cal"] +
coefCPH["meal.cal_q"] * lung_new_mean[ , "meal.cal_q"])
# calculate the relative risk
lung_new2$rel_risk <- unlist(cox_risk)/ unlist(cox_risk_mean)
Now a plot of the relative risk predicted with predict() and with the formula:
ggplot(lung_new, aes(meal.cal, rel_risk)) +
geom_smooth() +
geom_smooth(data= lung_new2, col= "red")
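The plot shows that the two predictions differ. I do not understand why, since the post mentioned above indicates that predict() and the formula should give the same results.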
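One way to narrow down such a discrepancy is to check which reference values predict() is centering at. The sketch below is not part of the original post. It assumes the documented default of predict.coxph(), where type = "risk" is reported relative to the covariate reference values stored in the fitted object (fit$means), and that these are computed only from the rows actually used in the fit. The names lung2, nd, by_hand, ref_model and ref_na_rm are made up for illustration.

```r
library(survival)

# Work on a copy so the data used elsewhere is left untouched
lung2 <- lung
lung2$meal.cal_q <- lung2$meal.cal^2

fit <- coxph(Surv(time, status) ~
               ph.karno + pat.karno + meal.cal + meal.cal_q,
             data = lung2)

# A few arbitrary covariate patterns to predict for
nd <- data.frame(ph.karno = 80, pat.karno = 80,
                 meal.cal = c(500, 1000, 1500))
nd$meal.cal_q <- nd$meal.cal^2

# Relative risk as reported by predict()
by_predict <- predict(fit, newdata = nd, type = "risk")

# Hand calculation: centre each covariate at the reference stored in the fit,
# then exponentiate the linear predictor
X <- as.matrix(nd[, names(coef(fit))])
centred <- sweep(X, 2, as.numeric(fit$means))
by_hand <- exp(drop(centred %*% coef(fit)))

# Reference values used by the model versus per-variable means with na.rm = TRUE
ref_model <- setNames(as.numeric(fit$means), names(coef(fit)))
ref_na_rm <- sapply(lung2[names(coef(fit))], mean, na.rm = TRUE)

print(cbind(by_predict, by_hand))
print(rbind(ref_model, ref_na_rm))
```

If by_predict and by_hand agree while ref_model and ref_na_rm do not, the two curves in the plot are being compared against different reference points, which would shift one relative to the other. This is only a diagnostic under the stated assumptions, not a statement about what the linked post intended.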
Because of this confusion, I tried to solve the problem with the simPH package. Here is what I did:
# Defining the quadratic predictor
lung$meal.cal_q <- lung$meal.cal^2
# run a cox regression with the predictor, its quadratic version and some covariates.
cox_mod <- coxph(Surv(time, status) ~
ph.karno + pat.karno + meal.cal + meal.cal_q,
data = lung)
# a vector of fitted values to predict for
meal.cal_new <- seq(min(lung$meal.cal, na.rm= TRUE),
max(lung$meal.cal, na.rm= TRUE), by= 1)
# length of the vector to predict for
n <- length(meal.cal_new)
# A vector with the values to compare the prediction with
meal.cal_new_mean <- rep(mean(lung$meal.cal, na.rm= TRUE), n)
# running 100 simulations per predictor value with coxsimPoly
Sim <- coxsimPoly(obj= cox_mod, b = "meal.cal", pow = 2,
qi = "Relative Hazard",
Xj = meal.cal_new,
Xl = meal.cal_new_mean,
ci = .95,
nsim = 100,
extremesDrop = FALSE)
# plot the result
simGG(Sim)
This gives an empty plot with the warnings
Warning messages:
1: In min(obj$sims[, x]) : no non-missing arguments to min; returning Inf
2: In max(obj$sims[, x]) : no non-missing arguments to max; returning -Inf
and the Sim$sims object does indeed appear to be empty.
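My questions are:
Why do the results of predict() and of the formula differ?
Why does the simPH package not compute the relative risk?
Which method should I choose? My hypothesis is a quadratic effect in a Cox regression, and I need a plot of this predictor and its relative risk (compared with the predictor at its mean), as in the example.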
Quick answer to the simPH question: the polynomial term needs to be specified with the I() function in the coxph call, e.g.:
cox_mod <- coxph(Surv(time, status) ~
ph.karno + pat.karno + meal.cal + I(meal.cal^2),
data = lung)
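(The error handling for your use case is very poor.)
With this modification (and 1000 simulations), the code above should produce a populated plot rather than an empty one.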
The difference between simPH and predict
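My guess about the difference is that simPH does not create confidence intervals around a transformed point estimate the way predict does. It draws simulations from the multivariate normal distribution specified by the fitted model and then shows the central 50% and 95% of that simulated distribution. The central line is simply the median of the simulations. This is clearly a different logic from predict. For a strongly non-monotonic quantity of interest such as this one, the predict point estimates give highly misleading results compared with simPH. Based on 4 observations, there is little evidence for this form.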
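The simulation logic described above can be sketched by hand. This is a minimal illustration under stated assumptions, not simPH's actual implementation: it assumes the coefficients are drawn from a multivariate normal with mean coef(fit) and covariance vcov(fit) using MASS::mvrnorm(), and that the relative hazard compares each value xj against the mean xl. The names draws, xj, xl, rel_haz and sim_summary are made up for illustration.

```r
library(survival)
library(MASS)  # mvrnorm()

set.seed(1)

fit <- coxph(Surv(time, status) ~
               ph.karno + pat.karno + meal.cal + I(meal.cal^2),
             data = lung)

# 1000 simulated coefficient vectors from N(coef, vcov)
draws <- mvrnorm(n = 1000, mu = coef(fit), Sigma = vcov(fit))

# Values to predict for (xj) and the comparison value (xl)
xj <- seq(min(lung$meal.cal, na.rm = TRUE),
          max(lung$meal.cal, na.rm = TRUE), length.out = 50)
xl <- mean(lung$meal.cal, na.rm = TRUE)

# Differences in the linear and squared terms; the other covariates cancel
D <- cbind(xj - xl, xj^2 - xl^2)
B <- draws[, c("meal.cal", "I(meal.cal^2)")]

# One simulated relative hazard per draw (rows) and per xj value (columns)
rel_haz <- exp(B %*% t(D))

# Median plus the central 50% and 95% of the simulated distribution
bands <- apply(rel_haz, 2, quantile,
               probs = c(0.025, 0.25, 0.5, 0.75, 0.975))
sim_summary <- data.frame(meal.cal = xj,
                          lower95  = bands[1, ],
                          lower50  = bands[2, ],
                          median   = bands[3, ],
                          upper50  = bands[4, ],
                          upper95  = bands[5, ])
head(sim_summary)
```

Plotting median against meal.cal with the two bands as ribbons gives a picture in the same spirit as simGG(), although the package may handle details such as dropping extreme draws differently.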