Why is this code yielding erroneous P-values?
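I am trying to calculate the P-value associated with a point estimate obtained from a Cox PH model with a time-varying coefficient. The function I wrote does not give the correct P-value. I will illustrate this using the NCCTG lung cancer data from the survival package.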
# Setup
library(survival)
# Effect of Karnofsky score, linear in time
fit <- coxph(Surv(time/365.25, status == 2) ~ ph.karno + tt(ph.karno),
             lung, tt = function(x, t, ...) {x*t})
The function:
# Same function but now with a P-value in the output
calculate.timeDependentHazard.P <- function(model,time) {
  index.1 <- which(names(model$coef)=="ph.karno")
  index.2 <- which(names(model$coef)=="tt(ph.karno)")
  coef <- model$coef[c(index.1,index.2)]
  var <- rbind(c(model$var[index.1,index.1],model$var[index.1,index.2]),
               c(model$var[index.2,index.1],model$var[index.2,index.2]))
  var.at.time <- t(c(1,time)) %*% var %*% c(1,time)
  hazard.at.time <- t(c(1,time)) %*% coef
  lower.95 <- hazard.at.time - 1.96*sqrt(var.at.time)
  upper.95 <- hazard.at.time + 1.96*sqrt(var.at.time)
  z.at.time <- hazard.at.time/(sqrt(var.at.time))
  p.value <- pnorm(-abs(z.at.time))
  results <- c(exp(c(hazard.at.time,lower.95,upper.95)),p.value)
  names(results) <- c("hazard ratio","95% lower","95% upper","P.value")
  options(scipen = 999)
  results
}
# Point estimates after 1.05*365.25 = 383.5 days of follow-up
calculate.timeDependentHazard.P(fit,1.05)
Output:
> calculate.timeDependentHazard.P(fit,1.05)
hazard ratio    95% lower    95% upper      P.value
  0.98913256   0.97654719   1.00188013   0.04721342
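Evidently, the P-value should be > .05, since the 95% confidence interval includes 1, but somehow it is not. The P-values calculated by this method appear to be too low. Can anyone spot the bug?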
It looks like you want a two-sided alternative, so multiply pnorm(-abs(z.at.time)) by two. That is, use 2*pnorm(-abs(z.at.time)).
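As a rough check of this suggestion, the sketch below recovers the z statistic from the numbers printed in the question's output rather than refitting the model. The variable names (hr, lower, upper, se, z) are illustrative only, and the standard error is back-calculated from the reported 95% interval on the assumption that it was built as estimate ± 1.96*SE on the log scale, as in the function above.

```r
# Values copied from the output shown in the question
hr    <- 0.98913256
lower <- 0.97654719
upper <- 1.00188013

# Back out the standard error on the log-hazard scale
# (assumes the interval is log(hr) +/- 1.96 * se)
se <- (log(upper) - log(lower)) / (2 * 1.96)
z  <- log(hr) / se

# One-sided tail probability: what the original function returns
p.one.sided <- pnorm(-abs(z))

# Two-sided P-value: both tails, consistent with a two-sided 95% interval
p.two.sided <- 2 * pnorm(-abs(z))

print(c(one.sided = p.one.sided, two.sided = p.two.sided))
```

The one-sided value should come out at roughly 0.047, matching the question's output, while the two-sided value is roughly 0.094, which agrees with a 95% interval that includes 1.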