如何将 mimira 对象(Cox 回归模型,来自多重插补和倾向得分匹配 (MatchThem pkg))转换为森林图

How to convert a mimira object (Cox regression model, from multiple imputations and a propensity score matching (MatchThem pkg)) into a Forest plot

亲爱的 Whosebug 社区, 作为一名外科医生,满怀热情地自学了6个月的R学习(Whosebug,还有那么多网站),请大家原谅我的顾虑。

背景: 简而言之,我的 objective 是 运行 癌症患者数据集的生存 cox 模型回归。由于追溯方面,我打算用倾向得分匹配(PSM)做一个匹配1:3。缺失的数据处理了多重插补(“小鼠”pkg)。 PSM 使用“MatchThem”pkg 进行管理。 我使用“调查”pkg 来合并生存(svycoxph() 通过 with() 函数合并)。这将我们引向一个 mimira 对象,我可以使用 tbl_regression ("gtsummary" pkg).

轻松地将其打印成漂亮的 Table

问题: 作为通常将我的 cox 回归打印成风险比 Table 和图形版本(带有 ggforest() 的森林图,来自“survminer”pkg),这次我真的被卡住了。 ggforest 函数无法将 mimira 对象识别为“coxph 对象”并发送此错误:

Error in ggforest(tbl_regression_object, data = mimira_object) : 
  inherits(model, "coxph") is not TRUE 

我想问题是将 PSM 添加到我的多重插补中,因为我用 Forest plot 打印多重插补的 cox 回归没有问题(ggforest 能够毫无问题地处理 mira 对象 pool_and_tidy_mice () 函数).

这是脚本:

#Data
library(fabricatr)
library(simsurv)

# Simulate patient data in a clinical trial
participant_data <- fabricate(
  N = 2000,
  age = runif(N, min = 18, max = 85),
  is_female = draw_binary(prob = 0.5, N = N),
  is_smoker = draw_binary(prob = 0.2 + 0.2 * (age > 50), N = N),
  disease_stage = round(runif(N, min = 1 + 0.5 * (age > 65), max = 4)),
  treatment = draw_binary(prob = 0.5, N = N),
  kps = runif(N, min = 40, max = 100)
)

# Simulate data in the survival context
survival_data <- simsurv(
  lambdas = 0.1, gammas = 1.8,
  x = participant_data, 
  betas = c(is_female = -0.2, is_smoker = 1.2,
            treatment = -0.4, kps = -0.005,
            disease_stage = 0.2),
  maxt = 5)

# Merging df
library(dplyr)
mydata_complete <- bind_cols(survival_data, participant_data)

# generating missing value
library(missMethods)
mydata_uncomp <- delete_MCAR(mydata_complete, 0.3)
mydata <- mydata_uncomp

#1 imputation with "mice"
library(mice)
mydata$nelsonaalen <- nelsonaalen(mydata, eventtime, status)
mydata_mice_imp_m3 <- mice(mydata, maxit = 2, m = 3, seed = 20200801) # m=3 is for testing


#2 matching (PSM 1:3) with "MatchThem"
library(MatchThem)
mydata_imp_m3_psm <- matchthem(treatment ~ age + is_female + disease_stage, data = mydata_mice_imp_m3, approach = "within" ,ratio= 1, method = "optimal")

#3 Pooling Coxph models in multiple imputed datasets and PSM with "survey"
library(survey)
mimira_object <- with(data = mydata_imp_m3_psm, expr = svycoxph(Surv(eventtime, status) ~ age+ is_smoker + disease_stage))
pool_and_tidy_mice(mimira_object, exponentiate = TRUE, conf.int=TRUE) -> pooled_imp_m3_cph

    # estimates with pool_and_tidy_mice() works with mimira_object but cannot bring me de degree of freedoms. Warning message :
In get.dfcom(object, dfcom) : Infinite sample size assumed.
> pooled_imp_m3_cph
           term  estimate   std.error  statistic p.value conf.low conf.high            b  df dfcom fmi    lambda m      riv         ubar
1           age 0.9995807 0.001961343 -0.2138208     NaN      NaN       NaN 1.489769e-06 NaN   Inf NaN 0.5163574 3 1.067643 1.860509e-06
2     is_smoker 2.8626952 0.093476026 11.2516931     NaN      NaN       NaN 4.182884e-03 NaN   Inf NaN 0.6382842 3 1.764601 3.160589e-03
3 disease_stage 1.2386947 0.044092483  4.8547535     NaN      NaN       NaN 8.995628e-04 NaN   Inf NaN 0.6169374 3 1.610540 7.447299e-04

#4 Table summary of the pooled results
library(gtsummary)
tbl_regression_object <- tbl_regression(mimira_object, exp=TRUE, conf.int = TRUE) # 95% CI and p-value are missing due to an issue with an other issue in the pooling of the mimira_object. The Matchthem:::get.2dfcom function gives a dfcom = 999999 (another issue to be solved in my concern)


#5 What it should looks like as graphical summary
library(survival)
mydata.cox <- coxph(Surv(eventtime, status) ~ age+ is_smoker + disease_stage, mydata_uncomp) # (df mydata_uncomp is without imputation and PSM)

    #with gtsummary
forestGT <- 
  mydata.cox %>%
  tbl_regression(exponentiate = TRUE,
                 add_estimate_to_reference_rows = TRUE) %>% 
  plot()
(forestGT) # See picture GT_plot1. Almost perfect. Would have been great to know how to add N, 95% CI, HR, p-value and parameters of the model (AIC, events, concordance, etc.)

    #with survminer
HRforest <- 
  survminer::ggforest(mydata.cox, data = mydata_uncomp)
(HRforest) # See picture Ggforest. Everything I need to know about my cox regression is all in there. For me it is just a great regression cox forest plot.

#6 Actually what happens when I do the same thing with imputed and matched df

    #with gtsummary
forestGT_imp_psm <- 
  mimira_object %>%
  tbl_regression(exponentiate = TRUE,
                 add_estimate_to_reference_rows = TRUE) %>% 
  plot() # WARNING message : In get.dfcom(object, dfcom) : Infinite sample size assumed.
(forestGT_imp_psm) # See picture GT_plot2. The plot is rendered but without 95% IC

    #with survminer
HRforest_imp_psm <- 
  ggforest(mimira_object, data = mydata_imp_m3_psm) # ERROR:in ggforest(mimira_object, data = mydata_imp_m3_psm) : inherits(model, "coxph") is not TRUE
(HRforest_imp_psm)

#7 The lucky and providential step
# your solution/advise

非常感谢您的帮助。

干杯。

AK

图片GT_plot1 (不允许在此 post 中嵌入图像,这里是共享链接:GT_plot1

图片Ggforest_plot Ggforest_plot

图片GT_plot2 GT_plot2

如果您提供一个可重现的示例(即我们都可以 运行 在我们的机器上使用的数据集示例),我们可以更好地帮助您。

gtsummary 包导出了一个 plot() 方法,您可以使用它来构建森林图。示例如下!

library(gtsummary)
library(survival)

ggforest <- 
  coxph(Surv(ttdeath, death) ~ trt + grade, trial) %>%
  tbl_regression(exponentiate = TRUE,
                 add_estimate_to_reference_rows = TRUE) %>%
  plot()
#> Registered S3 method overwritten by 'GGally':
#>   method from   
#>   +.gg   ggplot2
ggforest

reprex package (v2.0.1)

于 2021-08-26 创建

看来这里有两个截然不同的问题:

问题#1。让 gtsummary() 生成一个 table,其中包含合并的匹配数据的 p 值和置信区间

问题#2。生成 ggforest() 以生成汇总估计值图。

问题 #1:

让我们按照论文“MatchThem::多重插补后的匹配和加权”(https://arxiv.org/ftp/arxiv/papers/2009/2009.11772.pdf)[第15页]

中的说明进行操作

并修改您的块 #3。我们不调用 pool_and_tidy_mice(),而是执行以下操作:

matched.results <- pool(mimira_object)
summary(matched.results, conf.int = TRUE)

这会产生以下结果:

           term      estimate   std.error  statistic        df      p.value        2.5 %     97.5 %
1           age -0.0005997864 0.001448251 -0.4141453 55.266353 6.803707e-01 -0.003501832 0.00230226
2     is_smoker  1.1157796620 0.077943244 14.3152839  9.961064 5.713387e-08  0.942019234 1.28954009
3 disease_stage  0.2360965310 0.051799813  4.5578645  3.879879 1.111782e-02  0.090504018 0.38168904

这意味着使用 mice 执行插补然后与 MatchThem 匹配是可行的,因为您确实获得了 p 值和置信区间。

pool_and_tidy_mice() 的输出比较:

           term      estimate   std.error  statistic p.value            b  df dfcom fmi    lambda m
1           age -0.0005997864 0.001448251 -0.4141453     NaN 2.992395e-07 NaN   Inf NaN 0.1902260 3
2     is_smoker  1.1157796620 0.077943244 14.3152839     NaN 2.041627e-03 NaN   Inf NaN 0.4480827 3
3 disease_stage  0.2360965310 0.051799813  4.5578645     NaN 1.444843e-03 NaN   Inf NaN 0.7179644 3
        riv         ubar
1 0.2349124 1.698446e-06
2 0.8118657 3.352980e-03
3 2.5456522 7.567636e-04

这里除了df和p.value都一样,后者没有计算table.

因此,我认为这是 pool_and_tidy_mice() 的一个问题,您应该 post 在 gtsummary GitHub 上将其作为一个问题。

目前,您可以通过在调用 with() 函数时将块 #3 中的 svycoxph() 更改为 survival::coxph() 来绕过此问题。如果你这样做,那么最终你会得到一个带有 p.values 和置信区间的 gtsummary table。最终,问题可能是 svycoxph()pool_and_mice() 之间的某种交互,因此我认为您应该 post 在 GitHub.

上这样做

问题 #2:

简短的回答是,不可能有包含您要查找的所有数据的 ggforest 图。

https://www.rdocumentation.org/packages/mice/versions/3.13.0/topics/pool 读作:

一个常见的错误是将步骤 2 和步骤 3 颠倒过来,即 合并乘法插补数据而不是估计值 。这样做可能会严重影响科学兴趣的估计,并产生不正确的统计区间和 p 值。 pool() 函数将检测这种情况。

这意味着合并估计没有“真实”数据集(即您无法真正组合插补 1-3 的数据集),这意味着 ggforest() 无法计算所需的图(因为它需要有一个数据集,但不能使用,因为它会导致错误的估计)。

您可以做的是为每个插补显示所有 ggforest 图(因此,如果您进行了 3 次插补,您将得到 3 个略有不同的 ggforest 图),最后使用 plot() 添加汇总估计图如上所述。

要创建每个 ggforest 图,您需要以下代码行:

ggforest(mimira_object$analyses[[1]], complete(mydata_imp_m3_psm, 1))

这将为您的第一个插补创建 ggforest 图。将数字更改为 2 和 3 以检查剩余的插补。

希望对您有所帮助,

亚历克斯