如何使用 gtsummary::tbl_svysummary() 显示因子变量水平的置信区间？

Question

我正在使用国家电子伤害监测系统 (https://www.cpsc.gov/Research--Statistics/NEISS-Injury-Data) 的调查数据来研究消费品伤害的趋势。

使用 gtsummary 和 tbl_svysummary()，我的目标是创建一个描述性的 table 伤害汇总测量。由于这是调查数据，我想显示与每个汇总度量相关联的 95% 置信区间。

之前的 post 提供了为两个水平因子变量 (Using (gtsummary) tbl_svysummaary function to display confidence intervals for survey.design object?) 生成置信区间的解决方案，但是，我正在寻找一种解决方案来为 >= 的因子变量生成置信区间2 个级别。

我借用了之前的一个可重现的例子post:

library(gtsummary)
library(survey)

svy_trial <-
  svydesign(~1, data = trial %>% select(trt, response, death), weights = ~1) 

ci <- function(variable, by, data, ...) {
  svyby(as.formula( paste0( "~" , variable)) , by = as.formula( paste0( "~" , by)), data, svyciprop, vartype="ci") %>%
    tibble::as_tibble() %>%
    dplyr::mutate_at(vars(ci_l, ci_u), ~style_number(., scale = 100) %>% paste0("%")) %>%
    dplyr::mutate(ci = stringr::str_glue("{ci_l}, {ci_u}")) %>%
    dplyr::select(all_of(c(by, "ci"))) %>%
    tidyr::pivot_wider(names_from = all_of(by), values_from = ci) %>%
    set_names(paste0("add_stat_", seq_len(ncol(.))))
}

ci("response", "trt", svy_trial)
#> # A tibble: 1 x 2
#>   add_stat_1 add_stat_2
#>   <glue>     <glue>    
#> 1 21%, 40%   25%, 44%  

svy_trial %>%
  tbl_svysummary(by = "trt", missing = "no") %>%
  add_stat(everything() ~ "ci") %>%
  modify_table_body(
    dplyr::relocate, add_stat_1, .after = stat_1
  ) %>%
  modify_header(starts_with("add_stat_") ~ "**95% CI**") %>%
  modify_footnote(everything() ~ NA)

Table 之前的截图 post 1

在上面的例子中，因子变量有两个水平，显示了第一个水平的汇总数据。

如何调整上述方法，以便显示两个水平的因子变量及其各自的置信区间？
如何将此解决方案推广到具有 >2 个级别的因子变量（例如，年龄变量分箱如下：<18 岁、18-25 岁、26-50 岁等）？
最后，这个理想的解决方案如何能够在同一列中为因子变量的置信区间生成连续变量的置信区间？

这是我尝试制作的 table 示例：所需 table 输出的屏幕截图2

如果此帮助请求没有遵循良好的堆栈溢出礼仪，我们深表歉意（我是这个社区的新手），非常感谢您的时间和帮助！

Answer 1

我有一个为 >=2 水平的因素准备的示例，但没有 by= 变量（尽管方法相似）。仅供参考，我们有一个未解决的问题，可以使用新函数 add_ci.tbl_svysummary() 更彻底地支持调查对象，该函数将为分类变量和连续变量计算 CI。您可以单击此处的“订阅”link 以在实施此功能时得到提醒https://github.com/ddsjoberg/gtsummary/issues/965

同时，这里有一个代码示例：

library(gtsummary)
library(tidyverse)
packageVersion("gtsummary")
#> [1] '1.5.0'

svy <- survey::svydesign(~1, data = as.data.frame(Titanic), weights = ~Freq) 

# put the CI in a tibble with the variable name
# first create a data frame with each variable and it's values
df_result <- 
  tibble(variable = c("Class", "Sex", "Age", "Survived")) %>%
  # get the levels of each variable in a new column
  # adding them as a list to allow for different variable classes
  rowwise() %>%
  mutate(
    # level to be used to construct call
    level = unique(svy$variables[[variable]]) %>% as.list() %>% list(),
    # character version to be merged into table
    label = unique(svy$variables[[variable]]) %>% as.character() %>% as.list() %>% list()
  ) %>%
  unnest(c(level, label)) %>%
  mutate(
    label = unlist(label)
  )

# construct call to svyciprop
df_result$svyciprop <-
  map2(
    df_result$variable, df_result$label,
    function(variable, level) rlang::inject(survey::svyciprop(~I(!!rlang::sym(variable) == !!level), svy))
  )


# round/format the 95% CI
df_result <-
  df_result %>%
  rowwise() %>%
  mutate(
    ci = 
      svyciprop %>%
      attr("ci") %>%
      style_sigfig(scale = 100) %>%
      paste0("%", collapse = ", ")
  ) %>% 
  ungroup() %>%
  # keep variables needed in tbl
  select(variable, label, ci)


# construct gtsummary table with CI
tbl <- 
  svy %>%
  tbl_svysummary() %>%
  # merge in CI
  modify_table_body(
    ~.x %>%
      left_join(
        df_result, 
        by = c("variable", "label")
      )
  ) %>%
  # add a header
  modify_header(ci = "**95% CI**")

^{由 reprex package (v2.0.1)}

于 2021-12-04 创建

如何使用 gtsummary::tbl_svysummary() 显示因子变量水平的置信区间？

How to use gtsummary::tbl_svysummary() to display confidence intervals for levels of a factor variable?

r

gtsummary