tbl_summary 可以同时显示变量的级别和子级别（即汇总统计列）吗？

Question

我正在用 tbl_summary 制作 table，我想知道是否可以按净度级别对 diamond_cat 标签进行细分，同时保留两者的摘要统计信息？我将附上我设想的 table 的图像：

library(gtsummary)
library(forcats)
data(diamonds)
table(diamonds$clarity)
# I1   SI2   SI1   VS2   VS1  VVS2 
# 741  9194 13065 12258  8171  5066 
# VVS1    IF 
# 3655  1790 

diamond_cat <- fct_collapse(diamonds$clarity,
                            "Internally flawless" = "IF",
                            "Very very slightly included" = c("VVS1", "VVS2"),
                            "Very slightly included" = c("VS1", "VS2"),
                            "Slightly included" = c("SI1", "SI2"),
                            "Included" = "I1")
# add new variable to data set
diamonds$diamond_cat <- diamond_cat

diamonds %>% select(diamond_cat) %>% tbl_summary()
#diamonds %>% select(clarity) %>% tbl_summary()

我是 R 的新手。在此先感谢您的帮助。

Answer 1

是的，这是可能的。但是，tbl_summary() 并没有考虑到这个功能，所以让它工作的代码很复杂。示例如下！

library(gtsummary)

# save recodes because they will be used more than once
clarity_recodes <-
  list(
    "Internally flawless" = "IF",
    "Very very slightly included" = c("VVS1", "VVS2"),
    "Very slightly included" = c("VS1", "VS2"),
    "Slightly included" = c("SI1", "SI2"),
    "Included" = "I1"
  )

# build typical tbl_summary with the recoded clarity data
tbl1 <-
  ggplot2::diamonds %>%
  mutate(
    clarity_cat = forcats::fct_collapse(clarity, !!!clarity_recodes)
  ) %>%
  select(clarity_cat) %>%
  tbl_summary(label = clarity_cat ~ "Diamond Clarity")

# create a tibble of recoded summary stats
tbl2 <-
  ggplot2::diamonds %>%
  select(clarity) %>%
  tbl_summary() %>%
  modify_column_unhide(c(row_type)) %>%
  as_tibble(col_labels = FALSE) %>%
  dplyr::left_join(
    clarity_recodes %>% 
      tibble::enframe("label2", "label") %>% 
      tidyr::unnest(cols = c(label)),
    by = "label"
  ) %>%
  dplyr::with_groups(label2, ~dplyr::filter(.x, row_type == "level", dplyr::n() > 1)) %>%
  mutate(row_type = "double_indent") %>%
  tidyr::nest(data = -c(label2)) %>%
  dplyr::rename(label = label2)

# merge in the tibble with clarity details into the larger summary table
tbl_final <-
  tbl1 %>%
  modify_table_body(
    ~.x %>%
      dplyr::left_join(tbl2, by = c("label")) %>%
      dplyr::mutate(
        data =
          purrr::pmap(
            list(data, row_type, label, stat_0), 
            function(data, row_type, label, stat_0) {
              df <- tibble::tibble(
                row_type = row_type, 
                label = label,
                stat_0 = stat_0
              )
              if (!is.null(data)) return(dplyr::bind_rows(df, data))
              else return(df)
            }
          )
      ) %>%
      select(-row_type, -label, -stat_0) %>%
      tidyr::unnest(data)
  ) %>%
  modify_table_styling(
    columns = c(label, stat_0),
    rows = row_type == "double_indent",
    text_format = "indent2"
  ) %>%
  modify_table_styling(
    columns = stat_0,
    align  = "left"
  )

^{由 reprex package (v2.0.1)}

创建于 2021-08-21

tbl_summary 可以同时显示变量的级别和子级别（即汇总统计列）吗？

Can tbl_summary display both levels and sub-levels of a variable (i.e., for summary stats column)?

r

gtsummary