How to best create a table to display demographics for multiple outcomes?

I am trying to produce a table that shows demographics for multiple dichotomous outcomes asked about in a survey.

Here is the example I started with:

df1 <- data.frame(ID=c(1,2,3,4,5,6),
                  blondehair=c(0,1,1,0,0,1),
                  ateapple=c(1,1,1,0,1,1),
                  righthanded=c(0,1,1,1,1,0),
                  agecategory=c(1,1,2,2,1,1),
                  educationcategory=c(1,1,2,2,1,1))
df1

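library(dplyr)      # select() and %>%
library(gtsummary)  # tbl_summary(), add_overall(), modify_header()

# Keep one outcome plus the demographic variables, and give them display names
table1 <- df1 %>% select(ateapple, agecategory, educationcategory)
colnames(table1) <- c("Percentage that ate apple", "Age Category", "Education Level")

# Summarise the demographics, split by the outcome
table1 %>%
  tbl_summary(by = `Percentage that ate apple`,
              statistic = list(all_categorical() ~ "{p}%"),
              missing_text = "Missing") %>%
  add_overall(last = TRUE) %>%
  modify_header(label ~ "Demographics")
table1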

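I was thinking of making one tbl_summary for each outcome I'm interested in and then merging them together. However, I don't want the table to show both the "0 (No)" and "1 (Yes)" categories of each outcome. I only want the percentage that has the outcome (i.e., only the percentage who did eat an apple). My real data has 10 dichotomous outcomes and 7 categorical demographic variables, so I'm a bit hesitant to merge that many individual tbl_summary tables.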

This is what I am trying to get:

                Percentage that ate an apple   Percentage blonde   Percentage right-handed
Age Category
  1             67%                            33%                 33%
  2             17%                            17%                 33%

Education
  1             33%                            6%                  33%
  2             17%                            3%                  33%

Is there an R package that can help with this? I was considering tbl_summary, but I don't think it will give me what I want.

Below is an example of how to get what you're after.

library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.5.0'

# loop over the variables; the name of each element is the label that we'll put in the header
tbl <- 
  list("Patients who responded" = "response", 
       "Patients who died" = "death") %>%
  purrr::imap(
    ~ trial %>%
      tbl_summary(
        by = all_of(.x), 
        include = c(age, grade),
        type = all_dichotomous() ~ "categorical"
      ) %>%
      # update the column header
      modify_header(all_stat_cols() ~ paste0("**", .y, "**")) %>%
      # hide the first stat column (which is for the non-responders)
      modify_column_hide(stat_1)
  ) %>% 
  tbl_merge() %>%
  # remove spanning headers
  modify_spanning_header(all_stat_cols() ~ NA)
#> 7 observations missing `response` have been removed. To include these observations, use `forcats::fct_explicit_na()` on `response` column before passing to `tbl_summary()`.

Created on 2022-01-16 by the reprex package (v2.0.1)
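One thing worth checking before adapting the loop above to your own data is which denominator you want. As far as I know, tbl_summary() reports column percentages by default and has a `percent` argument that accepts "column", "row" or "cell". The base R sketch below (no gtsummary needed) computes all three for one outcome and one demographic variable from the question's `df1`; the object names `counts`, `cell_pct`, `row_pct` and `col_pct` are just illustrative.

```r
# Data from the question
df1 <- data.frame(ID = c(1, 2, 3, 4, 5, 6),
                  blondehair = c(0, 1, 1, 0, 0, 1),
                  ateapple = c(1, 1, 1, 0, 1, 1),
                  righthanded = c(0, 1, 1, 1, 1, 0),
                  agecategory = c(1, 1, 2, 2, 1, 1),
                  educationcategory = c(1, 1, 2, 2, 1, 1))

# Cross-tabulate age category (rows) against the 0/1 outcome (columns)
counts <- table(age = df1$agecategory, ateapple = df1$ateapple)

# Keep only the "1" (yes) column under three different denominators
cell_pct <- round(100 * prop.table(counts)[, "1"])              # out of all respondents
row_pct  <- round(100 * prop.table(counts, margin = 1)[, "1"])  # within each age category
col_pct  <- round(100 * prop.table(counts, margin = 2)[, "1"])  # among those who ate an apple

cell_pct
row_pct
col_pct
```

With this data the cell percentages come out as 67 and 17, which match the age rows of the first column in the question's desired table, so that table appears to use the whole sample as the denominator rather than each age group.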

You can use the table1 package (disclaimer: I am the package author). Here is an example using your data:

library(table1)

df1 <- data.frame(ID=c(1,2,3,4,5,6),
                  blondehair=c(0,1,1,0,0,1),
                  ateapple=c(1,1,1,0,1,1),
                  righthanded=c(0,1,1,1,1,0),
                  agecategory=c(1,1,2,2,1,1),
                  educationcategory=c(1,1,2,2,1,1))

# For dichotomous variables, transform to logical
df1$blondehair  <- as.logical(df1$blondehair)
df1$ateapple    <- as.logical(df1$ateapple)
df1$righthanded <- as.logical(df1$righthanded)

# For categorical variables, transform to factor
df1$agecategory       <- factor(df1$agecategory)
df1$educationcategory <- factor(df1$educationcategory)

# Add labels
label(df1$blondehair)        <- "Percentage with blond hair"
label(df1$ateapple)          <- "Percentage that ate an apple"
label(df1$righthanded)       <- "Percentage of right handed"
label(df1$agecategory)       <- "Age Category"
label(df1$educationcategory) <- "Education Level"

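# Custom renderer: keep only the percentage from table1's default statistics
rndr <- function(x, ...) {
    y <- stats.apply.rounding(stats.default(x, ...), ...)
    y <- lapply(y, getElement, "PCT")  # Only percent
    # Logical variables: show just the "Yes" percentage on the variable's own row;
    # factors: leave the header row blank, then show one percentage per level
    if (is.logical(x)) y$Yes else c("", y)
}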

table1(~ blondehair + ateapple + righthanded + agecategory + educationcategory, 
    data=df1, render=rndr, overall="Response, %")

There are more options to control the output (I am still trying to understand exactly what you want).
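If it helps to pin down the target numbers independently of any table package, here is a minimal base R sketch that builds the outcome-by-demographic layout from the question. It assumes the percentages are out of all respondents (which is what reproduces the 67% and 17% in the question) and that there are no missing values. The helper `pct_by_level()` and the objects `outcomes`, `demographics` and `result` are hypothetical names for illustration, not part of table1.

```r
# Data from the question (outcomes kept as 0/1 here)
df1 <- data.frame(ID = c(1, 2, 3, 4, 5, 6),
                  blondehair = c(0, 1, 1, 0, 0, 1),
                  ateapple = c(1, 1, 1, 0, 1, 1),
                  righthanded = c(0, 1, 1, 1, 1, 0),
                  agecategory = c(1, 1, 2, 2, 1, 1),
                  educationcategory = c(1, 1, 2, 2, 1, 1))

# Column name -> header label for each dichotomous outcome
outcomes <- c(ateapple    = "Percentage that ate an apple",
              blondehair  = "Percentage blonde",
              righthanded = "Percentage right-handed")
demographics <- c("agecategory", "educationcategory")

# Percent of ALL respondents who fall in each level of `demo` and have outcome == 1
# (assumes no missing values in either column)
pct_by_level <- function(data, demo, outcome) {
  n_yes <- tapply(data[[outcome]] == 1, data[[demo]], sum)
  setNames(100 * as.numeric(n_yes) / nrow(data), names(n_yes))
}

# One matrix per demographic variable: rows are levels, columns are outcomes
result <- lapply(demographics, function(d) {
  m <- sapply(names(outcomes), function(o) pct_by_level(df1, d, o))
  colnames(m) <- outcomes
  round(m)
})
names(result) <- demographics

result
```

Changing the denominator in `pct_by_level()` from `nrow(data)` to the size of each level would give within-group percentages instead, if that turns out to be what you need.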

Edit: fixed a typo.