How to best create a table to display demographics for multiple outcomes?
I'm trying to produce a table that displays demographic information for multiple dichotomous outcomes asked about in a survey.
Here is the example I started with:
library(dplyr)      # select(), %>%
library(gtsummary)  # tbl_summary(), add_overall(), modify_header()
df1 <- data.frame(ID=c(1,2,3,4,5,6),
                  blondehair=c(0,1,1,0,0,1),
                  ateapple=c(1,1,1,0,1,1),
                  righthanded=c(0,1,1,1,1,0),
                  agecategory=c(1,1,2,2,1,1),
                  educationcategory=c(1,1,2,2,1,1))
df1
table1 <- df1 %>% select(ateapple,agecategory,educationcategory)
colnames(table1) <- c("Percentage that ate apple", "Age Category","Education Level")
table1 %>% tbl_summary(by=`Percentage that ate apple`,
                       statistic = list(all_categorical() ~ "{p}%"),
                       missing_text = "Missing") %>%
  add_overall(last=TRUE) %>%
  modify_header(label ~ "Demographics")
table1
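I was thinking of making a tbl_summary for each outcome I'm interested in and then merging them together. However, I don't want the table to show both the "0 (no)" and "1 (yes)" categories of the outcome. I only want the percentage that has the outcome (i.e., show only the percentage that did eat an apple). My real data has 10 dichotomous outcomes and 7 categorical demographic variables, so I'm a bit hesitant to merge that many individual tbl_summary tables.
This is what I'd like to end up with: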
                 Percentage that ate an apple   Percentage blonde   Percentage right-handed
Age Category
  1              67%                            33%                 33%
  2              17%                            17%                 33%
Education
  1              33%                            6%                  33%
  2              17%                            3%                  33%
Is there an R package that can help with this? I was considering tbl_summary, but I don't think it will give me what I want.
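For reference, the Age Category figures in the mock-up above (67%, 17%, 33%) match, for the example data, the share of the whole sample that is in a given category and also has the outcome. That reading is an assumption inferred from the numbers, not something stated in the question. A minimal base R sketch of that calculation, where `outcomes` and `pct_by_age` are names made up for illustration:

```r
df1 <- data.frame(ID = c(1, 2, 3, 4, 5, 6),
                  blondehair = c(0, 1, 1, 0, 0, 1),
                  ateapple = c(1, 1, 1, 0, 1, 1),
                  righthanded = c(0, 1, 1, 1, 1, 0),
                  agecategory = c(1, 1, 2, 2, 1, 1),
                  educationcategory = c(1, 1, 2, 2, 1, 1))

# Outcome columns, in the order they appear in the mock-up
outcomes <- c("ateapple", "blondehair", "righthanded")

# For each outcome: count the 1s within each age category and divide by
# the total sample size (assumed denominator: all respondents)
pct_by_age <- sapply(outcomes, function(v) {
  100 * tapply(df1[[v]], df1$agecategory, sum) / nrow(df1)
})

round(pct_by_age)
#>   ateapple blondehair righthanded
#> 1       67         33          33
#> 2       17         17          33
```

If the denominator should instead be the number of people with the outcome, or the number of people in the category, replace `nrow(df1)` accordingly; the table packages discussed in the answers handle that choice through their own options.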
Below is an example of how to get what you're after.
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.5.0'
# loop over the variables; the name of each element is the label that we'll put in the header
tbl <-
  list("Patients who responded" = "response",
       "Patients who died" = "death") %>%
  purrr::imap(
    ~ trial %>%
      tbl_summary(
        by = all_of(.x),
        include = c(age, grade),
        type = all_dichotomous() ~ "categorical"
      ) %>%
      # update the column header
      modify_header(all_stat_cols() ~ paste0("**", .y, "**")) %>%
      # hide the first stat column (which is for the non-responders)
      modify_column_hide(stat_1)
  ) %>%
  tbl_merge() %>%
  # remove spanning headers
  modify_spanning_header(all_stat_cols() ~ NA)
#> 7 observations missing `response` have been removed. To include these observations, use `forcats::fct_explicit_na()` on `response` column before passing to `tbl_summary()`.
Created on 2022-01-16 by the reprex package (v2.0.1)
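To connect this to the data in the question, the same pattern could be applied to df1 roughly as follows. This is a sketch, not part of the answer above: the header labels, the object name `tbl_df1`, and the choice of `percent = "cell"` (percentages of the whole sample, which is what the mock-up in the question appears to show) are assumptions. It also assumes the purrr package is installed.

```r
library(gtsummary)

df1 <- data.frame(ID = c(1, 2, 3, 4, 5, 6),
                  blondehair = c(0, 1, 1, 0, 0, 1),
                  ateapple = c(1, 1, 1, 0, 1, 1),
                  righthanded = c(0, 1, 1, 1, 1, 0),
                  agecategory = c(1, 1, 2, 2, 1, 1),
                  educationcategory = c(1, 1, 2, 2, 1, 1))

# names = column headers (assumed wording), values = outcome columns in df1
tbl_df1 <-
  list("Percentage that ate an apple" = "ateapple",
       "Percentage blonde"            = "blondehair",
       "Percentage right-handed"      = "righthanded") %>%
  purrr::imap(
    ~ df1 %>%
      tbl_summary(
        by = all_of(.x),
        include = c(agecategory, educationcategory),
        type = all_dichotomous() ~ "categorical",
        statistic = all_categorical() ~ "{p}%",
        percent = "cell",  # assumed: share of the whole sample
        label = list(agecategory ~ "Age Category",
                     educationcategory ~ "Education Level")
      ) %>%
      # put the outcome label in the header
      modify_header(all_stat_cols() ~ paste0("**", .y, "**")) %>%
      # hide the column for outcome == 0
      modify_column_hide(stat_1)
  ) %>%
  tbl_merge() %>%
  # remove the "Table 1", "Table 2", ... spanning headers
  modify_spanning_header(all_stat_cols() ~ NA)

tbl_df1
```

With 10 outcomes, only the named list at the top grows; the rest of the pipeline stays the same, which avoids writing and merging each tbl_summary by hand.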
You can use the table1 package (disclaimer: I am the package's author). Here is an example using your data:
library(table1)
df1 <- data.frame(ID=c(1,2,3,4,5,6),
                  blondehair=c(0,1,1,0,0,1),
                  ateapple=c(1,1,1,0,1,1),
                  righthanded=c(0,1,1,1,1,0),
                  agecategory=c(1,1,2,2,1,1),
                  educationcategory=c(1,1,2,2,1,1))
# For dichotomous variables, transform to logical
df1$blondehair <- as.logical(df1$blondehair)
df1$ateapple <- as.logical(df1$ateapple)
df1$righthanded <- as.logical(df1$righthanded)
# For categorical variables, transform to factor
df1$agecategory <- factor(df1$agecategory)
df1$educationcategory <- factor(df1$educationcategory)
# Add labels
label(df1$blondehair) <- "Percentage with blond hair"
label(df1$ateapple) <- "Percentage that ate an apple"
label(df1$righthanded) <- "Percentage of right handed"
label(df1$agecategory) <- "Age Category"
label(df1$educationcategory) <- "Education Level"
# Custom renderer: keep only the percentage; for logical variables show only the "Yes" value
rndr <- function(x, ...) {
  y <- stats.apply.rounding(stats.default(x, ...), ...)
  y <- lapply(y, getElement, "PCT") # Only percent
  if (is.logical(x)) y$Yes else c("", y)
}
table1(~ blondehair + ateapple + righthanded + agecategory + educationcategory,
       data=df1, render=rndr, overall="Response, %")
There are more options for controlling the output (I'm still trying to understand exactly what you want).
Edit: fixed a typo.