R/gtsummary: excluding outliers in tbl_summary
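Is it possible to have gtsummary::tbl_summary compute means that exclude outliers? For example, the code below generates some sample z-score data. Can I tell gtsummary::tbl_summary how to handle each column?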
library(gtsummary)

set.seed(42)
n <- 1000
dat <- data.frame(
  id = 1:n,
  treat = factor(sample(c('Treat', 'Control'), n, rep = TRUE, prob = c(.5, .5))),
  outcome1 = runif(n, min = -3.6, max = 2.3),
  outcome2 = runif(n, min = -1.9, max = 3.3),
  outcome3 = runif(n, min = -2.5, max = 2.8),
  outcome4 = runif(n, min = -3.1, max = 2.2)
)

# Summarise every continuous column as mean (min to max), split by treatment arm
dat %>%
  select(-c(id)) %>%
  tbl_summary(
    by = treat,
    statistic = list(all_continuous() ~ "{mean} ({min} to {max})")
  )
For example, suppose I want the table to report the mean of outcome1 only for values where outcome1 >= -2.9, the mean of outcome2 only for values where outcome2 < 3.0, and so on.
Many thanks for any guidance.
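You can define a new mean function that excludes outliers. You can define an outlier any way you like. Then pass that function to tbl_summary() by naming it inside the curly braces of the statistic string. Example below!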
library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.5.2'

set.seed(42)
n <- 1000
dat <- data.frame(
  id = 1:n,
  treat = factor(sample(c('Treat', 'Control'), n, rep = TRUE, prob = c(.5, .5))),
  outcome1 = runif(n, min = -3.6, max = 2.3),
  outcome2 = runif(n, min = -1.9, max = 3.3),
  outcome3 = runif(n, min = -2.5, max = 2.8),
  outcome4 = runif(n, min = -3.1, max = 2.2)
)

# Mean of the values that lie within 3 standard deviations of the overall mean
mean_no_extreme <- function(x) {
  x <- na.omit(x)
  # local names chosen so they do not shadow base::mean() and stats::sd()
  x_sd <- sd(x)
  x_mean <- mean(x)
  # calculate mean excluding extremes
  mean(x[x >= x_mean - x_sd * 3 & x <= x_mean + x_sd * 3])
}

dat %>%
  select(-c(id)) %>%
  tbl_summary(
    by = treat,
    # the custom function is referenced by name inside the braces
    statistic = all_continuous() ~ "{mean_no_extreme} ({min} to {max})"
  ) %>%
  as_kable()
Characteristic | Control, N = 527 | Treat, N = 473 |
---|---|---|
outcome1 | -0.64 (-3.59 to 2.30) | -0.70 (-3.60 to 2.30) |
outcome2 | 0.68 (-1.89 to 3.30) | 0.78 (-1.87 to 3.28) |
outcome3 | 0.20 (-2.47 to 2.80) | 0.23 (-2.48 to 2.80) |
outcome4 | -0.36 (-3.09 to 2.19) | -0.41 (-3.10 to 2.20) |
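One thing worth checking with this sample data: the outcomes are drawn with runif(), and a uniform variable never lies more than about 1.73 standard deviations from its mean, so a 3-SD rule is unlikely to drop anything here. The base-R sketch below is only an illustration under that assumption. It draws one column the same way as outcome1 (not the identical values, since the random stream differs), and the appended value 50 is a made-up outlier used purely to show the filter doing something.

```r
# Base R only; x is drawn like outcome1 in the example, but is not the same vector
set.seed(42)
x <- runif(1000, min = -3.6, max = 2.3)

# Same 3-SD rule as the answer's mean_no_extreme()
mean_no_extreme <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s <- sd(x)
  mean(x[x >= m - 3 * s & x <= m + 3 * s])
}

# Largest distance from the mean, measured in SD units
max(abs(x - mean(x))) / sd(x)

# TRUE when the 3-SD filter removed nothing
isTRUE(all.equal(mean_no_extreme(x), mean(x)))

# Hypothetical gross outlier appended: the plain and filtered means now differ
y <- c(x, 50)
c(plain = mean(y), filtered = mean_no_extreme(y))
```

If the filtered and plain means agree on your real data as well, the rule is simply not triggering, and a tighter or column-specific definition of "outlier" may be what you need.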
Created on 2022-03-22 by the reprex package (v2.0.1)
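The question asks for a different cutoff per column rather than a single 3-SD rule. Since "you can define outliers any way you like", one way this might be done is to write one small function per column and give each variable its own formula in the statistic list. This is a sketch, not a tested recipe: the helper names mean_outcome1 and mean_outcome2 are hypothetical, and the cutoffs are the ones stated in the question.

```r
library(gtsummary)

set.seed(42)
n <- 1000
dat <- data.frame(
  id = 1:n,
  treat = factor(sample(c('Treat', 'Control'), n, rep = TRUE, prob = c(.5, .5))),
  outcome1 = runif(n, min = -3.6, max = 2.3),
  outcome2 = runif(n, min = -1.9, max = 3.3),
  outcome3 = runif(n, min = -2.5, max = 2.8),
  outcome4 = runif(n, min = -3.1, max = 2.2)
)

# Hypothetical helpers: one mean function per column, using the question's cutoffs
mean_outcome1 <- function(x) mean(x[x >= -2.9], na.rm = TRUE)  # keep outcome1 >= -2.9
mean_outcome2 <- function(x) mean(x[x < 3.0], na.rm = TRUE)    # keep outcome2 < 3.0

dat %>%
  select(-c(id)) %>%
  tbl_summary(
    by = treat,
    statistic = list(
      outcome1 ~ "{mean_outcome1} ({min} to {max})",
      outcome2 ~ "{mean_outcome2} ({min} to {max})",
      # no cutoff was given for these two, so they keep the ordinary mean
      c(outcome3, outcome4) ~ "{mean} ({min} to {max})"
    )
  )
```

Note that {min} and {max} are still computed on all values of each column, so the reported range can extend beyond the cutoff used for the mean.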