R/gtsummary: excluding outliers in tbl_summary

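Is it possible to have gtsummary::tbl_summary compute means that exclude outliers? For example, the code below generates some sample z-score data. Can I specify how gtsummary::tbl_summary should handle each column?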

library(dplyr)
library(gtsummary)

set.seed(42)
n <- 1000
dat <- data.frame(id=1:n,
                  treat = factor(sample(c('Treat','Control'), n, rep=TRUE, prob=c(.5, .5))),
                  outcome1=runif(n, min=-3.6, max=2.3),
                  outcome2=runif(n, min=-1.9, max=3.3),
                  outcome3=runif(n, min=-2.5, max=2.8),
                  outcome4=runif(n, min=-3.1, max=2.2))
dat %>%
  select(-c(id)) %>%
  tbl_summary(
    by = treat,
    statistic = list(all_continuous() ~ "{mean} ({min} to {max})")
  )

For example, suppose I want the table to report the mean of outcome1 using only the values where outcome1 >= -2.9, the mean of outcome2 using only the values where outcome2 < 3.0, and so on.

Thank you very much for any guidance you can offer.

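You can define a new mean function that excludes outliers, using whatever definition of an outlier you like, and then pass that function to tbl_summary(). Example below!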

library(gtsummary)
packageVersion("gtsummary")
#> [1] '1.5.2'

set.seed(42)
n <- 1000
dat <- data.frame(id=1:n,
                  treat = factor(sample(c('Treat','Control'), n, rep=TRUE, prob=c(.5, .5))),
                  outcome1=runif(n, min=-3.6, max=2.3),
                  outcome2=runif(n, min=-1.9, max=3.3),
                  outcome3=runif(n, min=-2.5, max=2.8),
                  outcome4=runif(n, min=-3.1, max=2.2))

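mean_no_extreme <- function(x) {
  x <- na.omit(x)
  # use distinct names so the locals do not mask base::sd() and base::mean()
  x_sd <- sd(x)
  x_mean <- mean(x)

  # mean of the values within 3 standard deviations of the overall mean
  mean(x[x >= x_mean - 3 * x_sd & x <= x_mean + 3 * x_sd])
}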


dat %>% 
  select(-c(id)) %>% 
  tbl_summary(
    by=treat, 
    statistic = all_continuous() ~ "{mean_no_extreme} ({min} to {max})"
  ) %>%
  as_kable()
| Characteristic | Control, N = 527      | Treat, N = 473        |
|:---------------|:---------------------:|:---------------------:|
| outcome1       | -0.64 (-3.59 to 2.30) | -0.70 (-3.60 to 2.30) |
| outcome2       | 0.68 (-1.89 to 3.30)  | 0.78 (-1.87 to 3.28)  |
| outcome3       | 0.20 (-2.47 to 2.80)  | 0.23 (-2.48 to 2.80)  |
| outcome4       | -0.36 (-3.09 to 2.19) | -0.41 (-3.10 to 2.20) |
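
One caveat about the 3-SD rule in mean_no_extreme(): on uniformly distributed data like this sample, it should not drop anything. A uniform variable on [a, b] has a standard deviation of (b - a) / sqrt(12), so its whole range lies within about 1.73 SD of the mean. The means in the table are therefore expected to match the ordinary means. The base-R sketch below checks this on a vector drawn the same way as outcome1. It is not the identical vector, because the random stream differs from the one used to build dat.

```r
set.seed(42)
x <- runif(1000, min = -3.6, max = 2.3)

# bounds used by the 3-SD rule
lower <- mean(x) - 3 * sd(x)
upper <- mean(x) + 3 * sd(x)

# number of values the rule would exclude
n_excluded <- sum(x < lower | x > upper)
n_excluded
#> [1] 0
```

With heavier-tailed data the rule would start to exclude values; for the fixed cut-offs in the question, a threshold-based function is the more direct fit.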

Created on 2022-03-22 by the reprex package (v2.0.1)
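
To apply the specific cut-offs from the question (outcome1 >= -2.9, outcome2 < 3.0), the same idea can be used with one function per variable, each referenced from that variable's own statistic formula. This is a minimal sketch rather than a definitive implementation: mean_outcome1() and mean_outcome2() are hypothetical helper names, not part of gtsummary. The tbl_summary() call is guarded so the base-R part still runs where the package is not installed.

```r
# Hypothetical helpers (not part of gtsummary): means restricted to the
# thresholds given in the question.
mean_outcome1 <- function(x) {
  x <- x[!is.na(x)]
  mean(x[x >= -2.9])
}

mean_outcome2 <- function(x) {
  x <- x[!is.na(x)]
  mean(x[x < 3.0])
}

if (requireNamespace("gtsummary", quietly = TRUE)) {
  library(gtsummary)

  set.seed(42)
  n <- 1000
  dat <- data.frame(
    treat = factor(sample(c("Treat", "Control"), n, replace = TRUE)),
    outcome1 = runif(n, min = -3.6, max = 2.3),
    outcome2 = runif(n, min = -1.9, max = 3.3)
  )

  # one statistic specification per variable, each with its own function
  tbl <- tbl_summary(
    dat,
    by = treat,
    include = c(outcome1, outcome2),
    statistic = list(
      outcome1 ~ "{mean_outcome1} ({min} to {max})",
      outcome2 ~ "{mean_outcome2} ({min} to {max})"
    )
  )
  print(tbl)
}
```

Note that {min} and {max} are still computed on all values, so the reported range can extend past the cut-offs; restricting the range as well would need matching custom functions.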