datasummary: Combine factor and numeric variables in a single table

Question

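I am trying to use modelsummary to create a table that contains both factor and numeric variables. My approach is to convert the factor variables to numeric, so that each factor variable takes up only one row and all variables appear in the same column. I then count the number of units in each level of each former factor, now numeric, variable by hand and assign the result as text to the matching variable. I am trying to do this with the function named N_alt in the example below: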

library(modelsummary)
library(kableExtra)

tmp <- mtcars[, c("mpg", "hp")]

tmp$class <- 0
tmp$class[15:32] <- 1
tmp$class <- as.factor(tmp$class)

tmp$region <- 1
tmp$region[15:20] <- 2
tmp$region[21:32] <- 3
tmp$region <- as.factor(tmp$region)

tmp$class <- 0
tmp$region <- 0

N_alt = function(x) {
  if (x %in% c(tmp$class)) {
    paste0('[14 (43.8); 18 (56.3)]') 
  } else if (x %in% c(tmp$region)) {
    paste0('[14 (43.8); 6 (18.8); 12 (37.5)]')  
  } else {
    paste0('[32 (100)]')
  }
}


# create a table with `datasummary`
emptycol = function(x) " "
datasummary(mpg + (`class [0,1]`= class) + (`region [A,B,C]`= region) + hp ~ Heading("N (%)") * N_alt, data = tmp)

This gives me:

My N_alt function does not work correctly. The class row is correct, but the region row is not. I do not get any warning messages.

I also tried:

N_alt = function(x) {
  if (x[1] %in% c(tmp$class)) {
    paste0('[14 (43.8); 18 (56.3)]') 
  } else if (x[1] %in% c(tmp$region)) {
    paste0('[14 (43.8); 6 (18.8); 12 (37.5)]')  
  } else {
    paste0('[32 (100)]')
  }
}

But I get the same output. I have written similar functions for these vectors that work fine, but for some reason this one does not.

I also tried:

N_alt <- c('[32 (100)]','[14 (43.8); 18 (56.3)]','[14 (43.8); 6 (18.8); 12 (37.5)]','[32 (100)]')

and

N_alt <- c(rep('[32 (100)]',32),rep('[14 (43.8); 18 (56.3)]',32),rep('[14 (43.8); 6 (18.8); 12 (37.5)]',32),rep('[32 (100)]',32))

but I get:

Error in datasummary(mpg + (`class [0,1]` = class) + (`region [A,B,C]` = region) +  : 
  Argument 'N_alt' is not length 32

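Does anyone know what I am missing here?

Edit:

It seems that functions like Mean_alt below could be used so that certain numeric variables have no decimal places (rather than simply converting them with as.integer) and so that the former factor, now numeric, variables show no result in the table (two different operations), as follows: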

library(modelsummary)
library(kableExtra)

tmp <- mtcars[, c("mpg", "hp")]

tmp$class <- 0
tmp$class[15:32] <- 1
tmp$class <- as.factor(tmp$class)

tmp$region <- 1
tmp$region[15:20] <- 2
tmp$region[21:32] <- 3
tmp$region <- as.factor(tmp$region)

tmp$class <- 0
tmp$region <- 0

N_alt = function(x) {
  if (x %in% c(tmp$class)) {
    paste0('[14 (43.8); 18 (56.3)]') 
  } else if (x %in% c(tmp$region)) {
    paste0('[14 (43.8); 6 (18.8); 12 (37.5)]')  
  } else {
    paste0('[32 (100)]')
  }
}

Mean_alt = function(x) {
  if (x %in% c(tmp$mpg)) {
    as.character(floor(mean(x)), length=5)
  } else if (x %in% c(tmp$class, tmp$region)) {
    paste0("")
  } else {
    mean(x)
  }
}

# create a table with `datasummary`
emptycol = function(x) " "
datasummary(mpg + (`class [0,1]`= class) + (`region [A,B,C]`= region) + hp ~ Heading("N (%)") * N_alt + Heading("Mean") * Mean_alt, data = tmp)

Output:

Answer 1

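You are running up against three limitations.

The first limitation is in base R:

As explained in the R manual, the condition of an if/else statement must evaluate to a single TRUE or FALSE. Internally, datasummary applies N_alt to each variable, one after the other. Each time, N_alt receives a new vector of length 32. Frankly, I don't think checking the value of the first element of that vector makes much sense; I don't see how it gets us where we want to go.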
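The first limitation can be seen in base R alone. The sketch below reuses the data setup from the question and shows that `x %in% tmp$class` yields one logical value per element, so it has to be collapsed before it can serve as an `if` condition. The use of `any()` is only an illustrative choice. Depending on the R version, a longer condition is either reduced to its first element with a warning or, from R 4.2.0 on, rejected with an error.

```r
# Base R only; mirrors the data setup from the question.
tmp <- mtcars[, c("mpg", "hp")]
tmp$class <- 0

x <- tmp$mpg               # a summary function receives one full column
cond <- x %in% tmp$class   # element-wise test: one result per element
length(cond)               # 32, not 1

# Collapsing to a single logical makes the condition valid for if ().
msg <- if (any(cond)) "some values match" else "no values match"
msg
```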

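The other two limitations come from the basic design of the tables package, on which modelsummary::datasummary is built:

Factors always produce one row per factor level.
I don't think there is a good way to tell datasummary that a function should behave differently when it is applied to different numeric variables. This is because each function only sees the raw numeric vector, with no other meta-information.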
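The last point also explains the wrong region row in the question. Once class and region are both overwritten with 0, the two columns are identical, and a function that only sees the values cannot tell them apart. The sketch below is base R only and uses `vapply()` as a stand-in for datasummary, which is an assumption made for illustration and not a description of datasummary's internals. The `all()` wrapper is likewise added here only to give `if` a length-one condition.

```r
# Base R only. vapply() stands in for datasummary here; this illustrates
# the idea and is not datasummary's actual implementation.
tmp <- mtcars[, c("mpg", "hp")]
tmp$class <- 0    # as in the question: factor replaced by a constant
tmp$region <- 0

# From inside a summary function, the two columns are indistinguishable.
identical(tmp$class, tmp$region)

# Simplified version of the question's N_alt with a length-one condition.
N_alt <- function(x) {
  if (all(x %in% tmp$class)) {
    "[14 (43.8); 18 (56.3)]"
  } else if (all(x %in% tmp$region)) {
    "[14 (43.8); 6 (18.8); 12 (37.5)]"
  } else {
    "[32 (100)]"
  }
}

# region receives the class label because the first test already matches it.
res <- vapply(tmp, N_alt, character(1))
res
```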

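I think the simplest workaround is to create two tables, one for your factors and one for your numeric variables. The two tables can then be combined easily: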

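library(modelsummary)

# Factor-like variables: count and percentage of every level, on one row
N_factor <- function(x) {
  count <- table(x)
  pct <- 100 * prop.table(count)  # scale proportions to percentages
  out <- paste(sprintf("%.0f (%.1f)", count, pct), collapse = "; ")
  sprintf("[%s]", out)
}

# Numeric variables: total number of observations
N_numeric <- function(x) {
  sprintf("%s (100)", length(x))
}

# Table for the factor-like variables, returned as a data frame
tab_fac <- datasummary(cyl + gear ~ Heading("N") * N_factor, 
                       output = "data.frame",
                       data = mtcars)

# Table for the numeric variables, with the factor rows appended
datasummary(mpg + hp ~ Heading("N") * N_numeric, 
            add_rows = tab_fac,
            data = mtcars)

        N
mpg     32 (100)
hp      32 (100)
cyl     [11 (34.4); 7 (21.9); 14 (43.8)]
gear    [15 (46.9); 12 (37.5); 5 (15.6)]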
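The same two-table idea can be tried on the data from the question. The sketch below keeps class and region as numeric codes instead of overwriting them with 0, and scales the shares to percentages to match the format in the question. `N_levels` and `N_all` are hypothetical helper names introduced here for illustration. The datasummary calls are guarded so that the helpers can be checked even where modelsummary is not installed.

```r
# Data setup from the question, keeping class and region as numeric codes
# instead of overwriting them with 0.
tmp <- mtcars[, c("mpg", "hp")]
tmp$class <- 0
tmp$class[15:32] <- 1
tmp$region <- 1
tmp$region[15:20] <- 2
tmp$region[21:32] <- 3

# Hypothetical helper: count and percentage of every distinct value.
N_levels <- function(x) {
  count <- table(x)
  pct <- 100 * prop.table(count)
  sprintf("[%s]", paste(sprintf("%.0f (%.1f)", count, pct), collapse = "; "))
}

# Hypothetical helper: total number of observations.
N_all <- function(x) sprintf("[%s (100)]", length(x))

region_label <- N_levels(tmp$region)
region_label

# The tables need modelsummary, so this part only runs if it is installed.
if (requireNamespace("modelsummary", quietly = TRUE)) {
  library(modelsummary)
  tab_fac <- datasummary(class + region ~ Heading("N (%)") * N_levels,
                         output = "data.frame",
                         data = tmp)
  datasummary(mpg + hp ~ Heading("N (%)") * N_all,
              add_rows = tab_fac,
              data = tmp)
}
```

On this data, `N_levels(tmp$region)` reproduces the string that the question wrote out by hand for region, so the counts no longer have to be hard-coded.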
