Modelsummary: Different formats for estimates and extra rows from add_rows

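This question is similar in spirit to a related question, but it concerns the formatting of the "extra rows" created with the add_rows argument of the (excellent) modelsummary package. As far as I can tell, these cannot be formatted in a similar way (though I hope I haven't missed something basic!). Here is a simple reproducible example.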

library(modelsummary)
library(tibble)

set.seed(03222022)
N <- 10^4
x <- rnorm(N)
y <- 0.000002*x + rnorm(N)

modelsummary(lm(y ~ x),
             fmt = 5,
             add_rows = tibble("term" = "Number of clusters",
                               "value" = 1000),
             output = "markdown")

This produces:

|                   |  Model 1   |
|:------------------|:----------:|
|(Intercept)        |  0.00062   |
|                   | (0.01005)  |
|x                  |  -0.00885  |
|                   | (0.01007)  |
|Num.Obs.           |   10000    |
|R2                 |   0.000    |
|R2 Adj.            |   0.000    |
|AIC                |  28491.0   |
|BIC                |  28512.6   |
|Log.Lik.           | -14242.496 |
|F                  |   0.772    |
|Number of clusters | 1000.00000 |

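Of course, as a quick fix I could wrap 1000 in quotes and print it as a character (and, more generally, print these values as characters in my actual use case, where I am adding a manual goodness-of-fit statistic that modelsummary does not recognize). For example, I am doing something like this to avoid the problem: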

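library(estimatr)  # lm_robust()
library(dplyr)     # %>%, relocate()
library(tibble)    # as_tibble(), add_column()

clusters <- sample(1:1000, N, replace = TRUE)
z <- rnorm(N)
df <- cbind.data.frame(y, x, z, clusters)

m1 <- lm_robust(y ~ x,
                clusters = clusters,
                data = df)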

m2 <- lm_robust(y ~ z,
                clusters = clusters,
                data = df)

models <- list(m1, m2)

modelsummary(models,
             fmt = 5,
             add_rows = as_tibble(t(
               as.character(sapply(models, function(x) x$nclusters)))) %>%
               add_column(term = "Number of clusters") %>%
               relocate(term),
             output = "markdown")

But I am wondering whether there is a better way to do this. I really like how this works with gof_map, where I can add a formatting function like so: "fmt" = function(x) format(round(x, 2), big.mark=",").

Thanks in advance for your help!

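Currently, the fmt argument only applies to estimates and their statistics, and the gof_map argument only applies to goodness-of-fit statistics that modelsummary extracts automatically. Rows supplied through add_rows are covered by neither.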
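For contrast, here is a minimal sketch of the gof_map formatting that the question refers to. It assumes the list-of-lists form of gof_map, where each entry has "raw", "clean", and "fmt" elements. The toy data and the names f, gm, and d are made up for illustration, and the table is only built when modelsummary is installed.

```r
# Formatting function of the kind mentioned in the question.
f <- function(x) format(round(x, 2), big.mark = ",")

# gof_map entries: "raw" is the name of the extracted statistic, "clean" is
# the label shown in the table, "fmt" controls how the value is printed.
gm <- list(
  list("raw" = "nobs", "clean" = "Num.Obs.", "fmt" = f),
  list("raw" = "r.squared", "clean" = "R2", "fmt" = 3)
)

# Toy data (assumed for illustration only).
set.seed(1)
d <- data.frame(x = rnorm(10000))
d$y <- d$x + rnorm(10000)

# Build the table only when modelsummary is available.
if (requireNamespace("modelsummary", quietly = TRUE)) {
  print(modelsummary::modelsummary(lm(y ~ x, data = d),
                                   gof_map = gm,
                                   output = "markdown"))
}
```

This formatting reaches only statistics that modelsummary extracts itself, which is why values passed through add_rows need to be formatted by hand before they are handed over.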

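Your idea is interesting, and I have tried to think about what kind of user interface could support it. However, none of the ideas I came up with was as elegant as the super simple base R code pasted below.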

If you have a better idea for the user interface, please feel free to propose it on GitHub.

library(modelsummary)
library(estimatr)

set.seed(03222022)
N <- 10^4
x <- rnorm(N)
y <- 0.000002*x + rnorm(N)

clusters <- sample(1:1000, N, replace = TRUE)
z <- rnorm(N)
df <- cbind.data.frame(y, x, z, clusters)

models <- list(
    lm_robust(y ~ x, clusters = clusters, data = df),
    lm_robust(y ~ z, clusters = clusters, data = df))

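# Format each model's cluster count as a string with a thousands separator
f <- function(x) format(x$nclusters, big.mark = ",")
# One-row data frame: a label column followed by one column per model
ar <- data.frame("Number of Clusters", lapply(models, f))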
modelsummary(models,
             add_rows = ar,
             output = "markdown")
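
|                   |   Model 1    |   Model 2    |
|:------------------|:------------:|:------------:|
|(Intercept)        |    0.001     |    0.000     |
|                   |   (0.010)    |   (0.010)    |
|x                  |    -0.009    |              |
|                   |   (0.010)    |              |
|z                  |              |    0.016     |
|                   |              |   (0.010)    |
|Num.Obs.           |    10000     |    10000     |
|R2                 |    0.000     |    0.000     |
|R2 Adj.            |    0.000     |    0.000     |
|Std.Errors         | by: clusters | by: clusters |
|Number of Clusters |    1,000     |    1,000     |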

With two statistics:

f <- function(x) c(format(x$nclusters, big.mark = ","), "other stuff")
ar <- data.frame(c("Number of Clusters", "Junk"), lapply(models, f))
modelsummary(models,
             add_rows = ar,
             output = "markdown")
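
|                   |   Model 1    |   Model 2    |
|:------------------|:------------:|:------------:|
|(Intercept)        |    0.001     |    0.000     |
|                   |   (0.010)    |   (0.010)    |
|x                  |    -0.009    |              |
|                   |   (0.010)    |              |
|z                  |              |    0.016     |
|                   |              |   (0.010)    |
|Num.Obs.           |    10000     |    10000     |
|R2                 |    0.000     |    0.000     |
|R2 Adj.            |    0.000     |    0.000     |
|Std.Errors         | by: clusters | by: clusters |
|Number of Clusters |    1,000     |    1,000     |
|Junk               | other stuff  | other stuff  |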
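
The pattern above can be wrapped in a small helper. The following is only a sketch in base R: make_add_rows is a hypothetical function, not part of modelsummary, and mock_models are plain lists standing in for fitted models that expose $nclusters as the lm_robust fits above do.

```r
# Hypothetical helper: build an add_rows-style data frame from a list of models.
# `f` must return one pre-formatted character value per label for each model.
make_add_rows <- function(models, labels, f) {
  cells <- lapply(models, f)  # one character vector per model
  stopifnot(all(lengths(cells) == length(labels)))
  out <- data.frame(labels, cells, stringsAsFactors = FALSE)
  names(out) <- c("term", paste("Model", seq_along(models)))
  out
}

# Stand-ins for fitted models (assumed objects carrying only $nclusters).
mock_models <- list(list(nclusters = 1000), list(nclusters = 25000))

# Two statistics per model, already formatted as strings.
f <- function(m) c(format(m$nclusters, big.mark = ","), "other stuff")

ar <- make_add_rows(mock_models, c("Number of Clusters", "Junk"), f)
print(ar)
```

A frame built this way has one row per extra statistic and one column per model plus the label column, so it could be passed as add_rows = ar in the same way as in the calls above. Because every cell is already a character string, the fmt argument has nothing left to reformat.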