对来自单独线性模型的系数执行假设检验

Question

假设我有两个单独的 lm 对象

data(mtcars)

lm1 <- lm(mpg ~ wt, data = mtcars)
lm2 <- lm(mpg ~ wt + disp, data = mtcars)

在这种情况下，我想比较两个 wt 系数，并对两个模型中的系数相等的空值执行假设检验（出于技术原因，我实际上需要两个模型，而不仅仅是互动）

Answer 1

既然您想对估计值进行假设检验，我建议您使用完全贝叶斯模型，它将为您提供每个变量的完整后验分布。

rstanarm 基于 Stan，并提供模仿通常的 lm、glm 语法的便捷功能；如果您想了解更多关于 Stan/RStan 的信息，请参阅 here.

基于每个变量的后验分布，我们可以执行例如t 测试和 Kolmogorov-Smirnov 测试以比较每个变量的完整后验密度。

您可以执行以下操作：

执行模型拟合。

library(rstanarm);
lm1 <- stan_lm(mpg ~ wt, data = mtcars, prior = NULL);
lm2 <- stan_lm(mpg ~ wt + disp, data = mtcars, prior = NULL);

请注意运行具有 rstanarm 的完全贝叶斯线性模型是多么容易。

提取所有共享系数的后验密度（在本例中，(Intercept)和wt）。

library(tidyverse);
shared.coef <- intersect(names(coef(lm1)), names(coef(lm2)));
shared.coef;
#[1] "(Intercept)" "wt"
df1 <- lm1 %>%
    as.data.frame() %>%
    select(one_of(shared.coef)) %>%
    mutate(model = "lm1");
df2 <- lm2 %>%
    as.data.frame() %>%
    select(one_of(shared.coef)) %>%
    mutate(model = "lm2");

4000 次 MCMC 绘图的后验密度存储在两个 data.frame 中。

我们绘制后验密度。

# Plot posterior densities for all common parameters
bind_rows(df1, df2) %>%
    gather(var, value, 1:length(shared.coef)) %>%
    ggplot(aes(value, colour = model)) +
        geom_density() +
        facet_wrap(~ var, scale = "free");

我们比较了 t 测试和 KS 测试中每个共享参数的后验密度分布。这里我使用库 broom 来 tidy-up 输出。

# Perform t test and KS test
library(broom);
res <- lapply(1:length(shared.coef), function(i)
    list(t.test(df1[, i], df2[, i]), ks.test(df1[, i], df2[, i])));
names(res) <- shared.coef;
lapply(res, function(x) bind_rows(sapply(x, tidy)));
#$`(Intercept)`
#   estimate estimate1 estimate2 statistic p.value parameter  conf.low conf.high
#1 -4.497093  30.07725  34.57434 -104.8882       0  7155.965 -4.581141 -4.413045
#2        NA        NA        NA    0.7725       0        NA        NA        NA
#                              method alternative
#1            Welch Two Sample t-test   two.sided
#2 Two-sample Kolmogorov-Smirnov test   two-sided
#
#$wt
#   estimate estimate1 estimate2 statistic      p.value parameter  conf.low
#1 0.1825202 -3.097777 -3.280297  9.120137 1.074479e-19  4876.248 0.1432859
#2        NA        NA        NA  0.290750 0.000000e+00        NA        NA
#  conf.high                             method alternative
#1 0.2217544            Welch Two Sample t-test   two.sided
#2        NA Two-sample Kolmogorov-Smirnov test   two-sided
#
#There were 12 warnings (use warnings() to see them)

（绑定行时因因子水平不等而产生的警告，可忽略。）

对来自单独线性模型的系数执行假设检验

Perform hypothesis testing on coefficients from separate linear models

r

lsmeans

emmeans