Estimated marginal means of a MANCOVA in R
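I built a model that accounts for a covariate. There are two dependent variables ("A", "B"), two independent variables ("C", "D"), and one continuous covariate ("E"). I run a MANCOVA as follows: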
x<-cbind(A,B) #combining dependent variables
y<-cbind(C,D) #combining independent variables
fit<-manova(x~y+E)
summary(fit, test="Pillai")
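This all works perfectly, and I find that the covariate has an effect on the dependent variables. I would therefore like to use the emmeans package to account for the covariate with estimated marginal means. However, when I try to run the following code, I get this error: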
library(emmeans)
emmeans(fit,~y+E)
>Error in eval(expr, envir, enclos) : object 'spc.l$Ghopper.Start..g.' not
found
>Error in ref_grid(object, ...) : Perhaps a 'data' or 'params' argument is
needed
Here is my data:
structure(list(ï..insect = c(105L, 106L, 107L, 108L, 110L, 112L,
113L, 114L, 115L, 116L, 117L, 118L, 119L, 120L, 121L, 122L, 123L,
125L, 126L, 127L, 128L), C = structure(c(1L, 2L, 1L, 2L, 2L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L
), .Label = c("Pair A 7p:35c-35p:7c", "Pair B 7p:35c-28p:14c"
), class = "factor"), D = structure(c(1L, 1L, 2L, 2L, 1L, 2L,
1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L), .Label = c("F",
"M"), class = "factor"), E = c(0.357, 0.259, 0.128, 0.104, 0.248,
0.111, 0.218, 0.213, 0.13, 0.123, 0.335, 0.22, 0.247, 0.295,
0.297, 0.219, 0.132, 0.194, 0.207, 0.266, 0.234), A = c(0.025333333,
0.041666665, 0.043833332, 0.046333331, 0.108499995, 0.051999997,
0.101833329, 0.06083333, 0.059499998, 0.056166664, 0.017833333,
0.053666664, 0.066333331, 0.025499998, 0.073666664, 0.149333324,
0.044666665, 0.047499998, 0.051833331, 0.020499999, 0.062499997
), B = c(0.050666667, 0.020333321, 0.023166668, 0.029666645,
0.032499992, 0.028999981, 0.029166671, 0.024166656, 0.025500002,
0.020833325, 0.021166667, 0.038333304, 0.023666669, 0.022499981,
0.040333336, 0.121666569, 0.023333335, 0.017500002, 0.01816666,
0.018500001, 0.024499989)), .Names = c("ï..insect", "C", "D",
"E", "A", "B"), class = "data.frame", row.names = c(NA, -21L))
I'm sure there is a simple fix for this problem, but I'm a bit lost, and there are very few questions about emmeans posted on stackexchange!
Here is something that works computationally (dat is the data frame posted in the question):
R> fit = manova(x ~ cbind(C, D) + E, data = dat)
R> ref_grid(fit)
'emmGrid' object with variables:
C = 1.5238
D = 1.2857
E = 0.21605
rep.meas = multivariate response levels: A, B
R> emmeans(fit, ~ C + D + E)
C D E emmean SE df lower.CL upper.CL
1.52381 1.285714 0.2160476 0.04438095 0.005377854 17 0.03303467 0.05572723
Results are averaged over the levels of: rep.meas
Confidence level used: 0.95
I'll say more about these results later. So replacing y with cbind(C, D) in the model call makes it work (computationally). Using y directly, I get an error message:
R> fit0 = manova(x ~ y+E, data = dat)
R> ref_grid(fit0)
Error in model.matrix(trms, m, contrasts.arg = object$contrasts)[, nm, :
subscript out of bounds
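This differs from the error message shown in the OP, and I can only guess that there is some difference in scoping. But what is happening here is that C and D are actually factors, not numeric predictors, and cbind(C, D) converts them into a numeric matrix with two columns. There are some technical reasons for this error that I need to investigate and correct.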
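But the important thing is that neither fit nor fit0 is the model you want to use for post-hoc comparisons, because, after all, C and D are factors. The reference grid and EMMs for fit are based on the mean values of the numeric recodings of C and D, along with the mean of E. That is why there is only one row of emmeans output.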
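To make the recoding concrete, here is a minimal base-R sketch. It rebuilds the two factors from the integer codes in the dput() output of the question (the object name m is only illustrative) and shows that cbind() keeps nothing but those codes, whose column means are the single "levels" of C and D reported by ref_grid(fit).

```r
# Factor codes copied from the dput() output in the question
C <- factor(c(1, 2, 1, 2, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2),
            labels = c("Pair A 7p:35c-35p:7c", "Pair B 7p:35c-28p:14c"))
D <- factor(c(1, 1, 2, 2, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1),
            labels = c("F", "M"))

# cbind() drops the factor class and keeps only the internal integer codes
m <- cbind(C, D)
is.numeric(m)   # TRUE: a numeric matrix, not a pair of factors
dim(m)          # 21 rows, 2 columns

# Column means of the codes: 32/21 for C and 27/21 for D,
# i.e. about 1.5238 and 1.2857, as in the reference grid of fit
colMeans(m)
```

A model fitted on m therefore treats the group labels as numbers, so the reference grid has one point rather than one row per factor combination.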
I think what is needed is to take the cbind out of the call:
R> fit1 = manova(x ~ C + D + E, data = dat)
R> summary(fit1)
          Df  Pillai approx F num Df den Df  Pr(>F)
C          1 0.07083   0.6098      2     16 0.55559
D          1 0.03150   0.2602      2     16 0.77408
E          1 0.37794   4.8606      2     16 0.02242
Residuals 17
R> ref_grid(fit1)
'emmGrid' object with variables:
C = Pair A 7p:35c-35p:7c, Pair B 7p:35c-28p:14c
D = F, M
E = 0.21605
rep.meas = multivariate response levels: A, B
Since E is numeric and is reduced to just its mean, you don't need to include it in the emmeans call:
R> emmeans(fit1, ~ C + D)
 C                     D     emmean          SE df    lower.CL   upper.CL
 Pair A 7p:35c-35p:7c  F 0.05021337 0.011726789 17  0.02547201 0.07495473
 Pair B 7p:35c-28p:14c F 0.05576090 0.008808415 17  0.03717677 0.07434503
 Pair A 7p:35c-35p:7c  M 0.01962942 0.016299910 17 -0.01476039 0.05401922
 Pair B 7p:35c-28p:14c M 0.02517695 0.019816637 17 -0.01663250 0.06698640
Results are averaged over the levels of: rep.meas
Confidence level used: 0.95
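These results are averaged over the two repeated measures in the dependent variable. You may want to separate those out, or perhaps average over the levels of some other factor. Since neither C nor D is significant, I'll just get the EMMs for rep.meas and their comparison: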
R> emmeans(fit1, pairwise ~ rep.meas)
$emmeans
rep.meas emmean SE df lower.CL upper.CL
A 0.04551606 0.008547982 17 0.02748139 0.06355073
B 0.02987426 0.006885079 17 0.01534801 0.04440050
Results are averaged over the levels of: C, D
Confidence level used: 0.95
$contrasts
 contrast   estimate          SE df t.ratio p.value
 A - B    0.01564181 0.005645167 17   2.771  0.0131
Results are averaged over the levels of: C, D
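If the C and D means are wanted separately for each response rather than averaged over A and B, rep.meas can be used as a by variable. The following is only a sketch under some assumptions: the data frame is rebuilt from the dput() output (without the insect ID column), the response is written as cbind(A, B) directly in the formula instead of the external matrix x, and the covariate values 0.1 and 0.35 passed to at are illustrative choices.

```r
library(emmeans)

# Data from the question, rebuilt without the insect ID column
dat <- data.frame(
  C = factor(c(1, 2, 1, 2, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2),
             labels = c("Pair A 7p:35c-35p:7c", "Pair B 7p:35c-28p:14c")),
  D = factor(c(1, 1, 2, 2, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1),
             labels = c("F", "M")),
  E = c(0.357, 0.259, 0.128, 0.104, 0.248, 0.111, 0.218, 0.213, 0.13, 0.123,
        0.335, 0.22, 0.247, 0.295, 0.297, 0.219, 0.132, 0.194, 0.207, 0.266,
        0.234),
  A = c(0.025333333, 0.041666665, 0.043833332, 0.046333331, 0.108499995,
        0.051999997, 0.101833329, 0.06083333, 0.059499998, 0.056166664,
        0.017833333, 0.053666664, 0.066333331, 0.025499998, 0.073666664,
        0.149333324, 0.044666665, 0.047499998, 0.051833331, 0.020499999,
        0.062499997),
  B = c(0.050666667, 0.020333321, 0.023166668, 0.029666645, 0.032499992,
        0.028999981, 0.029166671, 0.024166656, 0.025500002, 0.020833325,
        0.021166667, 0.038333304, 0.023666669, 0.022499981, 0.040333336,
        0.121666569, 0.023333335, 0.017500002, 0.01816666, 0.018500001,
        0.024499989)
)

# Assumption: response given as cbind(A, B) in the formula, factors kept as factors
fit1 <- manova(cbind(A, B) ~ C + D + E, data = dat)

# EMMs for each C x D combination, reported separately for responses A and B
emm_by_resp <- emmeans(fit1, ~ C + D | rep.meas)
emm_by_resp

# EMMs at two chosen covariate values instead of mean(E), again by response
emm_at_E <- emmeans(fit1, ~ E | rep.meas, at = list(E = c(0.1, 0.35)))
emm_at_E
```

Both calls only change how the same reference grid is summarized; the fitted model is the additive one used above.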
Since E is significant and is a covariate, it might be interesting to see whether there are different slopes for each level of rep.meas:
R> emtrends(fit1, pairwise ~ rep.meas, var = "E")
$emtrends
rep.meas E.trend SE df lower.CL upper.CL
A -0.3472133 0.1737591 17 -0.7138129 0.01938632
B 0.0213882 0.1399564 17 -0.2738940 0.31667042
Results are averaged over the levels of: C, D
Confidence level used: 0.95
$contrasts
 contrast   estimate        SE df t.ratio p.value
 A - B    -0.3686015 0.1147521 17  -3.212  0.0051
Results are averaged over the levels of: C, D
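Because the model has no interactions involving E, these trends can be cross-checked without emmeans: each E.trend should simply be the E coefficient of the corresponding response, and the A - B contrast should be the E coefficient in a regression of the difference A - B. A minimal base-R sketch, assuming the data frame is rebuilt from the dput() output and using lm() with a matrix response, which gives the same coefficients as manova() (the names mlm1, slopes, slope_diff and diff_fit are illustrative):

```r
# Data from the question, rebuilt without the insect ID column
dat <- data.frame(
  C = factor(c(1, 2, 1, 2, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 1, 2),
             labels = c("Pair A 7p:35c-35p:7c", "Pair B 7p:35c-28p:14c")),
  D = factor(c(1, 1, 2, 2, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1),
             labels = c("F", "M")),
  E = c(0.357, 0.259, 0.128, 0.104, 0.248, 0.111, 0.218, 0.213, 0.13, 0.123,
        0.335, 0.22, 0.247, 0.295, 0.297, 0.219, 0.132, 0.194, 0.207, 0.266,
        0.234),
  A = c(0.025333333, 0.041666665, 0.043833332, 0.046333331, 0.108499995,
        0.051999997, 0.101833329, 0.06083333, 0.059499998, 0.056166664,
        0.017833333, 0.053666664, 0.066333331, 0.025499998, 0.073666664,
        0.149333324, 0.044666665, 0.047499998, 0.051833331, 0.020499999,
        0.062499997),
  B = c(0.050666667, 0.020333321, 0.023166668, 0.029666645, 0.032499992,
        0.028999981, 0.029166671, 0.024166656, 0.025500002, 0.020833325,
        0.021166667, 0.038333304, 0.023666669, 0.022499981, 0.040333336,
        0.121666569, 0.023333335, 0.017500002, 0.01816666, 0.018500001,
        0.024499989)
)

# Multivariate linear model with the same right-hand side as fit1
mlm1 <- lm(cbind(A, B) ~ C + D + E, data = dat)

# Row "E" of the coefficient matrix: one slope per response (A, B);
# compare with the E.trend column of the emtrends output
slopes <- coef(mlm1)["E", ]
slopes

# Difference of the two slopes; compare with the A - B contrast estimate
slope_diff <- unname(slopes["A"] - slopes["B"])
slope_diff

# The same number is the E coefficient when regressing the difference A - B
diff_fit <- lm(I(A - B) ~ C + D + E, data = dat)
coef(diff_fit)[["E"]]
```

This check only works so directly because the model is additive; with an interaction between E and a factor, the slopes would differ across factor levels and would have to be averaged, which is what emtrends does.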
Also, try this to get a nice plot of the predictions (results not shown):
emmip(fit1, rep.meas ~ E|C*D, at=list(E = c(.1,.35)))