如何从 R 中具有分类处理的回归模型中获取均值比率的标准误差?

How to get standard errors for ratios of means from a regression model with categorical treatments in R?

我有一些来自间作实验的示例数据,其中包括两种作物 A 和 B 的单作产量和相互间作的产量。在这种情况下,单一栽培的作物表示为与自身的间作作物,即 作物 A 与作物 A = 单一栽培;作物 B 与作物 B = 单一栽培;作物 A 与作物 B = 间作;作物 B 与作物 B = 间作。

我想获得土地当量比的平均值和标准误差,定义为:

LER =(作物A与B间作的产量)/(单作作物A的产量)+(作物A的产量 与B间作)/(单作作物A的产量)

有人建议我使用 SE 的 delta 方法,所以我想我想使用 msm 中的 deltamethod() 函数(但不太确定)

这是数据(更多物种的更大实验的子集):

structure(list(site = c("Two", "Two", "Two", "Two", "Two", "Two", 
"Two", "Two", "Two", "Two", "Two", "Two", "One", "One", "One", 
"One", "One", "One", "One", "One", "One", "One", "One", "One", 
"Three", "Three", "Three", "Three", "Three", "Three", "Three", 
"Three", "Three", "Three", "Three", "Three"), siteblock = c("Two_2", 
"Two_1", "Two_3", "Two_1", "Two_3", "Two_2", "Two_1", "Two_3", 
"Two_2", "Two_2", "Two_3", "Two_1", "One_2", "One_1", "One_3", 
"One_3", "One_1", "One_2", "One_3", "One_1", "One_2", "One_2", 
"One_3", "One_1", "Three_1", "Three_2", "Three_3", "Three_1", 
"Three_2", "Three_3", "Three_1", "Three_2", "Three_3", "Three_1", 
"Three_2", "Three_3"), species = c("A", "A", "A", "A", "A", "A", 
"B", "B", "B", "B", "B", "B", "A", "A", "A", "A", "A", "A", "B", 
"B", "B", "B", "B", "B", "A", "A", "A", "A", "A", "A", "B", "B", 
"B", "B", "B", "B"), pair = c("A", "A", "A", "B", "B", "B", "A", 
"A", "A", "B", "B", "B", "A", "A", "A", "B", "B", "B", "A", "A", 
"A", "B", "B", "B", "A", "A", "A", "B", "B", "B", "A", "A", "A", 
"B", "B", "B"), yield.pp = c(118.6648257, 80.29858998, 90.53153963, 
48.64188012, 140.3934164, 103.0518444, 18.23102298, 7.104490919, 
22.81524354, 21.07202039, 7.529486987, 18.63650567, 242.2567602, 
202.1431331, 185.5609192, 283.144789, 241.1690115, 241.5258056, 
23.78876862, 35.87524173, 41.1028137, 18.55380809, 22.46060419, 
18.46323056, 242.9551749, 231.387521, 455.9878777, 288.2237713, 
156.3390735, 207.4286019, 7.167311238, 34.66607289, 22.41394604, 
42.22510313, 38.70415176, 57.86653817)), class = "data.frame", row.names = c(NA, 
-36L))

该实验在三个地点的三个区块中重复进行,但为了简单起见,我们暂时忽略它。我创建了一个模型来估计每个组合中每个物种的产量:

mod<-lm(yield.pp~species*pair,data=df)
summary(mod)

然后就可以直接得到每种组合中每种作物的估计均值:

est_grid<-data.frame(expand.grid(species=levels(as.factor(df$species)),pair=levels(as.factor(df$pair))))
est_means<-predict(mod,newdata=est_grid)
est_means
##        1         2         3         4 
##205.53182  23.68499 189.99091  27.27905 

并计算平均土地当量比:

(est_means[3]/est_means[1])+(est_means[2]/est_means[4])
##       3 
##1.792635

所以我的问题是如何获得符合平均比率的SE。我目前被 deltamethod 难住了,因为它似乎只为估计模型参数的比率提供 SE,而不是估计均值的比率 ...

使用来自 emmeans 的 ref_grid 创建参考网格,在这种情况下,它将 return 一个 emmGrid 对象并且具有 vcov 方法,这是 运行 来自车包.

library(car)
library(emmeans)

r <- ref_grid(mod); r
## 'emmGrid' object with variables:
##     species = A, B
##     pair = A, B

s <- summary(r); s 
##  species pair prediction   SE df
##  A       A         205.5 23.7 32
##  B       A          23.7 23.7 32
##  A       B         190.0 23.7 32
##  B       B          27.3 23.7 32

m <- with(s, setNames(prediction, paste0(species, pair)))
##        AA        BA        AB        BB 
## 205.53182  23.68499 189.99091  27.27905 

deltaMethod(m, "AB / AA + BA / BB", vcov(r))
##               Estimate       SE    2.5 % 97.5 %
## AB/AA + BA/BB  1.79264  1.15910 -0.47916 4.0644

如果您只需要比率,emmeans 包可以直接提供。关键是将预测重新网格化为对数尺度:

> rlog <- ref_grid(mod, transform = "log")

或者只使用其他答案中的参考网格:rlog <- regrid(r, "log")。 无论哪种方式,我们都已转换为参考网格,就好像应用了对数变换一样。使用type = "response"撤销对数转换:

> confint(rlog, type = "response")
 species pair response   SE df lower.CL upper.CL
 A       A       205.5 23.7 32   162.58      260
 B       A        23.7 23.7 32     3.10      181
 A       B       190.0 23.7 32   147.43      245
 B       B        27.3 23.7 32     4.66      160

Confidence level used: 0.95 
Intervals are back-transformed from the log scale 

这些正是@G.Grothendieck的回答得到的结果。现在做两两比较;由于日志的差异是比率的日志,反向转换过程产生比率:

> pairs(rlog, type = "response", infer = c(TRUE, TRUE), adjust = "none")
 contrast  ratio    SE df lower.CL upper.CL null t.ratio p.value
 A A / B A 8.678 8.725 32   1.1194   67.268    1   2.149  0.0393
 A A / A B 1.082 0.183 32   0.7659    1.528    1   0.464  0.6460
 A A / B B 7.534 6.591 32   1.2682   44.763    1   2.309  0.0276
 B A / A B 0.125 0.125 32   0.0160    0.969    1  -2.069  0.0467
 B A / B B 0.868 1.148 32   0.0587   12.846    1  -0.107  0.9156
 A B / B B 6.965 6.102 32   1.1692   41.487    1   2.215  0.0340

Confidence level used: 0.95 
Intervals are back-transformed from the log scale 
Tests are performed on the log scale

(我添加了 infer 参数以获取 CI 和测试。)

经过一些花哨的操作,甚至可以回答原始问题:

> con = contrast(rlog, list(`AB/AA` = c(-1,0,1,0), `BA/BB` = c(0,1,0,-1)))
> confint(con, type = "r")
 contrast ratio    SE df lower.CL upper.CL
 AB/AA    0.924 0.157 32   0.6544     1.31
 BA/BB    0.868 1.148 32   0.0587    12.85

Confidence level used: 0.95 
Intervals are back-transformed from the log scale 

> confint(contrast(regrid(con), list(`AB/AA + BA/BB` = c(1,1))))
 contrast      estimate   SE df lower.CL upper.CL
 AB/AA + BA/BB     1.79 1.16 32   -0.568     4.15

Confidence level used: 0.95

置信限略有不同,因为我们使用 32 d.f。而 deltaMethod 产生渐近结果(无限 d.f。)