emmeans - 你能在对比中使用基线测量(即作为不同的变量)吗?
emmeans - can you use a baseline measure in a contrast (i.e. as a different variable)?
分析 RCT 数据时的最佳做法是针对基线测量值 (ancova) 进行调整。然而,研究人员通常仍会询问每组相对于基线的变化及其相对差异(由治疗 x 时间相互作用给出)。当针对基线测量进行调整时,涉及时间的对比只能从第一个 post 基线测量开始。
有没有一种方法可以在每个 post 治疗时间点从基线测量值(例如组合组的整体治疗前平均值)中减去模型预测来进行对比?我可以手动执行此操作,但不会得到 C.I.'s.
一些虚拟数据如下:
> m1 <- lme4::lmer(measure ~ baseline + treat*time + (1|id), data = dat)
> summary(m1)
Linear mixed model fit by REML ['lmerMod']
Formula: measure ~ baseline + treat * time + (1 | id)
Data: dat
REML criterion at convergence: 436.9
Scaled residuals:
Min 1Q Median 3Q Max
-2.33301 -0.35625 -0.00052 0.41946 1.55502
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 11.379 3.373
Residual 7.795 2.792
Number of obs: 80, groups: id, 40
Fixed effects:
Estimate Std. Error t value
(Intercept) -1.7660 4.1216 -0.428
baseline 0.6898 0.1257 5.486
treatB 1.7293 1.4064 1.230
time3 7.8969 0.8829 8.944
treatB:time3 -5.9876 1.2486 -4.795
Correlation of Fixed Effects:
(Intr) baseln treatB time3
baseline -0.971
treatB -0.335 0.175
time3 -0.107 0.000 0.314
treatB:tim3 0.076 0.000 -0.444 -0.707
在下面的每种情况下,我想做的是从预测平均值中减去基线。我猜参考网格不是这样工作的,但我还是想问一下。
> mb <- mean(dat$baseline[dat$time == 2])
> mod_new_rg <- ref_grid(m1, at = list(baseline = mb))
> emmeans(mod_new_rg, ~ baseline + time + treat)
baseline time treat emmean SE df lower.CL upper.CL
30.9 2 A 19.5 0.987 54.5 17.5 21.5
30.9 3 A 27.4 0.987 54.5 25.4 29.4
30.9 2 B 21.3 0.987 54.5 19.3 23.2
30.9 3 B 23.2 0.987 54.5 21.2 25.1
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
dat <- structure(list(id = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L,
5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L, 11L, 12L,
12L, 13L, 13L, 14L, 14L, 15L, 15L, 16L, 16L, 17L, 17L, 18L, 18L,
19L, 19L, 20L, 20L, 21L, 21L, 22L, 22L, 23L, 23L, 24L, 24L, 25L,
25L, 26L, 26L, 27L, 27L, 28L, 28L, 29L, 29L, 30L, 30L, 31L, 31L,
32L, 32L, 33L, 33L, 34L, 34L, 35L, 35L, 36L, 36L, 37L, 37L, 38L,
38L, 39L, 39L, 40L, 40L), .Label = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16",
"17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27",
"28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38",
"39", "40"), class = "factor"), treat = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), time = structure(c(2L, 3L, 2L, 3L, 2L,
3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L,
3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L,
3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L,
3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L,
3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L), .Label = c("1",
"2", "3"), class = "factor"), measure = c(17.3200614254647, 28.8156330714954,
21.160508157976, 30.1996968351034, 15.8306421666642, 24.6343639611382,
27.0257307671622, 35.280124692084, 21.5414605781997, 30.1710609087278,
24.7492184706737, 29.1159089608813, 13.7075342092653, 19.3839948700176,
30.9064084394, 43.0450835924821, 16.4621181393495, 27.9256720625141,
22.067795128987, 30.1292791903279, 22.2987617525344, 28.6339573643738,
10.0416232025695, 17.9790809598907, 22.3419910935755, 33.874536822989,
16.0349555541677, 24.2475148150631, 18.9908539202081, 25.6010342912277,
14.7967655238907, 19.390122202679, 22.5225831869562, 30.9891588158201,
18.8686550222759, 28.6210275251346, 21.4007984220698, 25.9102099243138,
25.9311028475919, 27.9896540407965, 25.6541596409646, 16.7652666970158,
19.2697851662734, 23.9828522768379, 16.7909337557515, 21.4816672428583,
22.19284590574, 28.8766659163021, 15.7183387191222, 26.3850724291503,
23.3685103961168, 28.7964869754369, 34.2230257941104, 40.6329654650291,
20.4357830783204, 18.3755342847755, 19.8702376058352, 20.3089453680716,
8.32292449233602, 16.4392611472045, 26.7694252442986, 21.7984979391771,
20.2887000047016, 20.091994056284, 23.0445006938799, 18.5579489512776,
14.5007251728231, 14.2647965430465, 18.9590380042456, 21.8548073291541,
16.8014625827141, 13.9396349007877, 23.0599924520126, 24.3582989818999,
17.3356080916247, 21.5054270440033, 20.8988975305153, 24.5258518904388,
24.069743228952, 26.8174125810013), baseline = c(30.5160390487568,
30.5160390487568, 32.6451428104435, 32.6451428104435, 28.0425948684312,
28.0425948684312, 37.5537250807658, 37.5537250807658, 32.000871324751,
32.000871324751, 34.9337043558422, 34.9337043558422, 32.069431652421,
32.069431652421, 41.3254959449604, 41.3254959449604, 22.680750564568,
22.680750564568, 38.8196824239253, 38.8196824239253, 30.9474784100669,
30.9474784100669, 28.3893633655831, 28.3893633655831, 30.5016640400354,
30.5016640400354, 28.2643729861136, 28.2643729861136, 30.6368731962024,
30.6368731962024, 28.5343771037685, 28.5343771037685, 31.8351023461854,
31.8351023461854, 31.3123200573486, 31.3123200573486, 31.0422497831132,
31.0422497831132, 34.8584162616575, 34.8584162616575, 35.7527843999141,
35.7527843999141, 23.7345582720131, 23.7345582720131, 26.800749534566,
26.800749534566, 24.5834704596406, 24.5834704596406, 30.7699429023659,
30.7699429023659, 28.9525228089337, 28.9525228089337, 48.7738392068561,
48.7738392068561, 32.4395246096931, 32.4395246096931, 27.0239536784171,
27.0239536784171, 21.4698209122962, 21.4698209122962, 34.4119498713112,
34.4119498713112, 29.5423378282026, 29.5423378282026, 35.9475237918695,
35.9475237918695, 28.7394363441077, 28.7394363441077, 28.9798112731033,
28.9798112731033, 29.0543497219897, 29.0543497219897, 28.3198356223554,
28.3198356223554, 27.0389712071391, 27.0389712071391, 26.97212084762,
26.97212084762, 28.4431727013629, 28.4431727013629)), class = "data.frame", row.names = c(NA,
-80L))
正如我在评论中所说,默认情况下基线已设置为其均值,因此您得到的调整后均值与 OP 中所示完全相同:
> (EMM = emmeans(m1, ~ time + treat))
time treat emmean SE df lower.CL upper.CL
2 A 19.5 0.987 54.5 17.5 21.5
3 A 27.4 0.987 54.5 25.4 29.4
2 B 21.3 0.987 54.5 19.3 23.2
3 B 23.2 0.987 54.5 21.2 25.1
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
如果要从 EMM
中的每个调整平均值中减去基线平均值,可以使用 contrast
中的 offset
参数来完成:
> mb <- mean(dat$baseline)
> CHG = contrast(EMM, "identity", offset = -mb, estName = "EMM - baseline")
> confint(CHG)
contrast EMM - baseline SE df lower.CL upper.CL
2 A -11.34 0.987 54.5 -13.32 -9.36
3 A -3.44 0.987 54.5 -5.42 -1.47
2 B -9.61 0.987 54.5 -11.59 -7.63
3 B -7.70 0.987 54.5 -9.68 -5.73
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
然而,这并没有考虑到估计基线均值的误差,所以SE们过于乐观了。显然,您可以指定其他偏移量,或 4 个不同偏移量的矢量,以满足您的目的。
另一件可能有用也可能没用的事情是从模型中删除 baseline
:
> emmeans(m1, ~ time + treat, submodel = ~ treat*time)
time treat emmean SE df lower.CL upper.CL
2 A 20.2 0.979 54.8 18.2 22.2
3 A 28.1 0.979 54.8 26.1 30.1
2 B 20.6 0.979 54.8 18.6 22.5
3 B 22.5 0.979 54.8 20.5 24.5
submodel: ~ treat + time + treat:time
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
...然后可能会像上面那样做一些偏移操作。这只是在没有基线的情况下拟合模型然后 运行 emmeans()
得到的近似值;如果您使用的是 lm()
而不是 lmer()
.
,那将是准确的
附录
CHG
中的一个问题是我们有一个 contrast
变量而不是 time
和 treat
。这应该可以通过 levels(CHG) = levels(EMM)
修复,但是,我刚刚了解到 levels<-
方法有一个错误并且不会复制偏移量。所以需要破解:
> g = CHG@grid
> levels(CHG) = levels(EMM)
> CHG@grid$.offset. = g$.offset.
> confint(CHG)
time treat EMM - baseline SE df lower.CL upper.CL
2 A -11.34 0.987 54.5 -13.32 -9.36
3 A -3.44 0.987 54.5 -5.42 -1.47
2 B -9.61 0.987 54.5 -11.59 -7.63
3 B -7.70 0.987 54.5 -9.68 -5.73
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
我已经为下一次更新修复了这个问题(代码用 .offset
而不是 .offset.
,省略了尾随点)。
分析 RCT 数据时的最佳做法是针对基线测量值 (ancova) 进行调整。然而,研究人员通常仍会询问每组相对于基线的变化及其相对差异(由治疗 x 时间相互作用给出)。当针对基线测量进行调整时,涉及时间的对比只能从第一个 post 基线测量开始。
有没有一种方法可以在每个 post 治疗时间点从基线测量值(例如组合组的整体治疗前平均值)中减去模型预测来进行对比?我可以手动执行此操作,但不会得到 C.I.'s.
一些虚拟数据如下:
> m1 <- lme4::lmer(measure ~ baseline + treat*time + (1|id), data = dat)
> summary(m1)
Linear mixed model fit by REML ['lmerMod']
Formula: measure ~ baseline + treat * time + (1 | id)
Data: dat
REML criterion at convergence: 436.9
Scaled residuals:
Min 1Q Median 3Q Max
-2.33301 -0.35625 -0.00052 0.41946 1.55502
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 11.379 3.373
Residual 7.795 2.792
Number of obs: 80, groups: id, 40
Fixed effects:
Estimate Std. Error t value
(Intercept) -1.7660 4.1216 -0.428
baseline 0.6898 0.1257 5.486
treatB 1.7293 1.4064 1.230
time3 7.8969 0.8829 8.944
treatB:time3 -5.9876 1.2486 -4.795
Correlation of Fixed Effects:
(Intr) baseln treatB time3
baseline -0.971
treatB -0.335 0.175
time3 -0.107 0.000 0.314
treatB:tim3 0.076 0.000 -0.444 -0.707
在下面的每种情况下,我想做的是从预测平均值中减去基线。我猜参考网格不是这样工作的,但我还是想问一下。
> mb <- mean(dat$baseline[dat$time == 2])
> mod_new_rg <- ref_grid(m1, at = list(baseline = mb))
> emmeans(mod_new_rg, ~ baseline + time + treat)
baseline time treat emmean SE df lower.CL upper.CL
30.9 2 A 19.5 0.987 54.5 17.5 21.5
30.9 3 A 27.4 0.987 54.5 25.4 29.4
30.9 2 B 21.3 0.987 54.5 19.3 23.2
30.9 3 B 23.2 0.987 54.5 21.2 25.1
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
dat <- structure(list(id = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L,
5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L, 11L, 12L,
12L, 13L, 13L, 14L, 14L, 15L, 15L, 16L, 16L, 17L, 17L, 18L, 18L,
19L, 19L, 20L, 20L, 21L, 21L, 22L, 22L, 23L, 23L, 24L, 24L, 25L,
25L, 26L, 26L, 27L, 27L, 28L, 28L, 29L, 29L, 30L, 30L, 31L, 31L,
32L, 32L, 33L, 33L, 34L, 34L, 35L, 35L, 36L, 36L, 37L, 37L, 38L,
38L, 39L, 39L, 40L, 40L), .Label = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16",
"17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27",
"28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38",
"39", "40"), class = "factor"), treat = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), time = structure(c(2L, 3L, 2L, 3L, 2L,
3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L,
3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L,
3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L,
3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L,
3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L), .Label = c("1",
"2", "3"), class = "factor"), measure = c(17.3200614254647, 28.8156330714954,
21.160508157976, 30.1996968351034, 15.8306421666642, 24.6343639611382,
27.0257307671622, 35.280124692084, 21.5414605781997, 30.1710609087278,
24.7492184706737, 29.1159089608813, 13.7075342092653, 19.3839948700176,
30.9064084394, 43.0450835924821, 16.4621181393495, 27.9256720625141,
22.067795128987, 30.1292791903279, 22.2987617525344, 28.6339573643738,
10.0416232025695, 17.9790809598907, 22.3419910935755, 33.874536822989,
16.0349555541677, 24.2475148150631, 18.9908539202081, 25.6010342912277,
14.7967655238907, 19.390122202679, 22.5225831869562, 30.9891588158201,
18.8686550222759, 28.6210275251346, 21.4007984220698, 25.9102099243138,
25.9311028475919, 27.9896540407965, 25.6541596409646, 16.7652666970158,
19.2697851662734, 23.9828522768379, 16.7909337557515, 21.4816672428583,
22.19284590574, 28.8766659163021, 15.7183387191222, 26.3850724291503,
23.3685103961168, 28.7964869754369, 34.2230257941104, 40.6329654650291,
20.4357830783204, 18.3755342847755, 19.8702376058352, 20.3089453680716,
8.32292449233602, 16.4392611472045, 26.7694252442986, 21.7984979391771,
20.2887000047016, 20.091994056284, 23.0445006938799, 18.5579489512776,
14.5007251728231, 14.2647965430465, 18.9590380042456, 21.8548073291541,
16.8014625827141, 13.9396349007877, 23.0599924520126, 24.3582989818999,
17.3356080916247, 21.5054270440033, 20.8988975305153, 24.5258518904388,
24.069743228952, 26.8174125810013), baseline = c(30.5160390487568,
30.5160390487568, 32.6451428104435, 32.6451428104435, 28.0425948684312,
28.0425948684312, 37.5537250807658, 37.5537250807658, 32.000871324751,
32.000871324751, 34.9337043558422, 34.9337043558422, 32.069431652421,
32.069431652421, 41.3254959449604, 41.3254959449604, 22.680750564568,
22.680750564568, 38.8196824239253, 38.8196824239253, 30.9474784100669,
30.9474784100669, 28.3893633655831, 28.3893633655831, 30.5016640400354,
30.5016640400354, 28.2643729861136, 28.2643729861136, 30.6368731962024,
30.6368731962024, 28.5343771037685, 28.5343771037685, 31.8351023461854,
31.8351023461854, 31.3123200573486, 31.3123200573486, 31.0422497831132,
31.0422497831132, 34.8584162616575, 34.8584162616575, 35.7527843999141,
35.7527843999141, 23.7345582720131, 23.7345582720131, 26.800749534566,
26.800749534566, 24.5834704596406, 24.5834704596406, 30.7699429023659,
30.7699429023659, 28.9525228089337, 28.9525228089337, 48.7738392068561,
48.7738392068561, 32.4395246096931, 32.4395246096931, 27.0239536784171,
27.0239536784171, 21.4698209122962, 21.4698209122962, 34.4119498713112,
34.4119498713112, 29.5423378282026, 29.5423378282026, 35.9475237918695,
35.9475237918695, 28.7394363441077, 28.7394363441077, 28.9798112731033,
28.9798112731033, 29.0543497219897, 29.0543497219897, 28.3198356223554,
28.3198356223554, 27.0389712071391, 27.0389712071391, 26.97212084762,
26.97212084762, 28.4431727013629, 28.4431727013629)), class = "data.frame", row.names = c(NA,
-80L))
正如我在评论中所说,默认情况下基线已设置为其均值,因此您得到的调整后均值与 OP 中所示完全相同:
> (EMM = emmeans(m1, ~ time + treat))
time treat emmean SE df lower.CL upper.CL
2 A 19.5 0.987 54.5 17.5 21.5
3 A 27.4 0.987 54.5 25.4 29.4
2 B 21.3 0.987 54.5 19.3 23.2
3 B 23.2 0.987 54.5 21.2 25.1
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
如果要从 EMM
中的每个调整平均值中减去基线平均值,可以使用 contrast
中的 offset
参数来完成:
> mb <- mean(dat$baseline)
> CHG = contrast(EMM, "identity", offset = -mb, estName = "EMM - baseline")
> confint(CHG)
contrast EMM - baseline SE df lower.CL upper.CL
2 A -11.34 0.987 54.5 -13.32 -9.36
3 A -3.44 0.987 54.5 -5.42 -1.47
2 B -9.61 0.987 54.5 -11.59 -7.63
3 B -7.70 0.987 54.5 -9.68 -5.73
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
然而,这并没有考虑到估计基线均值的误差,所以SE们过于乐观了。显然,您可以指定其他偏移量,或 4 个不同偏移量的矢量,以满足您的目的。
另一件可能有用也可能没用的事情是从模型中删除 baseline
:
> emmeans(m1, ~ time + treat, submodel = ~ treat*time)
time treat emmean SE df lower.CL upper.CL
2 A 20.2 0.979 54.8 18.2 22.2
3 A 28.1 0.979 54.8 26.1 30.1
2 B 20.6 0.979 54.8 18.6 22.5
3 B 22.5 0.979 54.8 20.5 24.5
submodel: ~ treat + time + treat:time
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
...然后可能会像上面那样做一些偏移操作。这只是在没有基线的情况下拟合模型然后 运行 emmeans()
得到的近似值;如果您使用的是 lm()
而不是 lmer()
.
附录
CHG
中的一个问题是我们有一个 contrast
变量而不是 time
和 treat
。这应该可以通过 levels(CHG) = levels(EMM)
修复,但是,我刚刚了解到 levels<-
方法有一个错误并且不会复制偏移量。所以需要破解:
> g = CHG@grid
> levels(CHG) = levels(EMM)
> CHG@grid$.offset. = g$.offset.
> confint(CHG)
time treat EMM - baseline SE df lower.CL upper.CL
2 A -11.34 0.987 54.5 -13.32 -9.36
3 A -3.44 0.987 54.5 -5.42 -1.47
2 B -9.61 0.987 54.5 -11.59 -7.63
3 B -7.70 0.987 54.5 -9.68 -5.73
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
我已经为下一次更新修复了这个问题(代码用 .offset
而不是 .offset.
,省略了尾随点)。