How can I compute the median absolute deviation (MAD) for generalized linear mixed-effects models

I know my question is partly about statistics, but I am looking for a solution in R, so I believe it belongs on SO.

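I built a generalized linear mixed-effects model (GLMM) with the glmer function from the lme4 package in R to model species richness around aquaculture farms from significant explanatory variables, following Zuur et al. (2009) Mixed Effects Models and Extensions in Ecology with R. The model is: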

Mod1 <- glmer(Richness ~ Distance + Depth + Substrate + Beggiatoa + 
        Distance*Beggiatoa + (1|Site/transect), family = poisson, data = mydata)
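A side note on the fixed-effect part of this formula: in R's formula syntax, Distance*Beggiatoa already expands to Distance + Beggiatoa + Distance:Beggiatoa, so listing the main effects separately is redundant (but harmless, since duplicate terms are dropped). A quick base-R check using only the fixed-effect terms (the random-effect part is left out here because terms() would not interpret the `|` the way lme4 does):

```r
# Expand the fixed-effect part of the model formula and list the resulting terms
f <- ~ Distance + Depth + Substrate + Beggiatoa + Distance*Beggiatoa
labs <- attr(terms(f), "term.labels")
print(labs)
# "Distance" "Depth" "Substrate" "Beggiatoa" "Distance:Beggiatoa"
```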

I now have a complete dataset collected at different sites, and I would like to evaluate how well this model performs on that new dataset.

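After a question on Cross Validated (CV), someone suggested looking at the median absolute deviation (MAD) on the new dataset. I tried the mad function from the stats package in R, but I get the following error message: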

Error in x[!is.na(x)] : object of type 'S4' is not subsettable
In addition: Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'S4'
2: In is.na(x) : is.na() applied to non-(list or vector) of type 'S4'

Does anyone know what is going wrong here? Is it that mad from stats cannot handle a GLMM? If so, is there another R package that computes the MAD for a GLMM?

Edit:

To give you an idea of my data, here is the output of dput(head(mydata)). Note also that the new dataset has no "Substrate" category, and that "S" refers to "Richness":

structure(list(S = c(0, 1, 2, 3, 3, 2), Site = structure(c(1L, 
1L, 1L, 1L, 1L, 1L), .Label = c("BC", "BH", "GC", "IS", "Ref"
), class = "factor"), Transect = structure(c(4L, 4L, 4L, 4L, 
4L, 4L), .Label = c("10GC", "10IS", "10N", "10S", "11IS", "12IS", 
"13E", "1GC", "1N", "1W", "2E", "2GC", "2IS", "2N", "2W", "2WA", 
"3E", "3GC", "3IS", "3N", "3S", "4E", "4GC", "4IS", "4S", "4W", 
"5GC", "5IS", "5S", "6GC", "6IS", "6N", "6S", "6W", "7E", "7GC", 
"7IS", "8GC", "8IS", "8W", "9E", "9GC", "9IS", "9N", "RefBC1", 
"RefBC10", "RefBC11", "RefBC12", "RefBC2", "RefBC3", "RefBC4", 
"RefBC5", "RefBC6", "RefBC7", "RefBC8", "RefBC9", "X1", "X2"), class = "factor"), 
Distance = c(2, 20, 40, 80, 120, 160), Depth = c(40L, 40L, 
50L, 40L, 40L, 40L), Beggiatoa = c(2, 1, 1, 0, 0, 0)), .Names = c("S", 
"Site", "Transect", "Distance", "Depth", "Beggiatoa"), row.names = c(NA, 
6L), class = "data.frame")

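mad() expects a numeric vector, but you passed it the fitted model itself (an S4 merMod object), hence the error. For the in-sample error, the median absolute deviation is just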

mad(residuals(fitted_model))

...though you probably want residuals(fitted_model, type = "response"), because residuals gives you deviance residuals by default (see ?residuals.merMod).
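A minimal sketch of the in-sample calculation, assuming lme4 is installed. The data below are simulated and hypothetical (five sites, four transects per site, one predictor called Distance), not the poster's dataset; the point is only that mad() is applied to the residual vector, not to the model object:

```r
library(lme4)

set.seed(42)
# Hypothetical design: 5 sites x 4 transects x 10 samples.
# Transect labels repeat within sites, so (1 | Site/transect) nests them.
dat <- expand.grid(rep = 1:10, transect = factor(1:4), Site = factor(LETTERS[1:5]))
dat$Distance <- runif(nrow(dat), 0, 2)   # hypothetical predictor, arbitrary units
site_eff <- rnorm(5, sd = 0.3)           # simulated site-level random effects
dat$S <- rpois(nrow(dat), exp(1 - 0.5 * dat$Distance + site_eff[as.integer(dat$Site)]))

# The simulated transect variance is zero, so lme4 may report a singular fit
fit <- glmer(S ~ Distance + (1 | Site/transect), family = poisson, data = dat)

r_dev  <- residuals(fit)                      # default: deviance residuals
r_resp <- residuals(fit, type = "response")   # observed minus fitted counts

mad(r_dev)
mad(r_resp)   # expressed on the count scale
```

Response residuals are usually easier to interpret here because they are in units of species counts.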

If you want to look at the out-of-sample error, you can do:

pred <- predict(fitted_model,
                newdata = newdf,
                type = "response",
                re.form=~0)
mad(pred, center=newdf$S)

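re.form=~0 specifies that the random effects are ignored in the prediction (population-level predictions). That is your only option unless you are predicting at sites/transects for which you also have training data.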
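mad(pred, center = newdf$S) works because center may be a vector: mad() computes constant * median(abs(x - center)) element-wise, so the result is the median absolute prediction error multiplied by the default constant = 1.4826 (the factor that makes the MAD a consistent estimate of the standard deviation for normal data). Pass constant = 1 if you want the raw median absolute error in count units. A toy check with made-up numbers (hypothetical values, not model output):

```r
pred <- c(1.2, 2.5, 0.8, 3.1, 2.0)   # made-up predicted counts
obs  <- c(1, 3, 0, 4, 2)             # made-up observed counts

raw    <- mad(pred, center = obs, constant = 1)  # median(|pred - obs|)
scaled <- mad(pred, center = obs)                # same value times 1.4826

raw     # 0.5
scaled  # 0.7413
```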