mgcv bam() error: cannot allocate vector of size 99.6 Gb

Question

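I am trying to fit an additive mixed model using bam (mgcv library). My dataset has 10^6 observations from a longitudinal growth study of 2*10^5 children nested within 300 health centers. I am looking for the slope per center. The model is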

bam(haz ~ s(month, bs = "cc", k = 12)+ sex+ s(age)+ center+ year+ year*center+s(child, bs="re"), data)

Whenever I try to fit the model, I get the following error message:

Error: cannot allocate vector of size 99.6 Gb
In addition: Warning message:
In matrix(by, n, q) : data length exceeds size of matrix

I am working on a cluster with 500 GB of RAM.

Thanks for your help.

Answer 1

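To diagnose more precisely where the problem lies, try fitting the model with various terms left out. Several terms in your model could blow up on you:

The fixed effects involving center will blow up to 300 columns * 10^6 rows; depending on whether year is numeric or a factor, the year*center term could blow up to 600 columns or (nyears*300) columns.
It's not clear to me whether bam uses sparse matrices for s(.,bs="re") terms; if not, you'll be in big trouble (2*10^5 columns * 10^6 rows).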
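The column counts above can be checked without touching the full data, because model.matrix() sizes its output from the factor levels rather than from the rows it is given. What follows is a minimal sketch on made-up toy data (30 centers, 10 years, 400 children; the data frame `d` and the helper `count_cols` are hypothetical names, not from the question), so the numbers scale down accordingly:

```r
# Toy stand-in for the question's data: same structure, much smaller.
# All sizes (30 centers, 10 years, 400 children) are invented for illustration.
d <- data.frame(
  center = factor(rep(1:30, length.out = 1200), levels = 1:30),
  year   = rep(2000:2009, length.out = 1200),
  child  = factor(rep(1:400, length.out = 1200), levels = 1:400)
)
d$year_f <- factor(d$year, levels = 2000:2009)

# Count the dense model-matrix columns a set of terms would need.
# Five rows are enough: the column count depends on factor levels only.
count_cols <- function(rhs, data) ncol(model.matrix(rhs, data = data[1:5, ]))

cols_numeric_year <- count_cols(~ center + year + year:center, d)     # 2 * ncenters
cols_factor_year  <- count_cols(~ center + year_f + year_f:center, d) # nyears * ncenters
cols_child        <- count_cols(~ 0 + child, d)                       # one per child

print(c(numeric_year = cols_numeric_year,
        factor_year  = cols_factor_year,
        child        = cols_child))
```

With 300 centers and 2*10^5 children in place of the toy sizes, the same call pattern gives the 600, nyears*300 and 2*10^5 column counts discussed above.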

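As an order-of-magnitude estimate, a vector of 10^6 numeric values (one column of your model matrix) takes up 7.6 MB, so 500 GB / 7.6 MB is roughly 65,000 columns, well short of the 2*10^5 columns a dense random effect for child would need.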
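The arithmetic behind that estimate can be reproduced in a few lines of base R. This is only a back-of-the-envelope sketch: it assumes 8-byte doubles, a fully dense model matrix, and no working copies (in practice fitting needs several times more memory than the matrix itself). All variable names are illustrative.

```r
n_rows        <- 1e6
bytes_per_col <- n_rows * 8                 # one double = 8 bytes
mib_per_col   <- bytes_per_col / 2^20       # about 7.6 MiB per column

ram_bytes <- 500 * 2^30                     # 500 GiB of RAM
max_cols  <- ram_bytes / bytes_per_col      # dense columns that would fit at all

# Dense indicator block for the child random effect: 2e5 columns
child_gib <- 2e5 * bytes_per_col / 2^30

# Sanity check against R's own accounting for a length-1e6 numeric vector
vec_bytes <- as.numeric(object.size(numeric(n_rows)))

print(round(c(MiB_per_column = mib_per_col,
              max_dense_cols = max_cols,
              child_block_GiB = child_gib), 1))
```

The child block alone comes out at well over a terabyte when stored densely, which is why it matters whether the random effect is handled with sparse matrices.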

This is just a guess, but I would try the gamm4 package. It's not specifically geared for low-memory use, but:

‘gamm4’ is most useful when the random effects are not i.i.d., or when there are large numbers of random coeffecients [sic] (more than several hundred), each applying to only a small proportion of the response data.

I would also make most of your terms into random effects:

gamm4::gamm4(haz ~ s(month, bs = "cc", k = 12) + sex + s(age),
 random = ~ (1|center) + (1|year) + (1|year:center) + (1|child),
 data = data)

Or, if there aren't very many years in the data set, treat year as a fixed effect:

gamm4::gamm4(haz ~ s(month, bs = "cc", k = 12) + sex + s(age) + year,
 random = ~ (1|center) + (1|year:center) + (1|child),
 data = data)

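If there are few years, then (year|center) might make sense, to assess among-center variation and covariation among years. If there are many years, consider making year a smooth term instead.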
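For what it's worth, here is a rough sketch of the (year|center) idea on simulated data, assuming the gamm4 package is installed. It treats year as numeric, so (year | center) gives each center its own intercept and year slope; with year as a factor the same term would instead estimate a covariance matrix among years. The data frame `toy` and all the effect sizes are invented for illustration, and a fit this small may warn about convergence or singularity.

```r
library(gamm4)  # builds on mgcv and lme4

set.seed(42)
n_center <- 20
toy <- data.frame(
  center = factor(rep(seq_len(n_center), each = 30)),
  year   = sample(0:4, n_center * 30, replace = TRUE),  # years since study start
  age    = runif(n_center * 30, 0, 60)
)
true_slope <- rnorm(n_center, sd = 0.1)                 # simulated per-center slopes
toy$haz <- sin(toy$age / 10) +
  true_slope[as.integer(toy$center)] * toy$year +
  rnorm(nrow(toy), sd = 0.5)

# Random effects go in the 'random' argument, written lmer-style.
fit <- gamm4(haz ~ s(age) + year,
             random = ~ (year | center),
             data = toy)

# Per-center deviations from the overall intercept and year slope
re <- lme4::ranef(fit$mer)$center
head(re)
```

The `year` column of `re` holds each center's deviation from the overall year slope, which is the quantity the question is after; the overall slope itself is in the fixed effects of `fit$mer`.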