R 中非线性混合效应(使用 nlme 包)的不稳定性

Instability of nonlinear mixed effects (using nlme package) in R

我正在尝试为 COVID-19 数据构建一个 nonlinear mixed effects model 数据,该数据符合来自不同国家的每日病例数的钟形曲线(随机效应在国家层面)。

数据 table 太大,无法包含在 post 中,但这里是 的结构:

> str(dat)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   2642 obs. of  7 variables:
 $ Country.Region: Factor w/ 181 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ date          : Date, format: "2020-03-24" "2020-03-25" "2020-03-26" "2020-03-27" ...
 $ day           : num  0 1 2 3 4 5 6 7 8 9 ...
 $ total_cases   : int  74 84 94 110 110 120 170 174 237 273 ...
 $ new_cases     : int  34 10 10 16 0 10 50 4 63 36 ...
 $ total_deaths  : int  1 2 4 4 4 4 4 4 4 6 ...
 $ new_deaths    : int  0 1 2 0 0 0 0 0 0 2 ...

尝试拟合模型:

library(nlme)
dat <- readRDS("dat.rds")

# Bell curve function defined by three parameters
bellcurve.model <- function(d, mu, var, x) {
  f <- d*exp(-((x - mu)^2) / (2*var))
  return(f)
}

# NLME Model
baseModel <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day), 
                  data = dat, 
                  fixed = list(d ~ 1, mu ~ 1, var ~ 1),
                  random = d + mu + var ~ 1|Country.Region,
                  start = list(fixed = c(1000, 20, 20)),
                  na.action = na.omit)

然而,这是由此产生的错误:

Error in nlme.formula(new_cases ~ bellcurve.model(d, mu, var, x = day), : Singularity in backsolve at level 0, block 1

我读过其他 Whosebug posts 表明某些协变量可能在模型中混淆,但我在这里用来预测 new_cases 的唯一协变量是 day。任何关于这意味着什么的建议或调试技巧将不胜感激,尤其是任何关于如何解决这个问题的建议。

tl;dr:这种类型的建模对起始值很敏感。我详细说明了如何成功解决该问题。有助于此修复的三个主要内容是:

  • 增加最大迭代和函数调用
  • 获取知情起始值
  • 可能 restrict/subset 最初添加的国家/地区的数据具有较少的可用信息

简要概述:

我只是通过盲目地更改起始值来开始修补。将 c(1000, 20, 20) 更改为 c(20000, 10, 10) 我将模型变为 运行 error/warning 免费 baseModel.naiveLog-likelihood: -23451.37。使用增加的最大值进行迭代和函数调用并获取知情的起始值使模型能够显着改进 baseModel.bellcurve 报告 Log-likelihood: -20487.86


天真地改变起始值将消除错误

我调整了起始值。此外,我还展示了如何增加函数调用和迭代的最大数量以及如何使用详细选项。使用 ?nlme?nlmeControl 可以找到更多信息。根据我的经验,这些类型的模型可能对起始值、最大迭代次数和函数调用很敏感。

baseModel.naive <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day), 
                  data = dat, 
                  fixed = list(d ~ 1, mu ~ 1, var ~ 1),
                  random = d + mu + var ~ 1|Country.Region,
                  start = list(fixed = c(20000, 10, 10)),
                  na.action = na.omit,
                  control=list(maxIter=1e2, msMaxIter = 1e6, msVerbose = TRUE))
baseModel.naive

输出:

> baseModel.naive <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day), 
+                   data = dat, 
+                   fixed = list(d ~ 1, mu ~ 1, var ~ 1),
+                   random = d + mu + var ~ 1|Country.Region,
+                   start = list(fixed = c(20000, 10, 10)),
+                   na.action = na.omit,
+                   control=list(maxIter=1e2, msMaxIter = 1e6, msVerbose = TRUE))
  0:     30455.774:  2.23045  10.6880  8.80935 -11177.6  3277.46 -1877.96
  1:     30450.763:  2.80826  11.2726  9.37889 -11177.6  3277.46 -1877.96
  2:     30449.789:  2.94771  11.5283  9.80691 -11177.6  3277.46 -1877.96
  3:     30449.150:  3.30454  11.8277  10.0329 -11177.6  3277.46 -1877.96
  4:     30448.727:  3.64209  12.2237  10.4571 -11177.6  3277.46 -1877.96
  5:     30448.540:  3.97875  12.5637  10.8217 -11177.6  3277.46 -1877.96
  6:     30448.445:  4.31754  12.9055  11.1826 -11177.6  3277.46 -1877.96
  7:     30448.397:  4.65727  13.2480  11.5420 -11177.6  3277.46 -1877.96
  8:     30448.372:  4.99746  13.5908  11.9008 -11177.6  3277.46 -1877.96
  9:     30448.360:  5.33789  13.9337  12.2591 -11177.6  3277.46 -1877.96
 10:     30448.354:  5.67844  14.2767  12.6173 -11177.6  3277.46 -1877.96
 11:     30448.351:  6.01905  14.6197  12.9753 -11177.6  3277.46 -1877.96
 12:     30448.349:  6.35969  14.9627  13.3333 -11177.6  3277.46 -1877.96
 13:     30448.349:  6.70035  15.3058  13.6913 -11177.6  3277.46 -1877.96
 14:     30448.348:  7.04102  15.6489  14.0493 -11177.6  3277.46 -1877.96
 15:     30448.348:  7.38170  15.9919  14.4073 -11177.6  3277.46 -1877.96
 16:     30448.348:  7.72237  16.3350  14.7652 -11177.6  3277.46 -1877.96
 17:     30448.348:  8.06305  16.6781  15.1232 -11177.6  3277.46 -1877.96
 18:     30448.348:  8.40373  17.0211  15.4811 -11177.6  3277.46 -1877.96
 19:     30448.348:  8.74441  17.3642  15.8391 -11177.6  3277.46 -1877.96
 20:     30448.348:  9.08508  17.7073  16.1971 -11177.6  3277.46 -1877.96
 21:     30448.348:  9.42576  18.0503  16.5550 -11177.6  3277.46 -1877.96
  0:     30108.384:  9.42573  18.0503  16.5550 -11183.6  3279.09 -1879.43
  1:     30108.384:  9.42568  18.0503  16.5550 -11183.6  3279.09 -1879.43
  2:     30108.384:  9.42544  18.0502  16.5548 -11183.6  3279.09 -1879.43
  0:     30108.056:  9.42539  18.0502  16.5548 -11168.0  3239.13 -1879.51
  1:     30108.056:  9.42526  18.0502  16.5549 -11168.0  3239.13 -1879.51
  0:     30108.055:  9.42523  18.0502  16.5549 -11150.0  3223.70 -1879.28
  1:     30108.055:  9.42523  18.0502  16.5549 -11150.0  3223.70 -1879.28
> baseModel.naive
Nonlinear mixed-effects model fit by maximum likelihood
  Model: new_cases ~ bellcurve.model(d, mu, var, x = day) 
  Data: dat 
  Log-likelihood: -23451.37
  Fixed: list(d ~ 1, mu ~ 1, var ~ 1) 
         d         mu        var 
4290.47415   35.32178   38.44048 

Random effects:
 Formula: list(d ~ 1, mu ~ 1, var ~ 1)
 Level: Country.Region
 Structure: General positive-definite, Log-Cholesky parametrization
         StdDev       Corr   
d        1.402353e-01 d    mu
mu       2.518250e-05 0      
var      1.123208e-04 0    0 
Residual 1.738523e+03        

Number of Observations: 2641
Number of Groups: 126  

但也许这还不够。看到带有 0 的 Corr 了吗?似乎有些不对劲。所以我从 Roland ( ) 那里获取了一个页面,并决定获得一个类似于你的 bellcurve.model.
的自启动功能 我还尝试将数据子集化为具有最少天数的 Country.Regions,以防出现问题。我在这里分节详细介绍我的 results/code,最后(在附录中)将它们放在一起以便快速复制和粘贴。

预备知识和数据探索

为了探索事物,我尝试将数据限制在具有最少天数的国家/地区。我选择了 45 天(5 个国家)并慢慢增加到 1 天(完整数据集)。我用 data.table 来计算和显示相关的东西。

library(nlme)
library(nlraa)
library(data.table)
dat <- readRDS("dat.rds")
str(dat)
setDT(dat)


# Bell curve function defined by three parameters
bellcurve.model <- function(d, mu, var, x) {
  f <- d*exp(-((x - mu)^2) / (2*var))
  return(f)
}


dim(dat)
head(dat)

## how many countries?
dat[,uniqueN(Country.Region)]

## number of days/rows per country.region:
print(dat[,.N,by=Country.Region][order(N)], nrows=200)
dat[,N:=.N, by=Country.Region]


minimum_days <- 45
## model restricted to following countries/regions:
data.frame(Country.Region=(as.character(unique(dat[N>=minimum_days]$Country.Region))))

从 nls() 和数据中获取自动起始值

这就是 Roland's answer in a related post 发挥作用的地方。我们需要一种自动化的方式来获取通知的起始值,一种方法是使用 nls() 和自启动函数。您可以从 bellcurve.model 中创建一个自启动函数,但我发现 SSbell 与此类似,因此决定使用它。我 运行 nls()SSbell 并得到具有 ymaxSSbell 公式的起始值(对应于 bellcurve.modeld ) 和 xc(对应于 bellcurve.model 中的 mu)。对于 bellcurve.model 中的 var 起始值,我首先计算每个 Country.Region 的方差,然后选择一个较小的方差,选择 20%-tile 因为这适用于 minimum_days<-45minimum_days<-1(完整数据)。

## use this approach 
## but instead of SSlogis
## use nlraa::SSbell
## we can get starting values for d and mu.
plot(new_cases~day, data=dat[N>=minimum_days])
nls_starting_values <- nls(new_cases ~ SSbell(day, ymax, a, b, xc), data=dat[N>=minimum_days])
summary(nls_starting_values)

## calculate a starting value for var:  the median of Country.Region specific variances
variances <- sort(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1)
range(variances)
quantile(variances, seq(0,1,0.05))

##var.start <- median(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
##var.start <- min(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
var.start <- quantile(variances, 0.20)
var.start

再次 - 我不得不玩一点 - 但如果你采用 20% 的经验方差,nlme 调用 bellcurve.model 代码将 运行对于 minimum_days <- 45 以及 minimum_days <- 1(完整数据集)。通过计算我们的起始值并可能限制我们的数据集,我们拟合了两个 nlme 模型:一个使用 SSbell,另一个使用 bellcurve.model。两者都将 运行 用于 minimum_days<-45,只有 bellcurve.model 将收敛于 minimum_days<-1(完整数据集)。幸运!

两次 nlme 调用:一次使用 SSbell,另一个使用 bellcurve.model

# NLME Model: using ssbell and nls_starting values for ymax, a, b, xc
baseModel.ssbell <- nlme(new_cases ~ SSbell(day, ymax, a, b, xc), 
                         data = dat[N>=minimum_days], 
                         fixed = list(ymax ~ 1, a ~ 1, b~1, xc~1),
                         random = ymax + a + b + xc ~ 1|Country.Region,
                         start = list(fixed = c(    coef(nls_starting_values) )),
                         na.action = na.omit,
                         control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
)
# NLME Model: using bellcurve.model and nls_starting values for ymax for d;
# NLME Model: using bellcurve.model and nls_starting values for xc   for mu;
# NLME Model: using bellcurve.model and                    var.start for var
baseModel.bellcurve <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day), 
                            data = dat[N>=minimum_days], 
                            fixed = list(d ~ 1, mu ~ 1, var ~ 1),
                            random = d + mu + var ~ 1|Country.Region,
                            start = list(fixed = c(coef(nls_starting_values)[c(1,4)], var.start )),
                            na.action = na.omit,
                            control=list(maxIter=1e6, 
msMaxIter = 1e6, 
msVerbose = TRUE)
)

比较输出

baseModel.ssbell
baseModel.bellcurve

显示 minimum_days<-45 的输出显示相似的拟合(查看可能性)。

## 5 countries DATA minimum_days <- 45
> baseModel.ssbell
Nonlinear mixed-effects model fit by maximum likelihood
  Model: new_cases ~ SSbell(day, ymax, a, b, xc) 
  Data: dat[N >= minimum_days] 
  Log-likelihood: -2264.706
  Fixed: list(ymax ~ 1, a ~ 1, b ~ 1, xc ~ 1) 
         ymax             a             b            xc 
 1.026599e+03 -1.164625e-02 -1.626079e-04  1.288924e+01 

Random effects:
 Formula: list(ymax ~ 1, a ~ 1, b ~ 1, xc ~ 1)
 Level: Country.Region
 Structure: General positive-definite, Log-Cholesky parametrization
         StdDev       Corr                
ymax     1.731582e+03 ymax   a      b     
a        4.467475e-06 -0.999              
b        1.016493e-04 -0.998  0.999       
xc       3.528238e+00  1.000 -0.999 -0.999
Residual 8.045770e+02                     

Number of Observations: 278
Number of Groups: 5 
## 5 countries DATA minimum_days <- 45
> baseModel.bellcurve
Nonlinear mixed-effects model fit by maximum likelihood
  Model: new_cases ~ bellcurve.model(d, mu, var, x = day) 
  Data: dat[N >= minimum_days] 
  Log-likelihood: -2267.406
  Fixed: list(d ~ 1, mu ~ 1, var ~ 1) 
        d        mu       var 
965.81525  12.73549  58.22049 

Random effects:
 Formula: list(d ~ 1, mu ~ 1, var ~ 1)
 Level: Country.Region
 Structure: General positive-definite, Log-Cholesky parametrization
         StdDev       Corr       
d        1.633432e+03 d     mu   
mu       2.815230e+00  1.00      
var      5.582379e-03 -0.01 -0.01
Residual 8.127932e+02            

Number of Observations: 278
Number of Groups: 5 

显示 minimum_days<-1baseModel.bellcurve 的输出(完整数据集)显示了 baseModel.naive 的改进,我盲目和任意地摆弄起始值的唯一目的是摆脱错误和警告。

## FULL DATA minimum_days <- 1
> baseModel.bellcurve
Nonlinear mixed-effects model fit by maximum likelihood
  Model: new_cases ~ bellcurve.model(d, mu, var, x = day) 
  Data: dat[N >= minimum_days] 
  Log-likelihood: -20487.86
  Fixed: list(d ~ 1, mu ~ 1, var ~ 1) 
         d         mu        var 
1044.52101   25.00288   81.79004 

Random effects:
 Formula: list(d ~ 1, mu ~ 1, var ~ 1)
 Level: Country.Region
 Structure: General positive-definite, Log-Cholesky parametrization
         StdDev       Corr       
d        4.285702e+03 d     mu   
mu       5.423452e+00 0.545      
var      3.008379e-03 0.028 0.050
Residual 4.985698e+02            

Number of Observations: 2641
Number of Groups: 126 

Log-likelihood: -20487.86 for baseModel.bellcurve vs Log-likelihood: -23451.37 for baseModel.naive baseModel.bellcurve 中的 Corr 矩阵看起来也更好。

附录:一次抓取代码。



library(nlme)
library(nlraa)
library(data.table)
dat <- readRDS("dat.rds")
str(dat)
setDT(dat)


# Bell curve function defined by three parameters
bellcurve.model <- function(d, mu, var, x) {
  f <- d*exp(-((x - mu)^2) / (2*var))
  return(f)
}


dim(dat)
head(dat)

## how many countries?
dat[,uniqueN(Country.Region)]

## number of days/rows per country.region:
print(dat[,.N,by=Country.Region][order(N)], nrows=200)
dat[,N:=.N, by=Country.Region]


minimum_days <- 45
## model restricted to following countries/regions:
data.frame(Country.Region=(as.character(unique(dat[N>=minimum_days]$Country.Region))))



## use this approach 
## but instead of SSlogis
## use nlraa::SSbell
## we can get starting values for d and mu.
plot(new_cases~day, data=dat[N>=minimum_days])
nls_starting_values <- nls(new_cases ~ SSbell(day, ymax, a, b, xc), data=dat[N>=minimum_days])
summary(nls_starting_values)

## calculate a starting value for var:  the median of Country.Region specific variances
variances <- sort(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1)
range(variances)
quantile(variances, seq(0,1,0.05))

##var.start <- median(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
##var.start <- min(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
var.start <- quantile(variances, 0.20)
var.start




# NLME Model: using ssbell and nls_starting values for ymax, a, b, xc
baseModel.ssbell <- nlme(new_cases ~ SSbell(day, ymax, a, b, xc), 
                         data = dat[N>=minimum_days], 
                         fixed = list(ymax ~ 1, a ~ 1, b~1, xc~1),
                         random = ymax + a + b + xc ~ 1|Country.Region,
                         start = list(fixed = c(    coef(nls_starting_values) )),
                         na.action = na.omit,
                         control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
)


# NLME Model: using bellcurve.model and nls_starting values for ymax for d;
# NLME Model: using bellcurve.model and nls_starting values for xc   for mu;
# NLME Model: using bellcurve.model and                    var.start for var
baseModel.bellcurve <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day), 
                            data = dat[N>=minimum_days], 
                            fixed = list(d ~ 1, mu ~ 1, var ~ 1),
                            random = d + mu + var ~ 1|Country.Region,
                            start = list(fixed = c(coef(nls_starting_values)[c(1,4)], var.start )),
                            na.action = na.omit,
                            control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
)


baseModel.ssbell
baseModel.bellcurve