R 中非线性混合效应(使用 nlme 包)的不稳定性
Instability of nonlinear mixed effects (using nlme package) in R
我正在尝试为 COVID-19 数据构建一个 nonlinear mixed effects model
数据,该数据符合来自不同国家的每日病例数的钟形曲线(随机效应在国家层面)。
数据 table 太大,无法包含在 post 中,但这里是 dataframe 的结构:
> str(dat)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2642 obs. of 7 variables:
$ Country.Region: Factor w/ 181 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
$ date : Date, format: "2020-03-24" "2020-03-25" "2020-03-26" "2020-03-27" ...
$ day : num 0 1 2 3 4 5 6 7 8 9 ...
$ total_cases : int 74 84 94 110 110 120 170 174 237 273 ...
$ new_cases : int 34 10 10 16 0 10 50 4 63 36 ...
$ total_deaths : int 1 2 4 4 4 4 4 4 4 6 ...
$ new_deaths : int 0 1 2 0 0 0 0 0 0 2 ...
尝试拟合模型:
library(nlme)
dat <- readRDS("dat.rds")
# Bell curve function defined by three parameters
bellcurve.model <- function(d, mu, var, x) {
f <- d*exp(-((x - mu)^2) / (2*var))
return(f)
}
# NLME Model
baseModel <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day),
data = dat,
fixed = list(d ~ 1, mu ~ 1, var ~ 1),
random = d + mu + var ~ 1|Country.Region,
start = list(fixed = c(1000, 20, 20)),
na.action = na.omit)
然而,这是由此产生的错误:
Error in nlme.formula(new_cases ~ bellcurve.model(d, mu, var, x = day), :
Singularity in backsolve at level 0, block 1
我读过其他 Whosebug posts 表明某些协变量可能在模型中混淆,但我在这里用来预测 new_cases
的唯一协变量是 day
。任何关于这意味着什么的建议或调试技巧将不胜感激,尤其是任何关于如何解决这个问题的建议。
tl;dr:这种类型的建模对起始值很敏感。我详细说明了如何成功解决该问题。有助于此修复的三个主要内容是:
- 增加最大迭代和函数调用
- 获取知情起始值
- 可能 restrict/subset 最初添加的国家/地区的数据具有较少的可用信息
简要概述:
我只是通过盲目地更改起始值来开始修补。将 c(1000, 20, 20)
更改为 c(20000, 10, 10)
我将模型变为 运行 error/warning 免费 baseModel.naive
和 Log-likelihood: -23451.37
。使用增加的最大值进行迭代和函数调用并获取知情的起始值使模型能够显着改进 baseModel.bellcurve
报告 Log-likelihood: -20487.86
。
天真地改变起始值将消除错误
我调整了起始值。此外,我还展示了如何增加函数调用和迭代的最大数量以及如何使用详细选项。使用 ?nlme
和 ?nlmeControl
可以找到更多信息。根据我的经验,这些类型的模型可能对起始值、最大迭代次数和函数调用很敏感。
baseModel.naive <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day),
data = dat,
fixed = list(d ~ 1, mu ~ 1, var ~ 1),
random = d + mu + var ~ 1|Country.Region,
start = list(fixed = c(20000, 10, 10)),
na.action = na.omit,
control=list(maxIter=1e2, msMaxIter = 1e6, msVerbose = TRUE))
baseModel.naive
输出:
> baseModel.naive <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day),
+ data = dat,
+ fixed = list(d ~ 1, mu ~ 1, var ~ 1),
+ random = d + mu + var ~ 1|Country.Region,
+ start = list(fixed = c(20000, 10, 10)),
+ na.action = na.omit,
+ control=list(maxIter=1e2, msMaxIter = 1e6, msVerbose = TRUE))
0: 30455.774: 2.23045 10.6880 8.80935 -11177.6 3277.46 -1877.96
1: 30450.763: 2.80826 11.2726 9.37889 -11177.6 3277.46 -1877.96
2: 30449.789: 2.94771 11.5283 9.80691 -11177.6 3277.46 -1877.96
3: 30449.150: 3.30454 11.8277 10.0329 -11177.6 3277.46 -1877.96
4: 30448.727: 3.64209 12.2237 10.4571 -11177.6 3277.46 -1877.96
5: 30448.540: 3.97875 12.5637 10.8217 -11177.6 3277.46 -1877.96
6: 30448.445: 4.31754 12.9055 11.1826 -11177.6 3277.46 -1877.96
7: 30448.397: 4.65727 13.2480 11.5420 -11177.6 3277.46 -1877.96
8: 30448.372: 4.99746 13.5908 11.9008 -11177.6 3277.46 -1877.96
9: 30448.360: 5.33789 13.9337 12.2591 -11177.6 3277.46 -1877.96
10: 30448.354: 5.67844 14.2767 12.6173 -11177.6 3277.46 -1877.96
11: 30448.351: 6.01905 14.6197 12.9753 -11177.6 3277.46 -1877.96
12: 30448.349: 6.35969 14.9627 13.3333 -11177.6 3277.46 -1877.96
13: 30448.349: 6.70035 15.3058 13.6913 -11177.6 3277.46 -1877.96
14: 30448.348: 7.04102 15.6489 14.0493 -11177.6 3277.46 -1877.96
15: 30448.348: 7.38170 15.9919 14.4073 -11177.6 3277.46 -1877.96
16: 30448.348: 7.72237 16.3350 14.7652 -11177.6 3277.46 -1877.96
17: 30448.348: 8.06305 16.6781 15.1232 -11177.6 3277.46 -1877.96
18: 30448.348: 8.40373 17.0211 15.4811 -11177.6 3277.46 -1877.96
19: 30448.348: 8.74441 17.3642 15.8391 -11177.6 3277.46 -1877.96
20: 30448.348: 9.08508 17.7073 16.1971 -11177.6 3277.46 -1877.96
21: 30448.348: 9.42576 18.0503 16.5550 -11177.6 3277.46 -1877.96
0: 30108.384: 9.42573 18.0503 16.5550 -11183.6 3279.09 -1879.43
1: 30108.384: 9.42568 18.0503 16.5550 -11183.6 3279.09 -1879.43
2: 30108.384: 9.42544 18.0502 16.5548 -11183.6 3279.09 -1879.43
0: 30108.056: 9.42539 18.0502 16.5548 -11168.0 3239.13 -1879.51
1: 30108.056: 9.42526 18.0502 16.5549 -11168.0 3239.13 -1879.51
0: 30108.055: 9.42523 18.0502 16.5549 -11150.0 3223.70 -1879.28
1: 30108.055: 9.42523 18.0502 16.5549 -11150.0 3223.70 -1879.28
> baseModel.naive
Nonlinear mixed-effects model fit by maximum likelihood
Model: new_cases ~ bellcurve.model(d, mu, var, x = day)
Data: dat
Log-likelihood: -23451.37
Fixed: list(d ~ 1, mu ~ 1, var ~ 1)
d mu var
4290.47415 35.32178 38.44048
Random effects:
Formula: list(d ~ 1, mu ~ 1, var ~ 1)
Level: Country.Region
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
d 1.402353e-01 d mu
mu 2.518250e-05 0
var 1.123208e-04 0 0
Residual 1.738523e+03
Number of Observations: 2641
Number of Groups: 126
但也许这还不够。看到带有 0 的 Corr
了吗?似乎有些不对劲。所以我从 Roland ( ) 那里获取了一个页面,并决定获得一个类似于你的 bellcurve.model
.
的自启动功能
我还尝试将数据子集化为具有最少天数的 Country.Region
s,以防出现问题。我在这里分节详细介绍我的 results/code,最后(在附录中)将它们放在一起以便快速复制和粘贴。
预备知识和数据探索
为了探索事物,我尝试将数据限制在具有最少天数的国家/地区。我选择了 45 天(5 个国家)并慢慢增加到 1 天(完整数据集)。我用 data.table
来计算和显示相关的东西。
library(nlme)
library(nlraa)
library(data.table)
dat <- readRDS("dat.rds")
str(dat)
setDT(dat)
# Bell curve function defined by three parameters
bellcurve.model <- function(d, mu, var, x) {
f <- d*exp(-((x - mu)^2) / (2*var))
return(f)
}
dim(dat)
head(dat)
## how many countries?
dat[,uniqueN(Country.Region)]
## number of days/rows per country.region:
print(dat[,.N,by=Country.Region][order(N)], nrows=200)
dat[,N:=.N, by=Country.Region]
minimum_days <- 45
## model restricted to following countries/regions:
data.frame(Country.Region=(as.character(unique(dat[N>=minimum_days]$Country.Region))))
从 nls() 和数据中获取自动起始值
这就是 Roland's answer in a related post 发挥作用的地方。我们需要一种自动化的方式来获取通知的起始值,一种方法是使用 nls()
和自启动函数。您可以从 bellcurve.model
中创建一个自启动函数,但我发现 SSbell
与此类似,因此决定使用它。我 运行 nls()
和 SSbell
并得到具有 ymax
的 SSbell
公式的起始值(对应于 bellcurve.model
的 d
) 和 xc
(对应于 bellcurve.model
中的 mu
)。对于 bellcurve.model
中的 var
起始值,我首先计算每个 Country.Region
的方差,然后选择一个较小的方差,选择 20%-tile 因为这适用于 minimum_days<-45
和 minimum_days<-1
(完整数据)。
## use this approach
## but instead of SSlogis
## use nlraa::SSbell
## we can get starting values for d and mu.
plot(new_cases~day, data=dat[N>=minimum_days])
nls_starting_values <- nls(new_cases ~ SSbell(day, ymax, a, b, xc), data=dat[N>=minimum_days])
summary(nls_starting_values)
## calculate a starting value for var: the median of Country.Region specific variances
variances <- sort(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1)
range(variances)
quantile(variances, seq(0,1,0.05))
##var.start <- median(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
##var.start <- min(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
var.start <- quantile(variances, 0.20)
var.start
再次 - 我不得不玩一点 - 但如果你采用 20% 的经验方差,nlme
调用 bellcurve.model
代码将 运行对于 minimum_days <- 45
以及 minimum_days <- 1
(完整数据集)。通过计算我们的起始值并可能限制我们的数据集,我们拟合了两个 nlme
模型:一个使用 SSbell
,另一个使用 bellcurve.model
。两者都将 运行 用于 minimum_days<-45
,只有 bellcurve.model
将收敛于 minimum_days<-1
(完整数据集)。幸运!
两次 nlme 调用:一次使用 SSbell
,另一个使用 bellcurve.model
# NLME Model: using ssbell and nls_starting values for ymax, a, b, xc
baseModel.ssbell <- nlme(new_cases ~ SSbell(day, ymax, a, b, xc),
data = dat[N>=minimum_days],
fixed = list(ymax ~ 1, a ~ 1, b~1, xc~1),
random = ymax + a + b + xc ~ 1|Country.Region,
start = list(fixed = c( coef(nls_starting_values) )),
na.action = na.omit,
control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
)
# NLME Model: using bellcurve.model and nls_starting values for ymax for d;
# NLME Model: using bellcurve.model and nls_starting values for xc for mu;
# NLME Model: using bellcurve.model and var.start for var
baseModel.bellcurve <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day),
data = dat[N>=minimum_days],
fixed = list(d ~ 1, mu ~ 1, var ~ 1),
random = d + mu + var ~ 1|Country.Region,
start = list(fixed = c(coef(nls_starting_values)[c(1,4)], var.start )),
na.action = na.omit,
control=list(maxIter=1e6,
msMaxIter = 1e6,
msVerbose = TRUE)
)
比较输出
baseModel.ssbell
baseModel.bellcurve
显示 minimum_days<-45
的输出显示相似的拟合(查看可能性)。
## 5 countries DATA minimum_days <- 45
> baseModel.ssbell
Nonlinear mixed-effects model fit by maximum likelihood
Model: new_cases ~ SSbell(day, ymax, a, b, xc)
Data: dat[N >= minimum_days]
Log-likelihood: -2264.706
Fixed: list(ymax ~ 1, a ~ 1, b ~ 1, xc ~ 1)
ymax a b xc
1.026599e+03 -1.164625e-02 -1.626079e-04 1.288924e+01
Random effects:
Formula: list(ymax ~ 1, a ~ 1, b ~ 1, xc ~ 1)
Level: Country.Region
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
ymax 1.731582e+03 ymax a b
a 4.467475e-06 -0.999
b 1.016493e-04 -0.998 0.999
xc 3.528238e+00 1.000 -0.999 -0.999
Residual 8.045770e+02
Number of Observations: 278
Number of Groups: 5
## 5 countries DATA minimum_days <- 45
> baseModel.bellcurve
Nonlinear mixed-effects model fit by maximum likelihood
Model: new_cases ~ bellcurve.model(d, mu, var, x = day)
Data: dat[N >= minimum_days]
Log-likelihood: -2267.406
Fixed: list(d ~ 1, mu ~ 1, var ~ 1)
d mu var
965.81525 12.73549 58.22049
Random effects:
Formula: list(d ~ 1, mu ~ 1, var ~ 1)
Level: Country.Region
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
d 1.633432e+03 d mu
mu 2.815230e+00 1.00
var 5.582379e-03 -0.01 -0.01
Residual 8.127932e+02
Number of Observations: 278
Number of Groups: 5
显示 minimum_days<-1
的 baseModel.bellcurve
的输出(完整数据集)显示了 baseModel.naive
的改进,我盲目和任意地摆弄起始值的唯一目的是摆脱错误和警告。
## FULL DATA minimum_days <- 1
> baseModel.bellcurve
Nonlinear mixed-effects model fit by maximum likelihood
Model: new_cases ~ bellcurve.model(d, mu, var, x = day)
Data: dat[N >= minimum_days]
Log-likelihood: -20487.86
Fixed: list(d ~ 1, mu ~ 1, var ~ 1)
d mu var
1044.52101 25.00288 81.79004
Random effects:
Formula: list(d ~ 1, mu ~ 1, var ~ 1)
Level: Country.Region
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
d 4.285702e+03 d mu
mu 5.423452e+00 0.545
var 3.008379e-03 0.028 0.050
Residual 4.985698e+02
Number of Observations: 2641
Number of Groups: 126
Log-likelihood: -20487.86
for baseModel.bellcurve
vs Log-likelihood: -23451.37
for baseModel.naive
baseModel.bellcurve
中的 Corr 矩阵看起来也更好。
附录:一次抓取代码。
library(nlme)
library(nlraa)
library(data.table)
dat <- readRDS("dat.rds")
str(dat)
setDT(dat)
# Bell curve function defined by three parameters
bellcurve.model <- function(d, mu, var, x) {
f <- d*exp(-((x - mu)^2) / (2*var))
return(f)
}
dim(dat)
head(dat)
## how many countries?
dat[,uniqueN(Country.Region)]
## number of days/rows per country.region:
print(dat[,.N,by=Country.Region][order(N)], nrows=200)
dat[,N:=.N, by=Country.Region]
minimum_days <- 45
## model restricted to following countries/regions:
data.frame(Country.Region=(as.character(unique(dat[N>=minimum_days]$Country.Region))))
## use this approach
## but instead of SSlogis
## use nlraa::SSbell
## we can get starting values for d and mu.
plot(new_cases~day, data=dat[N>=minimum_days])
nls_starting_values <- nls(new_cases ~ SSbell(day, ymax, a, b, xc), data=dat[N>=minimum_days])
summary(nls_starting_values)
## calculate a starting value for var: the median of Country.Region specific variances
variances <- sort(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1)
range(variances)
quantile(variances, seq(0,1,0.05))
##var.start <- median(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
##var.start <- min(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
var.start <- quantile(variances, 0.20)
var.start
# NLME Model: using ssbell and nls_starting values for ymax, a, b, xc
baseModel.ssbell <- nlme(new_cases ~ SSbell(day, ymax, a, b, xc),
data = dat[N>=minimum_days],
fixed = list(ymax ~ 1, a ~ 1, b~1, xc~1),
random = ymax + a + b + xc ~ 1|Country.Region,
start = list(fixed = c( coef(nls_starting_values) )),
na.action = na.omit,
control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
)
# NLME Model: using bellcurve.model and nls_starting values for ymax for d;
# NLME Model: using bellcurve.model and nls_starting values for xc for mu;
# NLME Model: using bellcurve.model and var.start for var
baseModel.bellcurve <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day),
data = dat[N>=minimum_days],
fixed = list(d ~ 1, mu ~ 1, var ~ 1),
random = d + mu + var ~ 1|Country.Region,
start = list(fixed = c(coef(nls_starting_values)[c(1,4)], var.start )),
na.action = na.omit,
control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
)
baseModel.ssbell
baseModel.bellcurve
我正在尝试为 COVID-19 数据构建一个 nonlinear mixed effects model
数据,该数据符合来自不同国家的每日病例数的钟形曲线(随机效应在国家层面)。
数据 table 太大,无法包含在 post 中,但这里是 dataframe 的结构:
> str(dat)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2642 obs. of 7 variables:
$ Country.Region: Factor w/ 181 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
$ date : Date, format: "2020-03-24" "2020-03-25" "2020-03-26" "2020-03-27" ...
$ day : num 0 1 2 3 4 5 6 7 8 9 ...
$ total_cases : int 74 84 94 110 110 120 170 174 237 273 ...
$ new_cases : int 34 10 10 16 0 10 50 4 63 36 ...
$ total_deaths : int 1 2 4 4 4 4 4 4 4 6 ...
$ new_deaths : int 0 1 2 0 0 0 0 0 0 2 ...
尝试拟合模型:
library(nlme)
dat <- readRDS("dat.rds")
# Bell curve function defined by three parameters
bellcurve.model <- function(d, mu, var, x) {
f <- d*exp(-((x - mu)^2) / (2*var))
return(f)
}
# NLME Model
baseModel <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day),
data = dat,
fixed = list(d ~ 1, mu ~ 1, var ~ 1),
random = d + mu + var ~ 1|Country.Region,
start = list(fixed = c(1000, 20, 20)),
na.action = na.omit)
然而,这是由此产生的错误:
Error in nlme.formula(new_cases ~ bellcurve.model(d, mu, var, x = day), : Singularity in backsolve at level 0, block 1
我读过其他 Whosebug posts 表明某些协变量可能在模型中混淆,但我在这里用来预测 new_cases
的唯一协变量是 day
。任何关于这意味着什么的建议或调试技巧将不胜感激,尤其是任何关于如何解决这个问题的建议。
tl;dr:这种类型的建模对起始值很敏感。我详细说明了如何成功解决该问题。有助于此修复的三个主要内容是:
- 增加最大迭代和函数调用
- 获取知情起始值
- 可能 restrict/subset 最初添加的国家/地区的数据具有较少的可用信息
简要概述:
我只是通过盲目地更改起始值来开始修补。将 c(1000, 20, 20)
更改为 c(20000, 10, 10)
我将模型变为 运行 error/warning 免费 baseModel.naive
和 Log-likelihood: -23451.37
。使用增加的最大值进行迭代和函数调用并获取知情的起始值使模型能够显着改进 baseModel.bellcurve
报告 Log-likelihood: -20487.86
。
天真地改变起始值将消除错误
我调整了起始值。此外,我还展示了如何增加函数调用和迭代的最大数量以及如何使用详细选项。使用 ?nlme
和 ?nlmeControl
可以找到更多信息。根据我的经验,这些类型的模型可能对起始值、最大迭代次数和函数调用很敏感。
baseModel.naive <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day),
data = dat,
fixed = list(d ~ 1, mu ~ 1, var ~ 1),
random = d + mu + var ~ 1|Country.Region,
start = list(fixed = c(20000, 10, 10)),
na.action = na.omit,
control=list(maxIter=1e2, msMaxIter = 1e6, msVerbose = TRUE))
baseModel.naive
输出:
> baseModel.naive <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day),
+ data = dat,
+ fixed = list(d ~ 1, mu ~ 1, var ~ 1),
+ random = d + mu + var ~ 1|Country.Region,
+ start = list(fixed = c(20000, 10, 10)),
+ na.action = na.omit,
+ control=list(maxIter=1e2, msMaxIter = 1e6, msVerbose = TRUE))
0: 30455.774: 2.23045 10.6880 8.80935 -11177.6 3277.46 -1877.96
1: 30450.763: 2.80826 11.2726 9.37889 -11177.6 3277.46 -1877.96
2: 30449.789: 2.94771 11.5283 9.80691 -11177.6 3277.46 -1877.96
3: 30449.150: 3.30454 11.8277 10.0329 -11177.6 3277.46 -1877.96
4: 30448.727: 3.64209 12.2237 10.4571 -11177.6 3277.46 -1877.96
5: 30448.540: 3.97875 12.5637 10.8217 -11177.6 3277.46 -1877.96
6: 30448.445: 4.31754 12.9055 11.1826 -11177.6 3277.46 -1877.96
7: 30448.397: 4.65727 13.2480 11.5420 -11177.6 3277.46 -1877.96
8: 30448.372: 4.99746 13.5908 11.9008 -11177.6 3277.46 -1877.96
9: 30448.360: 5.33789 13.9337 12.2591 -11177.6 3277.46 -1877.96
10: 30448.354: 5.67844 14.2767 12.6173 -11177.6 3277.46 -1877.96
11: 30448.351: 6.01905 14.6197 12.9753 -11177.6 3277.46 -1877.96
12: 30448.349: 6.35969 14.9627 13.3333 -11177.6 3277.46 -1877.96
13: 30448.349: 6.70035 15.3058 13.6913 -11177.6 3277.46 -1877.96
14: 30448.348: 7.04102 15.6489 14.0493 -11177.6 3277.46 -1877.96
15: 30448.348: 7.38170 15.9919 14.4073 -11177.6 3277.46 -1877.96
16: 30448.348: 7.72237 16.3350 14.7652 -11177.6 3277.46 -1877.96
17: 30448.348: 8.06305 16.6781 15.1232 -11177.6 3277.46 -1877.96
18: 30448.348: 8.40373 17.0211 15.4811 -11177.6 3277.46 -1877.96
19: 30448.348: 8.74441 17.3642 15.8391 -11177.6 3277.46 -1877.96
20: 30448.348: 9.08508 17.7073 16.1971 -11177.6 3277.46 -1877.96
21: 30448.348: 9.42576 18.0503 16.5550 -11177.6 3277.46 -1877.96
0: 30108.384: 9.42573 18.0503 16.5550 -11183.6 3279.09 -1879.43
1: 30108.384: 9.42568 18.0503 16.5550 -11183.6 3279.09 -1879.43
2: 30108.384: 9.42544 18.0502 16.5548 -11183.6 3279.09 -1879.43
0: 30108.056: 9.42539 18.0502 16.5548 -11168.0 3239.13 -1879.51
1: 30108.056: 9.42526 18.0502 16.5549 -11168.0 3239.13 -1879.51
0: 30108.055: 9.42523 18.0502 16.5549 -11150.0 3223.70 -1879.28
1: 30108.055: 9.42523 18.0502 16.5549 -11150.0 3223.70 -1879.28
> baseModel.naive
Nonlinear mixed-effects model fit by maximum likelihood
Model: new_cases ~ bellcurve.model(d, mu, var, x = day)
Data: dat
Log-likelihood: -23451.37
Fixed: list(d ~ 1, mu ~ 1, var ~ 1)
d mu var
4290.47415 35.32178 38.44048
Random effects:
Formula: list(d ~ 1, mu ~ 1, var ~ 1)
Level: Country.Region
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
d 1.402353e-01 d mu
mu 2.518250e-05 0
var 1.123208e-04 0 0
Residual 1.738523e+03
Number of Observations: 2641
Number of Groups: 126
但也许这还不够。看到带有 0 的 Corr
了吗?似乎有些不对劲。所以我从 Roland ( ) 那里获取了一个页面,并决定获得一个类似于你的 bellcurve.model
.
的自启动功能
我还尝试将数据子集化为具有最少天数的 Country.Region
s,以防出现问题。我在这里分节详细介绍我的 results/code,最后(在附录中)将它们放在一起以便快速复制和粘贴。
预备知识和数据探索
为了探索事物,我尝试将数据限制在具有最少天数的国家/地区。我选择了 45 天(5 个国家)并慢慢增加到 1 天(完整数据集)。我用 data.table
来计算和显示相关的东西。
library(nlme)
library(nlraa)
library(data.table)
dat <- readRDS("dat.rds")
str(dat)
setDT(dat)
# Bell curve function defined by three parameters
bellcurve.model <- function(d, mu, var, x) {
f <- d*exp(-((x - mu)^2) / (2*var))
return(f)
}
dim(dat)
head(dat)
## how many countries?
dat[,uniqueN(Country.Region)]
## number of days/rows per country.region:
print(dat[,.N,by=Country.Region][order(N)], nrows=200)
dat[,N:=.N, by=Country.Region]
minimum_days <- 45
## model restricted to following countries/regions:
data.frame(Country.Region=(as.character(unique(dat[N>=minimum_days]$Country.Region))))
从 nls() 和数据中获取自动起始值
这就是 Roland's answer in a related post 发挥作用的地方。我们需要一种自动化的方式来获取通知的起始值,一种方法是使用 nls()
和自启动函数。您可以从 bellcurve.model
中创建一个自启动函数,但我发现 SSbell
与此类似,因此决定使用它。我 运行 nls()
和 SSbell
并得到具有 ymax
的 SSbell
公式的起始值(对应于 bellcurve.model
的 d
) 和 xc
(对应于 bellcurve.model
中的 mu
)。对于 bellcurve.model
中的 var
起始值,我首先计算每个 Country.Region
的方差,然后选择一个较小的方差,选择 20%-tile 因为这适用于 minimum_days<-45
和 minimum_days<-1
(完整数据)。
## use this approach
## but instead of SSlogis
## use nlraa::SSbell
## we can get starting values for d and mu.
plot(new_cases~day, data=dat[N>=minimum_days])
nls_starting_values <- nls(new_cases ~ SSbell(day, ymax, a, b, xc), data=dat[N>=minimum_days])
summary(nls_starting_values)
## calculate a starting value for var: the median of Country.Region specific variances
variances <- sort(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1)
range(variances)
quantile(variances, seq(0,1,0.05))
##var.start <- median(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
##var.start <- min(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
var.start <- quantile(variances, 0.20)
var.start
再次 - 我不得不玩一点 - 但如果你采用 20% 的经验方差,nlme
调用 bellcurve.model
代码将 运行对于 minimum_days <- 45
以及 minimum_days <- 1
(完整数据集)。通过计算我们的起始值并可能限制我们的数据集,我们拟合了两个 nlme
模型:一个使用 SSbell
,另一个使用 bellcurve.model
。两者都将 运行 用于 minimum_days<-45
,只有 bellcurve.model
将收敛于 minimum_days<-1
(完整数据集)。幸运!
两次 nlme 调用:一次使用 SSbell
,另一个使用 bellcurve.model
# NLME Model: using ssbell and nls_starting values for ymax, a, b, xc
baseModel.ssbell <- nlme(new_cases ~ SSbell(day, ymax, a, b, xc),
data = dat[N>=minimum_days],
fixed = list(ymax ~ 1, a ~ 1, b~1, xc~1),
random = ymax + a + b + xc ~ 1|Country.Region,
start = list(fixed = c( coef(nls_starting_values) )),
na.action = na.omit,
control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
)
# NLME Model: using bellcurve.model and nls_starting values for ymax for d;
# NLME Model: using bellcurve.model and nls_starting values for xc for mu;
# NLME Model: using bellcurve.model and var.start for var
baseModel.bellcurve <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day),
data = dat[N>=minimum_days],
fixed = list(d ~ 1, mu ~ 1, var ~ 1),
random = d + mu + var ~ 1|Country.Region,
start = list(fixed = c(coef(nls_starting_values)[c(1,4)], var.start )),
na.action = na.omit,
control=list(maxIter=1e6,
msMaxIter = 1e6,
msVerbose = TRUE)
)
比较输出
baseModel.ssbell
baseModel.bellcurve
显示 minimum_days<-45
的输出显示相似的拟合(查看可能性)。
## 5 countries DATA minimum_days <- 45
> baseModel.ssbell
Nonlinear mixed-effects model fit by maximum likelihood
Model: new_cases ~ SSbell(day, ymax, a, b, xc)
Data: dat[N >= minimum_days]
Log-likelihood: -2264.706
Fixed: list(ymax ~ 1, a ~ 1, b ~ 1, xc ~ 1)
ymax a b xc
1.026599e+03 -1.164625e-02 -1.626079e-04 1.288924e+01
Random effects:
Formula: list(ymax ~ 1, a ~ 1, b ~ 1, xc ~ 1)
Level: Country.Region
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
ymax 1.731582e+03 ymax a b
a 4.467475e-06 -0.999
b 1.016493e-04 -0.998 0.999
xc 3.528238e+00 1.000 -0.999 -0.999
Residual 8.045770e+02
Number of Observations: 278
Number of Groups: 5
## 5 countries DATA minimum_days <- 45
> baseModel.bellcurve
Nonlinear mixed-effects model fit by maximum likelihood
Model: new_cases ~ bellcurve.model(d, mu, var, x = day)
Data: dat[N >= minimum_days]
Log-likelihood: -2267.406
Fixed: list(d ~ 1, mu ~ 1, var ~ 1)
d mu var
965.81525 12.73549 58.22049
Random effects:
Formula: list(d ~ 1, mu ~ 1, var ~ 1)
Level: Country.Region
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
d 1.633432e+03 d mu
mu 2.815230e+00 1.00
var 5.582379e-03 -0.01 -0.01
Residual 8.127932e+02
Number of Observations: 278
Number of Groups: 5
显示 minimum_days<-1
的 baseModel.bellcurve
的输出(完整数据集)显示了 baseModel.naive
的改进,我盲目和任意地摆弄起始值的唯一目的是摆脱错误和警告。
## FULL DATA minimum_days <- 1
> baseModel.bellcurve
Nonlinear mixed-effects model fit by maximum likelihood
Model: new_cases ~ bellcurve.model(d, mu, var, x = day)
Data: dat[N >= minimum_days]
Log-likelihood: -20487.86
Fixed: list(d ~ 1, mu ~ 1, var ~ 1)
d mu var
1044.52101 25.00288 81.79004
Random effects:
Formula: list(d ~ 1, mu ~ 1, var ~ 1)
Level: Country.Region
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
d 4.285702e+03 d mu
mu 5.423452e+00 0.545
var 3.008379e-03 0.028 0.050
Residual 4.985698e+02
Number of Observations: 2641
Number of Groups: 126
Log-likelihood: -20487.86
for baseModel.bellcurve
vs Log-likelihood: -23451.37
for baseModel.naive
baseModel.bellcurve
中的 Corr 矩阵看起来也更好。
附录:一次抓取代码。
library(nlme)
library(nlraa)
library(data.table)
dat <- readRDS("dat.rds")
str(dat)
setDT(dat)
# Bell curve function defined by three parameters
bellcurve.model <- function(d, mu, var, x) {
f <- d*exp(-((x - mu)^2) / (2*var))
return(f)
}
dim(dat)
head(dat)
## how many countries?
dat[,uniqueN(Country.Region)]
## number of days/rows per country.region:
print(dat[,.N,by=Country.Region][order(N)], nrows=200)
dat[,N:=.N, by=Country.Region]
minimum_days <- 45
## model restricted to following countries/regions:
data.frame(Country.Region=(as.character(unique(dat[N>=minimum_days]$Country.Region))))
## use this approach
## but instead of SSlogis
## use nlraa::SSbell
## we can get starting values for d and mu.
plot(new_cases~day, data=dat[N>=minimum_days])
nls_starting_values <- nls(new_cases ~ SSbell(day, ymax, a, b, xc), data=dat[N>=minimum_days])
summary(nls_starting_values)
## calculate a starting value for var: the median of Country.Region specific variances
variances <- sort(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1)
range(variances)
quantile(variances, seq(0,1,0.05))
##var.start <- median(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
##var.start <- min(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
var.start <- quantile(variances, 0.20)
var.start
# NLME Model: using ssbell and nls_starting values for ymax, a, b, xc
baseModel.ssbell <- nlme(new_cases ~ SSbell(day, ymax, a, b, xc),
data = dat[N>=minimum_days],
fixed = list(ymax ~ 1, a ~ 1, b~1, xc~1),
random = ymax + a + b + xc ~ 1|Country.Region,
start = list(fixed = c( coef(nls_starting_values) )),
na.action = na.omit,
control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
)
# NLME Model: using bellcurve.model and nls_starting values for ymax for d;
# NLME Model: using bellcurve.model and nls_starting values for xc for mu;
# NLME Model: using bellcurve.model and var.start for var
baseModel.bellcurve <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day),
data = dat[N>=minimum_days],
fixed = list(d ~ 1, mu ~ 1, var ~ 1),
random = d + mu + var ~ 1|Country.Region,
start = list(fixed = c(coef(nls_starting_values)[c(1,4)], var.start )),
na.action = na.omit,
control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
)
baseModel.ssbell
baseModel.bellcurve