在 R 中指定随机效应之间的非线性关系
Specifying a non-linear relation between random effects in R
我正在使用多级模型来尝试描述纵向变化的不同模式。 Dingemanse et al (2010) 描述随机效应完全相关时的 'fanning out' 模式。然而,我发现当随机效应之间的关系是非线性的但在观察到的时间间隔内单调增加时,会出现类似的模式。在这种情况下,随机效应不是完全相关的,而是用一个函数来描述的。
有关此说明,请参见下面的示例。该示例仍然具有很高的截距-斜率相关性 (>.9),但有可能获得低于 .7 的相关性,同时仍然保持完美的截距-斜率关系。
我的问题:有没有办法在使用 nlme
或其他 R 包的多级模型中包含随机效应之间的完美(非线性)关系? MLwiN 有一种方法可以约束斜率截距协方差,这将是一个开始,但不足以包含非线性关系。到目前为止,我一直无法找到 nlme
的解决方案,但也许您知道可以将其包含在模型中的其他一些包。
为草率的编码道歉。我希望我的问题足够清楚,但如果有任何需要澄清的地方,请告诉我。非常感谢任何帮助或替代解决方案。
set.seed(123456)
# Change function, quadratic
# Yit = B0ij + B1ij*time + B2ij*time^2
chn <- function(int, slp, slp2, time){
score<-int + slp * time+ slp2 * time^2
return(score)
}
# Set N, random intercept, time and ID
N<-100
start<-rnorm(N,100,15) # Random intercept
time<- matrix(1:15,ncol = 15, nrow = 100,byrow = T) # Time, balanced panel data
ID<-1:N # ID variable
# Random intercept, linear slope: exp(intercept/25)/75, quadratic slope: exp(intercept/25)/250
score3<-matrix(NA,ncol = ncol(time), nrow = N)
for(x in ID){
score3[x,]<-chn(start[x],exp(start[x]/25)/75,exp(start[x]/25)/250,time[x,])
}
#Create dataframe
df<- data.frame(ID,score3)
df2<- melt(df,id = 'ID')
df2$variable<-as.vector(time)
# plot
ggplot(df2, aes(x= variable, y= value)) + geom_line(aes(group = ID)) +
geom_smooth(method = "lm", formula = y ~ x + I(x^2), se =F, size = 2, col ="red" )
# Add noise and estimate model
df2$value2<-df2$value + rnorm(N*ncol(time),0,2)
# Random intercept
mod1<-lme(value2 ~ variable + I(variable^2),
random= list(ID = ~1),
data=df2,method="ML",na.action=na.exclude)
summary(mod1)
# Random slopes
mod2<-update(mod1,.~.,random= list(ID = ~variable + I(variable^2)))
summary(mod2)
pairs(ranef(mod2))
根据@MattTyers 的建议,我尝试使用 rjags
的贝叶斯方法。这是对随机效应之间已知关系的模拟数据的第一次尝试,但似乎产生了准确的估计(比 nlme
模型更好)。我仍然有点担心 Gelman 收敛诊断以及如何将此解决方案应用于实际数据。但是,我想我会 post 我的答案,以防有人正在处理同样的问题。
# BAYESIAN ESTIMATE
library(ggplot2); library(reshape2)
# Set new dataset
set.seed(12345)
# New dataset to separate random and fixed
N<-100 # Number of respondents
int<-100 # Fixed effect intercept
U0<-rnorm(N,0,15) # Random effect intercept
slp_lin<-1 # Fixed effect linear slope
slp_qua<-.25 # Fixed effect quadratic slope
ID<- 1:100 # ID numbers
U1<-exp(U0/25)/7.5 # Random effect linear slope
U2<-exp(U0/25)/25 # Random effect quadratic slope
times<-15 # Max age
err <- matrix(rnorm(N*times,0,2),ncol = times, nrow = N) # Residual term
age <- 1:15 # Ages
# Create matrix of 'math' scores using model
math<-matrix(NA,ncol = times, nrow = N)
for(i in ID){
for(j in age){
math[i,j] <- (int + U0[i]) +
(slp_lin + U1[i])*age[j] +
(slp_qua + U2[i])*(age[j]^2) +
err[i,j]
}}
# Melt dataframe and plot scores
e.long<-melt(math)
names(e.long) <- c("ID","age","math")
ggplot(e.long,aes(x= age, y= math)) + geom_line(aes(group = ID))
# Create dataframe for rjags
dat<-list(math=as.numeric(e.long$math),
age=as.numeric(e.long$age),
childnum=e.long$ID,
n=length(e.long$math),
nkids=length(unique(e.long$ID)))
lapply(dat , summary)
library(rjags)
# Model with uninformative priors
model_rnk<-"
model{
#Model, fixed effect age and random intercept-slope connected
for( i in 1:n)
{
math[i]~dnorm(mu[i], sigm.inv)
mu[i]<-(b[1] + u[childnum[i],1]) + (b[2]+ u[childnum[i],2]) * age[i] +
(b[3]+ u[childnum[i],3]) * (age[i]^2)
}
#Random slopes
for (j in 1:nkids)
{
u[j, 1] ~ dnorm(0, tau.a)
u[j, 2] <- exp(u[j,1]/25)/7.5
u[j, 3] <- exp(u[j,1]/25)/25
}
#Priors on fixed intercept and slope priors
b[1] ~ dnorm(0.0, 1.0E-5)
b[2] ~ dnorm(0.0, 1.0E-5)
b[3] ~ dnorm(0.0, 1.0E-5)
# Residual variance priors
sigm.inv ~ dgamma(1.5, 0.001)# precision of math[i]
sigm<- pow(sigm.inv, -1/2) # standard deviation
# Varying intercepts, varying slopes priors
tau.a ~ dgamma(1.5, 0.001)
sigma.a<-pow(tau.a, -1/2)
}"
#Initialize the model
mod_rnk<-jags.model(file=textConnection(model_rnk), data=dat, n.chains=2 )
#burn in
update(mod_rnk, 5000)
#collect samples of the parameters
samps_rnk<-coda.samples(mod_rnk, variable.names=c( "b","sigma.a", "sigm"), n.iter=5000, n.thin=1)
#Numerical summary of each parameter:
summary(samps_rnk)
gelman.diag(samps_rnk, multivariate = F)
# nlme model
library(nlme)
Stab_rnk2<-lme(math ~ age + I(age^2),
random= list(ID = ~age + I(age^2)),
data=e.long,method="ML",na.action=na.exclude)
summary(Stab_rnk2)
结果看起来非常接近人口值
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
b[1] 100.7409 0.575414 5.754e-03 0.1065523
b[2] 0.9843 0.052248 5.225e-04 0.0064052
b[3] 0.2512 0.003144 3.144e-05 0.0003500
sigm 1.9963 0.037548 3.755e-04 0.0004056
sigma.a 16.9322 1.183173 1.183e-02 0.0121340
而且 nlme 估计更差(随机截距除外)
Random effects:
Formula: ~age + I(age^2) | ID
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 16.73626521 (Intr) age
age 0.13152688 0.890
I(age^2) 0.03752701 0.924 0.918
Residual 1.99346015
Fixed effects: math ~ age + I(age^2)
Value Std.Error DF t-value p-value
(Intercept) 103.85896 1.6847051 1398 61.64816 0
age 1.15741 0.0527874 1398 21.92586 0
I(age^2) 0.30809 0.0048747 1398 63.20204 0
我正在使用多级模型来尝试描述纵向变化的不同模式。 Dingemanse et al (2010) 描述随机效应完全相关时的 'fanning out' 模式。然而,我发现当随机效应之间的关系是非线性的但在观察到的时间间隔内单调增加时,会出现类似的模式。在这种情况下,随机效应不是完全相关的,而是用一个函数来描述的。 有关此说明,请参见下面的示例。该示例仍然具有很高的截距-斜率相关性 (>.9),但有可能获得低于 .7 的相关性,同时仍然保持完美的截距-斜率关系。
我的问题:有没有办法在使用 nlme
或其他 R 包的多级模型中包含随机效应之间的完美(非线性)关系? MLwiN 有一种方法可以约束斜率截距协方差,这将是一个开始,但不足以包含非线性关系。到目前为止,我一直无法找到 nlme
的解决方案,但也许您知道可以将其包含在模型中的其他一些包。
为草率的编码道歉。我希望我的问题足够清楚,但如果有任何需要澄清的地方,请告诉我。非常感谢任何帮助或替代解决方案。
set.seed(123456)
# Change function, quadratic
# Yit = B0ij + B1ij*time + B2ij*time^2
chn <- function(int, slp, slp2, time){
score<-int + slp * time+ slp2 * time^2
return(score)
}
# Set N, random intercept, time and ID
N<-100
start<-rnorm(N,100,15) # Random intercept
time<- matrix(1:15,ncol = 15, nrow = 100,byrow = T) # Time, balanced panel data
ID<-1:N # ID variable
# Random intercept, linear slope: exp(intercept/25)/75, quadratic slope: exp(intercept/25)/250
score3<-matrix(NA,ncol = ncol(time), nrow = N)
for(x in ID){
score3[x,]<-chn(start[x],exp(start[x]/25)/75,exp(start[x]/25)/250,time[x,])
}
#Create dataframe
df<- data.frame(ID,score3)
df2<- melt(df,id = 'ID')
df2$variable<-as.vector(time)
# plot
ggplot(df2, aes(x= variable, y= value)) + geom_line(aes(group = ID)) +
geom_smooth(method = "lm", formula = y ~ x + I(x^2), se =F, size = 2, col ="red" )
# Add noise and estimate model
df2$value2<-df2$value + rnorm(N*ncol(time),0,2)
# Random intercept
mod1<-lme(value2 ~ variable + I(variable^2),
random= list(ID = ~1),
data=df2,method="ML",na.action=na.exclude)
summary(mod1)
# Random slopes
mod2<-update(mod1,.~.,random= list(ID = ~variable + I(variable^2)))
summary(mod2)
pairs(ranef(mod2))
根据@MattTyers 的建议,我尝试使用 rjags
的贝叶斯方法。这是对随机效应之间已知关系的模拟数据的第一次尝试,但似乎产生了准确的估计(比 nlme
模型更好)。我仍然有点担心 Gelman 收敛诊断以及如何将此解决方案应用于实际数据。但是,我想我会 post 我的答案,以防有人正在处理同样的问题。
# BAYESIAN ESTIMATE
library(ggplot2); library(reshape2)
# Set new dataset
set.seed(12345)
# New dataset to separate random and fixed
N<-100 # Number of respondents
int<-100 # Fixed effect intercept
U0<-rnorm(N,0,15) # Random effect intercept
slp_lin<-1 # Fixed effect linear slope
slp_qua<-.25 # Fixed effect quadratic slope
ID<- 1:100 # ID numbers
U1<-exp(U0/25)/7.5 # Random effect linear slope
U2<-exp(U0/25)/25 # Random effect quadratic slope
times<-15 # Max age
err <- matrix(rnorm(N*times,0,2),ncol = times, nrow = N) # Residual term
age <- 1:15 # Ages
# Create matrix of 'math' scores using model
math<-matrix(NA,ncol = times, nrow = N)
for(i in ID){
for(j in age){
math[i,j] <- (int + U0[i]) +
(slp_lin + U1[i])*age[j] +
(slp_qua + U2[i])*(age[j]^2) +
err[i,j]
}}
# Melt dataframe and plot scores
e.long<-melt(math)
names(e.long) <- c("ID","age","math")
ggplot(e.long,aes(x= age, y= math)) + geom_line(aes(group = ID))
# Create dataframe for rjags
dat<-list(math=as.numeric(e.long$math),
age=as.numeric(e.long$age),
childnum=e.long$ID,
n=length(e.long$math),
nkids=length(unique(e.long$ID)))
lapply(dat , summary)
library(rjags)
# Model with uninformative priors
model_rnk<-"
model{
#Model, fixed effect age and random intercept-slope connected
for( i in 1:n)
{
math[i]~dnorm(mu[i], sigm.inv)
mu[i]<-(b[1] + u[childnum[i],1]) + (b[2]+ u[childnum[i],2]) * age[i] +
(b[3]+ u[childnum[i],3]) * (age[i]^2)
}
#Random slopes
for (j in 1:nkids)
{
u[j, 1] ~ dnorm(0, tau.a)
u[j, 2] <- exp(u[j,1]/25)/7.5
u[j, 3] <- exp(u[j,1]/25)/25
}
#Priors on fixed intercept and slope priors
b[1] ~ dnorm(0.0, 1.0E-5)
b[2] ~ dnorm(0.0, 1.0E-5)
b[3] ~ dnorm(0.0, 1.0E-5)
# Residual variance priors
sigm.inv ~ dgamma(1.5, 0.001)# precision of math[i]
sigm<- pow(sigm.inv, -1/2) # standard deviation
# Varying intercepts, varying slopes priors
tau.a ~ dgamma(1.5, 0.001)
sigma.a<-pow(tau.a, -1/2)
}"
#Initialize the model
mod_rnk<-jags.model(file=textConnection(model_rnk), data=dat, n.chains=2 )
#burn in
update(mod_rnk, 5000)
#collect samples of the parameters
samps_rnk<-coda.samples(mod_rnk, variable.names=c( "b","sigma.a", "sigm"), n.iter=5000, n.thin=1)
#Numerical summary of each parameter:
summary(samps_rnk)
gelman.diag(samps_rnk, multivariate = F)
# nlme model
library(nlme)
Stab_rnk2<-lme(math ~ age + I(age^2),
random= list(ID = ~age + I(age^2)),
data=e.long,method="ML",na.action=na.exclude)
summary(Stab_rnk2)
结果看起来非常接近人口值
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
b[1] 100.7409 0.575414 5.754e-03 0.1065523
b[2] 0.9843 0.052248 5.225e-04 0.0064052
b[3] 0.2512 0.003144 3.144e-05 0.0003500
sigm 1.9963 0.037548 3.755e-04 0.0004056
sigma.a 16.9322 1.183173 1.183e-02 0.0121340
而且 nlme 估计更差(随机截距除外)
Random effects:
Formula: ~age + I(age^2) | ID
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 16.73626521 (Intr) age
age 0.13152688 0.890
I(age^2) 0.03752701 0.924 0.918
Residual 1.99346015
Fixed effects: math ~ age + I(age^2)
Value Std.Error DF t-value p-value
(Intercept) 103.85896 1.6847051 1398 61.64816 0
age 1.15741 0.0527874 1398 21.92586 0
I(age^2) 0.30809 0.0048747 1398 63.20204 0