R command structure for longitudinal data with level 1 and level 2 variables

Question

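I have children's test scores and demographic data from different years (longitudinal data) and need to run several models and compare them. I am confused about how to set up the level 1 and level 2 variables in R.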

My data frame (df):

Student  Year  Gender Race MathScore DepressionScore MemoryScore
1        1999   M      C     80            15            80
1        2000   M      C     81            25            60
1        2001   M      C     70            50            75
2        1999   F      C     65            15            99
2        2000   F      C     70            31            98
2        2001   F      C     71            30            99
3        1999   F      AA    92            10            90
3        2000   F      AA    89            10            91
3        2001   F      AA    85            26            80

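I want to run at least two models and compare them, but I am not sure how to separate the time-varying covariates from the time-invariant ones. I tried these: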

summary(fix <- lme(MathScore ~ Gender+Race+DepressionScore+MemoryScore, random = ~Year|Student, data=df, na.action=na.omit))

summary(fix2 <- lme(MathScore ~ 1+Gender+Race+DepressionScore+MemoryScore, random=~1|Year, data=df, na.action=na.omit))

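My questions are:

1. In "fix", should all covariates follow the first tilde, with the random part being Year|Student?
2. How do I specify that DepressionScore and MemoryScore also vary by year and student?
3. Should fix2 have "random=~1+Student|Year" or just "random=~1|Year"?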

Answer 1

You have years nested within students, so the command for a random-intercept model should be:

summary(fix <- lme(MathScore ~ Gender+Race+DepressionScore+MemoryScore, random = ~1|Student/Year, data=df, na.action=na.omit))

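To separate time-varying from time-invariant effects, you need to split the estimate for each time-varying factor into a between-student part and a within-student part (see: Fairbrother, M., 2014. Two Multilevel Modeling Techniques for Analyzing Comparative Longitudinal Survey Datasets. Political Science Research and Methods 2, 119–140. doi:10.1017/psrm.2013.24).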

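This requires centering the time-varying variables by student: compute the enduring 'student' effect (each student's mean over the years), then subtract it from the original variable to isolate the time-varying part. Without the dataset I am not sure this works, but try: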

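library(plyr)
# Student-level means of the time-varying covariates (between-student part)
df <- ddply(df, "Student", transform, mean.std.DepressionScore = mean(DepressionScore))
df <- ddply(df, "Student", transform, mean.std.MemoryScore = mean(MemoryScore))

# Deviations from each student's own mean (within-student part)
df$time.DepressionScore <- df$DepressionScore - df$mean.std.DepressionScore
df$time.MemoryScore <- df$MemoryScore - df$mean.std.MemoryScore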
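
The same centering step can also be sketched in base R, without plyr. This is only an illustration on the nine sample rows from the question; using `ave()` in place of `ddply()` is my substitution, not part of the original suggestion. It builds the student means and the deviations, then checks that the deviations sum to zero within each student.

```r
# Sample rows copied from the question (only the columns needed here).
df <- data.frame(
  Student         = rep(1:3, each = 3),
  Year            = rep(1999:2001, times = 3),
  DepressionScore = c(15, 25, 50, 15, 31, 30, 10, 10, 26),
  MemoryScore     = c(80, 60, 75, 99, 98, 99, 90, 91, 80)
)

# Between-student part: ave() returns each student's mean on every row
# of that student (its default summary function is mean).
df$mean.std.DepressionScore <- ave(df$DepressionScore, df$Student)
df$mean.std.MemoryScore     <- ave(df$MemoryScore, df$Student)

# Within-student part: deviation of each year from the student's own mean.
df$time.DepressionScore <- df$DepressionScore - df$mean.std.DepressionScore
df$time.MemoryScore     <- df$MemoryScore - df$mean.std.MemoryScore

# Sanity check: deviations should sum to (numerically) zero per student.
within_sums <- tapply(df$time.DepressionScore, df$Student, sum)
print(within_sums)
```

For student 1 the DepressionScore values are 15, 25 and 50, so the student mean is 30 and the three deviations are -15, -5 and 20.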

The model then becomes:

summary(fix <- lme(MathScore ~ Gender+Race+mean.std.DepressionScore+time.DepressionScore+mean.std.MemoryScore+time.MemoryScore+Year, random = ~1|Student/Year, data=df, na.action=na.omit))

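In this model, the mean.std coefficients estimate the enduring 'between'-student differences, while the time.* coefficients measure 'within'-student change over time. You need the fixed-effect estimate for Year to control for trends that could affect the enduring and the time-varying effects equally.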
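
To show how such models can be fitted and compared, here is a minimal sketch on simulated data, since the nine sample rows are too few to estimate anything reliably. Everything about the simulation (60 students, three waves, the effect sizes, the names `sim`, `m0` and `m1`) is made up for illustration. I also assume a plain `random = ~ 1 | Student` intercept, because with one row per student per year a Year-within-Student intercept cannot be told apart from the residual. Both models are fitted with `method = "ML"`, which is what a likelihood-ratio comparison of models with different fixed effects requires.

```r
library(nlme)

set.seed(1)
n_students <- 60  # hypothetical sample size

# Simulated long-format data: three yearly waves per student.
sim <- data.frame(
  Student = factor(rep(seq_len(n_students), each = 3)),
  Year    = rep(0:2, times = n_students)  # years since the first wave
)

# Covariate with a stable student-level part plus year-to-year noise.
sim$DepressionScore <- 25 + rep(rnorm(n_students, sd = 6), each = 3) +
  rnorm(nrow(sim), sd = 8)

# Outcome with a student random intercept (illustrative coefficients).
sim$MathScore <- 80 - 0.2 * sim$DepressionScore +
  rep(rnorm(n_students, sd = 5), each = 3) + rnorm(nrow(sim), sd = 4)

# Between / within decomposition of the time-varying covariate.
sim$mean.std.DepressionScore <- ave(sim$DepressionScore, sim$Student)
sim$time.DepressionScore <- sim$DepressionScore - sim$mean.std.DepressionScore

# Baseline model: time trend only, random intercept per student.
m0 <- lme(MathScore ~ Year, random = ~ 1 | Student,
          data = sim, method = "ML")

# Adds the between-student and within-student depression terms.
m1 <- lme(MathScore ~ mean.std.DepressionScore + time.DepressionScore + Year,
          random = ~ 1 | Student, data = sim, method = "ML")

cmp <- anova(m0, m1)  # likelihood-ratio test, AIC and BIC side by side
print(cmp)
print(fixef(m1))      # separate between and within coefficients
```

With the real data, Gender, Race and the MemoryScore terms would be added to the fixed part in the same way.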

Tags: r, lme4, longitudinal