具有两个随机效应的混合模型 - statsmodels
mixed-models with two random effects - statsmodels
import pandas as pd
import statsmodels.formula.api as smf
df = pd.read_csv('http://www.bodowinter.com/tutorial/politeness_data.csv')
df = df.drop(38)
在R
我会做:
lmer(frequency ~ attitude + (1|subject) + (1|scenario), data=df)
在 R
中给我:
Random effects:
Groups Name Variance Std.Dev.
scenario (Intercept) 219 14.80
subject (Intercept) 4015 63.36
Residual 646 25.42
Fixed effects:
Estimate Std. Error t value
(Intercept) 202.588 26.754 7.572
attitudepol -19.695 5.585 -3.527
我尝试对 statsmodels
做同样的事情:
model = smf.mixedlm("frequency ~ attitude", data=df, groups=df[["subject","scenario"]]).fit()
但是 model.summary()
给了我不同的输出:
Mixed Linear Model Regression Results
=======================================================
Model: MixedLM Dependent Variable: frequency
No. Observations: 83 Method: REML
No. Groups: 2 Scale: 0.0000
Min. group size: 1 Likelihood: inf
Max. group size: 1 Converged: Yes
Mean group size: 1.0
-------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
-------------------------------------------------------
Intercept 204.500
attitude[T.pol] 8.800
groups RE 0.000
=======================================================
您的 smf.mixedlm
模型的 lmer
等价物如下所示:
lmer(frequency ~ attitude + (1 + attitude|subject) + (1 + attitude|scenario), data = df)
术语解释:
- 全局拦截(您可以使用
frequency ~ 0 + attitude + ...
禁用全局拦截)
- 固定效应的全局斜率
attitude
。
- A random intercept vor
subject
(即对于 subject
的每个水平,您都会得到与全局截距的偏差),以及与 attitude
内的固定效应斜率的偏差subject
的每个级别,允许随机截距和斜率之间的相关性。
scenario
的等效随机截距和斜率项。
请注意,如果您想允许随机截距和斜率自由变化(即强制截距和斜率之间的相关性为零),您必须将 (1 + attitude|subject)
替换为 (1|subject) + (0 + attitude|subject)
,类似地scenario
.
我能想到的半复制的唯一方法是简单地连接你的组。
df["grp"] = df["subject"].astype(str) + df["scenario"].astype(str)
model = smf.mixedlm("frequency ~ attitude", data=df, groups=df["grp"]).fit()
model.summary()
Out[87]:
<class 'statsmodels.iolib.summary2.Summary'>
"""
Mixed Linear Model Regression Results
==============================================================
Model: MixedLM Dependent Variable: frequency
No. Observations: 83 Method: REML
No. Groups: 42 Scale: 615.6961
Min. group size: 1 Likelihood: -430.8261
Max. group size: 2 Converged: Yes
Mean group size: 2.0
--------------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
--------------------------------------------------------------
Intercept 202.588 10.078 20.102 0.000 182.836 222.340
attitude[T.pol] -19.618 5.476 -3.582 0.000 -30.350 -8.885
groups RE 3650.021 50.224
==============================================================
"""
下面的代码重现了 R 结果。由于这是一个没有独立组的交叉模型,你需要把每个人都放在同一个组中,并使用方差分量指定随机效应。
import pandas as pd
import statsmodels.api as sm
df = pd.read_csv('http://www.bodowinter.com/uploads/1/2/9/3/129362560/politeness_data.csv')
df = df.dropna()
df["group"] = 1
vcf = {"scenario": "0 + C(scenario)", "subject": "0 + C(subject)"}
model = sm.MixedLM.from_formula("frequency ~ attitude", groups="group",
vc_formula=vcf, re_formula="0", data=df)
result = model.fit()
结果如下:
Mixed Linear Model Regression Results
==============================================================
Model: MixedLM Dependent Variable: frequency
No. Observations: 83 Method: REML
No. Groups: 1 Scale: 646.0163
Min. group size: 83 Likelihood: -396.7268
Max. group size: 83 Converged: Yes
Mean group size: 83.0
--------------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
--------------------------------------------------------------
Intercept 202.588 26.754 7.572 0.000 150.152 255.025
attitude[T.pol] -19.695 5.585 -3.526 0.000 -30.641 -8.748
scenario Var 218.991 6.476
subject Var 4014.616 104.614
==============================================================
import pandas as pd
import statsmodels.formula.api as smf
df = pd.read_csv('http://www.bodowinter.com/tutorial/politeness_data.csv')
df = df.drop(38)
在R
我会做:
lmer(frequency ~ attitude + (1|subject) + (1|scenario), data=df)
在 R
中给我:
Random effects:
Groups Name Variance Std.Dev.
scenario (Intercept) 219 14.80
subject (Intercept) 4015 63.36
Residual 646 25.42
Fixed effects:
Estimate Std. Error t value
(Intercept) 202.588 26.754 7.572
attitudepol -19.695 5.585 -3.527
我尝试对 statsmodels
做同样的事情:
model = smf.mixedlm("frequency ~ attitude", data=df, groups=df[["subject","scenario"]]).fit()
但是 model.summary()
给了我不同的输出:
Mixed Linear Model Regression Results
=======================================================
Model: MixedLM Dependent Variable: frequency
No. Observations: 83 Method: REML
No. Groups: 2 Scale: 0.0000
Min. group size: 1 Likelihood: inf
Max. group size: 1 Converged: Yes
Mean group size: 1.0
-------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
-------------------------------------------------------
Intercept 204.500
attitude[T.pol] 8.800
groups RE 0.000
=======================================================
您的 smf.mixedlm
模型的 lmer
等价物如下所示:
lmer(frequency ~ attitude + (1 + attitude|subject) + (1 + attitude|scenario), data = df)
术语解释:
- 全局拦截(您可以使用
frequency ~ 0 + attitude + ...
禁用全局拦截) - 固定效应的全局斜率
attitude
。 - A random intercept vor
subject
(即对于subject
的每个水平,您都会得到与全局截距的偏差),以及与attitude
内的固定效应斜率的偏差subject
的每个级别,允许随机截距和斜率之间的相关性。 scenario
的等效随机截距和斜率项。
请注意,如果您想允许随机截距和斜率自由变化(即强制截距和斜率之间的相关性为零),您必须将 (1 + attitude|subject)
替换为 (1|subject) + (0 + attitude|subject)
,类似地scenario
.
我能想到的半复制的唯一方法是简单地连接你的组。
df["grp"] = df["subject"].astype(str) + df["scenario"].astype(str)
model = smf.mixedlm("frequency ~ attitude", data=df, groups=df["grp"]).fit()
model.summary()
Out[87]:
<class 'statsmodels.iolib.summary2.Summary'>
"""
Mixed Linear Model Regression Results
==============================================================
Model: MixedLM Dependent Variable: frequency
No. Observations: 83 Method: REML
No. Groups: 42 Scale: 615.6961
Min. group size: 1 Likelihood: -430.8261
Max. group size: 2 Converged: Yes
Mean group size: 2.0
--------------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
--------------------------------------------------------------
Intercept 202.588 10.078 20.102 0.000 182.836 222.340
attitude[T.pol] -19.618 5.476 -3.582 0.000 -30.350 -8.885
groups RE 3650.021 50.224
==============================================================
"""
下面的代码重现了 R 结果。由于这是一个没有独立组的交叉模型,你需要把每个人都放在同一个组中,并使用方差分量指定随机效应。
import pandas as pd
import statsmodels.api as sm
df = pd.read_csv('http://www.bodowinter.com/uploads/1/2/9/3/129362560/politeness_data.csv')
df = df.dropna()
df["group"] = 1
vcf = {"scenario": "0 + C(scenario)", "subject": "0 + C(subject)"}
model = sm.MixedLM.from_formula("frequency ~ attitude", groups="group",
vc_formula=vcf, re_formula="0", data=df)
result = model.fit()
结果如下:
Mixed Linear Model Regression Results
==============================================================
Model: MixedLM Dependent Variable: frequency
No. Observations: 83 Method: REML
No. Groups: 1 Scale: 646.0163
Min. group size: 83 Likelihood: -396.7268
Max. group size: 83 Converged: Yes
Mean group size: 83.0
--------------------------------------------------------------
Coef. Std.Err. z P>|z| [0.025 0.975]
--------------------------------------------------------------
Intercept 202.588 26.754 7.572 0.000 150.152 255.025
attitude[T.pol] -19.695 5.585 -3.526 0.000 -30.641 -8.748
scenario Var 218.991 6.476
subject Var 4014.616 104.614
==============================================================