具有两个随机效应的混合模型 - statsmodels

mixed-models with two random effects - statsmodels

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv('http://www.bodowinter.com/tutorial/politeness_data.csv')
df = df.drop(38)

R我会做:

lmer(frequency ~ attitude + (1|subject) + (1|scenario), data=df)

R 中给我:

Random effects:
 Groups   Name        Variance Std.Dev.
 scenario (Intercept)  219     14.80   
 subject  (Intercept) 4015     63.36   
 Residual              646     25.42   
Fixed effects:
            Estimate Std. Error t value
(Intercept)  202.588     26.754   7.572
attitudepol  -19.695      5.585  -3.527

我尝试对 statsmodels 做同样的事情:

model = smf.mixedlm("frequency ~ attitude", data=df, groups=df[["subject","scenario"]]).fit()

但是 model.summary() 给了我不同的输出:

      Mixed Linear Model Regression Results
=======================================================
Model:            MixedLM Dependent Variable: frequency
No. Observations: 83      Method:             REML     
No. Groups:       2       Scale:              0.0000   
Min. group size:  1       Likelihood:         inf      
Max. group size:  1       Converged:          Yes      
Mean group size:  1.0                                  
-------------------------------------------------------
                  Coef.  Std.Err. z P>|z| [0.025 0.975]
-------------------------------------------------------
Intercept        204.500                               
attitude[T.pol]    8.800                               
groups RE          0.000                               
=======================================================

您的 smf.mixedlm 模型的 lmer 等价物如下所示:

lmer(frequency ~ attitude + (1 + attitude|subject) + (1 + attitude|scenario), data = df)

术语解释:

  1. 全局拦截(您可以使用frequency ~ 0 + attitude + ...禁用全局拦截)
  2. 固定效应的全局斜率 attitude
  3. A random intercept vor subject(即对于 subject 的每个水平,您都会得到与全局截距的偏差),以及与 attitude 内的固定效应斜率的偏差subject 的每个级别,允许随机截距和斜率之间的相关性。
  4. scenario 的等效随机截距和斜率项。

请注意,如果您想允许随机截距和斜率自由变化(即强制截距和斜率之间的相关性为零),您必须将 (1 + attitude|subject) 替换为 (1|subject) + (0 + attitude|subject),类似地scenario.

我能想到的半复制的唯一方法是简单地连接你的组。

df["grp"] = df["subject"].astype(str) + df["scenario"].astype(str)
model = smf.mixedlm("frequency ~ attitude", data=df, groups=df["grp"]).fit()

model.summary()
Out[87]: 
<class 'statsmodels.iolib.summary2.Summary'>
"""
            Mixed Linear Model Regression Results
==============================================================
Model:               MixedLM   Dependent Variable:   frequency
No. Observations:    83        Method:               REML     
No. Groups:          42        Scale:                615.6961 
Min. group size:     1         Likelihood:           -430.8261
Max. group size:     2         Converged:            Yes      
Mean group size:     2.0                                      
--------------------------------------------------------------
                 Coef.   Std.Err.   z    P>|z|  [0.025  0.975]
--------------------------------------------------------------
Intercept        202.588   10.078 20.102 0.000 182.836 222.340
attitude[T.pol]  -19.618    5.476 -3.582 0.000 -30.350  -8.885
groups RE       3650.021   50.224                             
==============================================================

"""

下面的代码重现了 R 结果。由于这是一个没有独立组的交叉模型,你需要把每个人都放在同一个组中,并使用方差分量指定随机效应。

import pandas as pd                                                                                                        
import statsmodels.api as sm                                                                                               
                                                                                                                           
df = pd.read_csv('http://www.bodowinter.com/uploads/1/2/9/3/129362560/politeness_data.csv')                                                 
df = df.dropna()                                                                                                           
df["group"] = 1                                                                                                            
                                                                                                                           
vcf = {"scenario": "0 + C(scenario)", "subject": "0 + C(subject)"}                                                         
model = sm.MixedLM.from_formula("frequency ~ attitude", groups="group",                                                    
                                vc_formula=vcf, re_formula="0", data=df)                                                   
result = model.fit()  

结果如下:

            Mixed Linear Model Regression Results
==============================================================
Model:               MixedLM   Dependent Variable:   frequency
No. Observations:    83        Method:               REML     
No. Groups:          1         Scale:                646.0163 
Min. group size:     83        Likelihood:           -396.7268
Max. group size:     83        Converged:            Yes      
Mean group size:     83.0                                     
--------------------------------------------------------------
                 Coef.   Std.Err.   z    P>|z|  [0.025  0.975]
--------------------------------------------------------------
Intercept        202.588   26.754  7.572 0.000 150.152 255.025
attitude[T.pol]  -19.695    5.585 -3.526 0.000 -30.641  -8.748
scenario Var     218.991    6.476                             
subject Var     4014.616  104.614                             
==============================================================