How to estimate an lm dummy-variable regression without multicollinearity?

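I'm having trouble running a regression on dummy variables with lm. I want to find out how the seasonal effect (seasonality) changes over time. To do this, I set up the following regression: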

AT.trendinseason.lm <- lm(DTR.detrended~0+dum.jan+dum.feb+dum.mar+dum.apr+dum.may+dum.jun+dum.jul+dum.aug+dum.sep+dum.oct+dum.nov+dum.dec+dum.jan*t+dum.feb*t+dum.mar*t+dum.apr*t+dum.may*t+dum.jun*t+dum.jul*t+dum.aug*t+dum.sep*t+dum.oct*t+dum.nov*t+dum.dec*t)

The output I get is:

summary(AT.trendinseason.lm)

Call:
lm(formula = DTR.detrended ~ 0 + dum.jan + dum.feb + dum.mar + 
    dum.apr + dum.may + dum.jun + dum.jul + dum.aug + dum.sep + 
    dum.oct + dum.nov + dum.dec + dum.jan * t + dum.feb * t + 
    dum.mar * t + dum.apr * t + dum.may * t + dum.jun * t + dum.jul * 
    t + dum.aug * t + dum.sep * t + dum.oct * t + dum.nov * t + 
    dum.dec * t)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.4047 -2.2737 -0.3229  2.0987 18.9906 

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)    
dum.jan   -2.495e+00  1.121e-01 -22.262  < 2e-16 ***
dum.feb   -1.527e+00  1.176e-01 -12.983  < 2e-16 ***
dum.mar    2.493e-01  1.124e-01   2.218 0.026552 *  
dum.apr    1.266e+00  1.144e-01  11.073  < 2e-16 ***
dum.may    1.785e+00  1.127e-01  15.844  < 2e-16 ***
dum.jun    1.597e+00  1.147e-01  13.926  < 2e-16 ***
dum.jul    1.882e+00  1.131e-01  16.640  < 2e-16 ***
dum.aug    1.544e+00  1.126e-01  13.721  < 2e-16 ***
dum.sep    1.335e+00  1.134e-01  11.780  < 2e-16 ***
dum.oct    8.306e-02  1.117e-01   0.744 0.456961    
dum.nov   -2.545e+00  1.137e-01 -22.390  < 2e-16 ***
dum.dec   -3.101e+00  1.119e-01 -27.703  < 2e-16 ***
t         -1.343e-05  5.431e-06  -2.473 0.013389 *  
dum.jan:t -8.571e-06  7.681e-06  -1.116 0.264444    
dum.feb:t -3.094e-06  7.866e-06  -0.393 0.694090    
dum.mar:t  5.346e-06  7.681e-06   0.696 0.486406    
dum.apr:t  3.850e-05  7.744e-06   4.971 6.69e-07 ***
dum.may:t  2.748e-05  7.681e-06   3.578 0.000346 ***
dum.jun:t  2.959e-05  7.744e-06   3.821 0.000133 ***
dum.jul:t  3.384e-05  7.698e-06   4.396 1.10e-05 ***
dum.aug:t  4.494e-05  7.711e-06   5.828 5.67e-09 ***
dum.sep:t -1.921e-06  7.744e-06  -0.248 0.804105    
dum.oct:t -1.526e-05  7.681e-06  -1.987 0.046943 *  
dum.nov:t  8.864e-07  7.744e-06   0.114 0.908876    
dum.dec:t         NA         NA      NA       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.093 on 35745 degrees of freedom
Multiple R-squared:  0.3145,    Adjusted R-squared:  0.314 
F-statistic: 683.2 on 24 and 35745 DF,  p-value: < 2.2e-16
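The "(1 not defined because of singularities)" note can be reproduced on made-up data. This is a minimal sketch, assuming 20 years of monthly observations with a linear time index t; the names n, mon, D and X_star are hypothetical. Because exactly one month dummy equals 1 in every row, the twelve dummy-by-t columns add up to t, so a design that contains t as well as all twelve interactions has only 24 linearly independent columns out of 25.

```r
# Hypothetical data: 20 years of monthly observations (no real DTR data used).
n   <- 240
t   <- seq_len(n)                                     # linear time index
mon <- factor(month.abb[(t - 1) %% 12 + 1], levels = month.abb)
D   <- model.matrix(~ 0 + mon)                        # 12 month dummies

# Every row has exactly one dummy equal to 1 ...
stopifnot(all(rowSums(D) == 1))
# ... so the 12 dummy-by-t interaction columns add up to t itself.
stopifnot(all(rowSums(D * t) == t))

# Columns that the dum.x * t formula puts into the design:
# 12 dummies, t as a main effect, and 12 interactions.
X_star <- cbind(D, t = t, D * t)
ncol(X_star)      # 25 columns
qr(X_star)$rank   # only 24 are independent, so lm() reports one coefficient as NA
```

The dependency is exact, not merely high correlation, which is why lm() aliases a coefficient instead of just inflating standard errors.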

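However, I know there shouldn't be a multicollinearity problem here, yet R still drops one of my variables. Is there any way to stop R from doing this?

The model I want to follow comes from a paper I read, where it seems to work:

This is the approach I want to take, but it doesn't seem to work for me.

Please help.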

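I solved it, and it came down entirely to how I wrote the interaction terms. Replacing * with : did the trick. In an R formula, dum.jan*t is shorthand for dum.jan + t + dum.jan:t, so my original formula also put t in as a main effect (the extra t row in the first output); since the twelve dum.x:t columns add up to t, one of them was redundant and R dropped it. With : only the interactions themselves are added. The new code is: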

AT.trendinseason.lm <- lm(DTR.detrended~0+dum.jan+dum.feb+dum.mar+dum.apr+dum.may+dum.jun+dum.jul+dum.aug+dum.sep+dum.oct+dum.nov+dum.dec+dum.jan:t+dum.feb:t+dum.mar:t+dum.apr:t+dum.may:t+dum.jun:t+dum.jul:t+dum.aug:t+dum.sep:t+dum.oct:t+dum.nov:t+dum.dec:t)
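The difference between the two formulas can be checked without any data by asking R how it expands them. A minimal sketch, with placeholder names y, a and t standing in for the response, one month dummy and the time index: a * t expands to a, t and a:t, whereas a:t adds only the interaction.

```r
# Inspect how R's formula parser expands * versus :.
# y, a and t are placeholder names; terms() needs no data.
f_star  <- y ~ 0 + a + a * t
f_colon <- y ~ 0 + a + a:t

labels_star  <- attr(terms(f_star),  "term.labels")
labels_colon <- attr(terms(f_colon), "term.labels")

print(labels_star)   # "a" "t" "a:t": * also adds t as a main effect
print(labels_colon)  # "a" "a:t": : adds only the interaction
```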

This gives the result I wanted:

Call:
lm(formula = DTR.detrended ~ 0 + dum.jan + dum.feb + dum.mar + 
    dum.apr + dum.may + dum.jun + dum.jul + dum.aug + dum.sep + 
    dum.oct + dum.nov + dum.dec + dum.jan:t + dum.feb:t + dum.mar:t + 
    dum.apr:t + dum.may:t + dum.jun:t + dum.jul:t + dum.aug:t + 
    dum.sep:t + dum.oct:t + dum.nov:t + dum.dec:t)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.4047 -2.2737 -0.3229  2.0987 18.9906 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
dum.jan   -2.495e+00  1.121e-01 -22.262  < 2e-16 ***
dum.feb   -1.527e+00  1.176e-01 -12.983  < 2e-16 ***
dum.mar    2.493e-01  1.124e-01   2.218 0.026552 *  
dum.apr    1.266e+00  1.144e-01  11.073  < 2e-16 ***
dum.may    1.785e+00  1.127e-01  15.844  < 2e-16 ***
dum.jun    1.597e+00  1.147e-01  13.926  < 2e-16 ***
dum.jul    1.882e+00  1.131e-01  16.640  < 2e-16 ***
dum.aug    1.544e+00  1.126e-01  13.721  < 2e-16 ***
dum.sep    1.335e+00  1.134e-01  11.780  < 2e-16 ***
dum.oct    8.306e-02  1.117e-01   0.744 0.456961    
dum.nov   -2.545e+00  1.137e-01 -22.390  < 2e-16 ***
dum.dec   -3.101e+00  1.119e-01 -27.703  < 2e-16 ***
dum.jan:t -2.200e-05  5.431e-06  -4.052 5.10e-05 ***
dum.feb:t -1.653e-05  5.691e-06  -2.904 0.003685 ** 
dum.mar:t -8.087e-06  5.431e-06  -1.489 0.136489    
dum.apr:t  2.507e-05  5.521e-06   4.540 5.64e-06 ***
dum.may:t  1.405e-05  5.431e-06   2.587 0.009688 ** 
dum.jun:t  1.616e-05  5.521e-06   2.927 0.003422 ** 
dum.jul:t  2.041e-05  5.455e-06   3.741 0.000184 ***
dum.aug:t  3.150e-05  5.474e-06   5.755 8.73e-09 ***
dum.sep:t -1.535e-05  5.521e-06  -2.781 0.005420 ** 
dum.oct:t -2.869e-05  5.431e-06  -5.283 1.28e-07 ***
dum.nov:t -1.255e-05  5.521e-06  -2.273 0.023056 *  
dum.dec:t -1.343e-05  5.431e-06  -2.473 0.013389 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.093 on 35745 degrees of freedom
Multiple R-squared:  0.3145,    Adjusted R-squared:  0.314 
F-statistic: 683.2 on 24 and 35745 DF,  p-value: < 2.2e-16
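Both fits are the same model written two ways, which is why the residuals, R-squared and F-statistic are identical in the two outputs. Writing D_m for the month dummies (m = 12 is December), the * version estimates a common slope on t plus eleven monthly deviations, while the : version estimates one slope per month. Since the dummies sum to 1, t equals the sum of the D_m t columns, which links the two sets of coefficients:

```latex
\begin{aligned}
\text{with } *:&\quad y = \sum_{m=1}^{12} \alpha_m D_m + \gamma\, t + \sum_{m=1}^{11} \delta_m D_m t + \varepsilon\\
\text{with } {:}:&\quad y = \sum_{m=1}^{12} \alpha_m D_m + \sum_{m=1}^{12} \beta_m D_m t + \varepsilon\\
t = \sum_{m=1}^{12} D_m t \;\Rightarrow&\quad \beta_{12} = \gamma, \qquad \beta_m = \gamma + \delta_m \quad (m = 1,\dots,11)\\
\text{Jan:}&\quad \beta_{\text{Jan}} = -1.343\times10^{-5} + (-8.571\times10^{-6}) \approx -2.200\times10^{-5}\\
\text{Oct:}&\quad \beta_{\text{Oct}} = -1.343\times10^{-5} + (-1.526\times10^{-5}) = -2.869\times10^{-5}
\end{aligned}
```

So the t row of the first output is the December slope, and each dum.x:t row there is that month's slope minus December's. Its t-tests therefore ask whether a month's trend differs from December's, which is why the significance stars differ between the two tables.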

Anyway, now you know one way to solve this problem. I hope it helps someone.
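To confirm on made-up data that the * and : formulas produce the same fit, and that a month factor avoids building twelve dummies by hand, here is a sketch. Everything in it is hypothetical: 20 years of simulated monthly data with a pure-noise response y, dummy columns named like the question's (dum.jan ... dum.dec) and a factor mon. reformulate() is used only to avoid typing 24 terms.

```r
set.seed(42)
n   <- 240                                         # hypothetical: 20 years, monthly
t   <- seq_len(n)                                  # linear time index
mon <- factor(month.abb[(t - 1) %% 12 + 1], levels = month.abb)
D   <- model.matrix(~ 0 + mon)                     # 12 month dummies
colnames(D) <- paste0("dum.", tolower(month.abb))  # dum.jan ... dum.dec
dat <- data.frame(D, t = t, mon = mon, y = rnorm(n))  # y is pure noise

# Build both formulas from the dummy names instead of typing 24 terms.
dums    <- colnames(D)
f_star  <- reformulate(c(dums, paste0(dums, "*t")), response = "y", intercept = FALSE)
f_colon <- reformulate(c(dums, paste0(dums, ":t")), response = "y", intercept = FALSE)

fit_star  <- lm(f_star,  data = dat)
fit_colon <- lm(f_colon, data = dat)

# Idiomatic alternative: one intercept and one slope per month via the factor.
fit_fac <- lm(y ~ 0 + mon + mon:t, data = dat)

all.equal(fitted(fit_star), fitted(fit_colon))  # TRUE: same column space
all.equal(fitted(fit_colon), fitted(fit_fac))   # TRUE
coef(fit_star)[c("t", "dum.dec:t")]             # dum.dec:t is NA
coef(fit_colon)["dum.dec:t"]                    # equals t from fit_star
```

The factor version is mainly a convenience: R builds the month indicators and their interactions with t itself, and the coefficients come out labelled monJan, monJan:t and so on.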