Statsmodels.formula.api OLS does not show statistical values of intercept
I am running the following source code:
import numpy as np
import statsmodels.formula.api as sm
# Add one column of ones for the intercept term
X = np.append(arr=np.ones((50, 1)).astype(int), values=X, axis=1)
regressor_OLS = sm.OLS(endog=y, exog=X).fit()
print(regressor_OLS.summary())
where X is a 50x5 numpy array (before the intercept column is added) that looks like this:
[[0 1 165349.20 136897.80 471784.10]
[0 0 162597.70 151377.59 443898.53]...]
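and y is a 50x1 numpy array holding the float values of the dependent variable.
The first two columns of X are dummy variables for a categorical feature with three distinct values. The remaining three columns are three different independent variables.
Although statsmodels.formula.api.OLS is said to add an intercept term automatically (see @stellacia's answer), in my case its summary does not show the statistical values of the intercept, as is evident below: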
                            OLS Regression Results
==============================================================================
Dep. Variable:                 Profit   R-squared:                       0.988
Model:                            OLS   Adj. R-squared:                  0.986
Method:                 Least Squares   F-statistic:                     727.1
Date:                Sun, 01 Jul 2018   Prob (F-statistic):           7.87e-42
Time:                        21:40:23   Log-Likelihood:                -545.15
No. Observations:                  50   AIC:                             1100.
Df Residuals:                      45   BIC:                             1110.
Df Model:                           5
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1          3464.4536   4905.406      0.706      0.484   -6415.541    1.33e+04
x2          5067.8937   4668.238      1.086      0.283   -4334.419    1.45e+04
x3             0.7182      0.066     10.916      0.000       0.586       0.851
x4             0.3113      0.035      8.885      0.000       0.241       0.382
x5             0.0786      0.023      3.429      0.001       0.032       0.125
==============================================================================
Omnibus:                        1.355   Durbin-Watson:                   1.288
Prob(Omnibus):                  0.508   Jarque-Bera (JB):                1.241
Skew:                          -0.237   Prob(JB):                        0.538
Kurtosis:                       2.391   Cond. No.                     8.28e+05
==============================================================================
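For this reason, I added the following line to my source code, as you can see at the beginning of my post:
X = np.append(arr=np.ones((50, 1)).astype(int), values=X, axis=1)
With it, the statistical values of the intercept/constant are shown as below: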
                            OLS Regression Results
==============================================================================
Dep. Variable:                 Profit   R-squared:                       0.951
Model:                            OLS   Adj. R-squared:                  0.945
Method:                 Least Squares   F-statistic:                     169.9
Date:                Sun, 01 Jul 2018   Prob (F-statistic):           1.34e-27
Time:                        20:25:21   Log-Likelihood:                -525.38
No. Observations:                  50   AIC:                             1063.
Df Residuals:                      44   BIC:                             1074.
Df Model:                           5
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       5.013e+04   6884.820      7.281      0.000    3.62e+04     6.4e+04
x1           198.7888   3371.007      0.059      0.953   -6595.030    6992.607
x2           -41.8870   3256.039     -0.013      0.990   -6604.003    6520.229
x3             0.8060      0.046     17.369      0.000       0.712       0.900
x4            -0.0270      0.052     -0.517      0.608      -0.132       0.078
x5             0.0270      0.017      1.574      0.123      -0.008       0.062
==============================================================================
Omnibus:                       14.782   Durbin-Watson:                   1.283
Prob(Omnibus):                  0.001   Jarque-Bera (JB):               21.266
Skew:                          -0.948   Prob(JB):                     2.41e-05
Kurtosis:                       5.572   Cond. No.                     1.45e+06
==============================================================================
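Why are the statistical values of the intercept not shown when I do not add the intercept term myself, even though statsmodels.formula.api.OLS is said to add it automatically?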
"No constant is added by the model unless you are using formulas."
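So try something like the example below. The variable names should be defined according to your dataset. Note that only the lowercase ols function, which takes a formula, adds the intercept; the uppercase OLS class takes arrays and adds nothing, even when it is imported from statsmodels.formula.api.
Use
import statsmodels.formula.api as smf
regressor_OLS = smf.ols(formula='Y_variable ~ X_variable', data=df).fit()
instead of
regressor_OLS = sm.OLS(endog=y, exog=X).fit()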
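To make the formula route concrete, here is a minimal sketch on synthetic data. The column names (rd_spend, marketing, profit) and the generated numbers are assumptions made up for illustration, not the asker's dataset. The point is only that the fitted parameters include an Intercept entry without any column of ones being added by hand.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data for illustration only; the column names are hypothetical.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "rd_spend": rng.uniform(0, 150_000, size=50),
    "marketing": rng.uniform(0, 400_000, size=50),
})
noise = rng.normal(0, 5_000, size=50)
df["profit"] = 50_000 + 0.8 * df["rd_spend"] + 0.03 * df["marketing"] + noise

# The formula interface (lowercase ols) adds the intercept on its own.
formula_fit = smf.ols(formula="profit ~ rd_spend + marketing", data=df).fit()

param_names = list(formula_fit.params.index)
print(param_names)
```

If you ever want the formula interface to leave the intercept out, appending "- 1" to the right-hand side of the formula should do that.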
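You can use this (note that add_constant lives in statsmodels.api, so sm here has to come from import statsmodels.api as sm, not from statsmodels.formula.api):
X = sm.add_constant(X)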