Python statsmodels output and Excel/Google Sheets output don't match
I have a small dataset, and for some reason the statsmodels output does not match Excel's.
Here is what I did. I have two columns:
Miles Traveled | Travel Time |
---|---|
89 | 7.0 |
66 | 5.4 |
78 | 6.6 |
111 | 7.4 |
44 | 4.8 |
77 | 6.4 |
80 | 7.0 |
66 | 5.6 |
109 | 7.3 |
76 | 6.4 |
This is the output I get in Google Sheets:
Statistic | Slope | Intercept |
---|---|---|
Coefficient | 0.04025678079 | 3.185560249 |
Standard Error | 0.005706415564 | 0.4669507938 |
R Squared, Standard Error | 0.8615153295 | 0.3423088398 |
F Stat, Degrees of Freedom | 49.76812677 | 8 |
Regression SS / Residual SS | 5.831597265 | 0.9374027345 |
This output also matches the Excel output.
However, when I do the following in statsmodels:
import statsmodels.api as sm

milesTraveled = [89.0, 66.0, 78.0, 111.0, 44.0, 77.0, 80.0, 66.0, 109.0, 76.0]
travelTime = [7.0, 5.4, 6.6, 7.4, 4.8, 6.4, 7.0, 5.6, 7.3, 6.4]
# Regress travel time on miles traveled
model = sm.OLS(travelTime, milesTraveled).fit()
print(model.summary())
I get the following:
OLS Regression Results
=======================================================================================
Dep. Variable: Travel Time R-squared (uncentered): 0.985
Model: OLS Adj. R-squared (uncentered): 0.983
Method: Least Squares F-statistic: 575.6
Date: Mon, 01 Feb 2021 Prob (F-statistic): 1.82e-09
Time: 10:18:44 Log-Likelihood: -11.951
No. Observations: 10 AIC: 25.90
Df Residuals: 9 BIC: 26.20
Df Model: 1
Covariance Type: nonrobust
==================================================================================
coef std err t P>|t| [0.025 0.975]
----------------------------------------------------------------------------------
Miles Traveled 0.0781 0.003 23.991 0.000 0.071 0.085
==============================================================================
Omnibus: 2.179 Durbin-Watson: 2.654
Prob(Omnibus): 0.336 Jarque-Bera (JB): 1.033
Skew: -0.777 Prob(JB): 0.597
Kurtosis: 2.741 Cond. No. 1.00
==============================================================================
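As you can see, the standard error, R-squared, and other values do not match Google Sheets/Excel at all. What exactly am I doing wrong? What should I do to get an accurate results summary like the one from Google Sheets/Excel?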
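By default, the OLS class does not include a constant term in the linear model, so your call fits a line through the origin. That is why your summary shows a single coefficient and an uncentered R-squared. You can use sm.add_constant to create the appropriate exog argument for OLS: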
In [36]: milesTraveled = [89.0, 66.0, 78.0, 111.0, 44.0, 77.0, 80.0, 66.0, 109.0, 76.0]
In [37]: travelTime = [7.0, 5.4, 6.6, 7.4, 4.8, 6.4, 7.0, 5.6, 7.3, 6.4]
In [38]: X = sm.add_constant(milesTraveled)
In [39]: model = sm.OLS(travelTime, X).fit()
In [40]: print(model.summary())
/Users/warren/a2020.11/lib/python3.8/site-packages/scipy/stats/stats.py:1603: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=10
warnings.warn("kurtosistest only valid for n>=20 ... continuing "
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.862
Model: OLS Adj. R-squared: 0.844
Method: Least Squares F-statistic: 49.77
Date: Mon, 01 Feb 2021 Prob (F-statistic): 0.000107
Time: 13:04:53 Log-Likelihood: -2.3532
No. Observations: 10 AIC: 8.706
Df Residuals: 8 BIC: 9.312
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 3.1856 0.467 6.822 0.000 2.109 4.262
x1 0.0403 0.006 7.055 0.000 0.027 0.053
==============================================================================
Omnibus: 0.542 Durbin-Watson: 2.608
Prob(Omnibus): 0.763 Jarque-Bera (JB): 0.554
Skew: 0.370 Prob(JB): 0.758
Kurtosis: 2.115 Cond. No. 353.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.