Summary not working for OLS estimation
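I am having a problem with my statsmodels OLS estimation. The model runs without any issues, but when I try to call summary so that I can see the actual results, I get a TypeError saying that the axis must be specified when the shapes of a and weights differ.
My code looks like this: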
from __future__ import print_function, division
import xlrd as xl
import numpy as np
import scipy as sp
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm
file_loc = "/Users/NiklasLindeke/Python/dataset_3.xlsx"
workbook = xl.open_workbook(file_loc)
sheet = workbook.sheet_by_index(0)
tot = sheet.nrows
data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
rv1 = []
rv5 = []
rv22 = []
rv1fcast = []
T = []
price = []
time = []
retnor = []
model = []
for i in range(1, tot):
    t = data[i][0]
    ret = data[i][1]
    ret5 = data[i][2]
    ret22 = data[i][3]
    ret1_1 = data[i][4]
    retn = data[i][5]
    t = xl.xldate_as_tuple(t, 0)
    rv1.append(ret)
    rv5.append(ret5)
    rv22.append(ret22)
    rv1fcast.append(ret1_1)
    retnor.append(retn)
    T.append(t)
df = pd.DataFrame({'RVFCAST':rv1fcast, 'RV1':rv1, 'RV5':rv5, 'RV22':rv22,})
df = df[df.RVFCAST != ""]
Model = smf.ols(formula='RVFCAST ~ RV1 + RV5 + RV22', data = df).fit()
print Model.summary()
In other words, this does not work.
The traceback is as follows:
print Model.summary()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-394-ea8ea5139fd4> in <module>()
----> 1 print Model.summary()
/Users/NiklasLindeke/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/statsmodels-0.6.1-py2.7-macosx-10.6-x86_64.egg/statsmodels/regression/linear_model.pyc in summary(self, yname, xname, title, alpha)
1948 top_left.append(('Covariance Type:', [self.cov_type]))
1949
-> 1950 top_right = [('R-squared:', ["%#8.3f" % self.rsquared]),
1951 ('Adj. R-squared:', ["%#8.3f" % self.rsquared_adj]),
1952 ('F-statistic:', ["%#8.4g" % self.fvalue] ),
/Users/NiklasLindeke/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/statsmodels-0.6.1-py2.7-macosx-10.6-x86_64.egg/statsmodels/tools/decorators.pyc in __get__(self, obj, type)
92 if _cachedval is None:
93 # Call the "fget" function
---> 94 _cachedval = self.fget(obj)
95 # Set the attribute in obj
96 # print("Setting %s in cache to %s" % (name, _cachedval))
/Users/NiklasLindeke/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/statsmodels-0.6.1-py2.7-macosx-10.6-x86_64.egg/statsmodels/regression/linear_model.pyc in rsquared(self)
1179 def rsquared(self):
1180 if self.k_constant:
-> 1181 return 1 - self.ssr/self.centered_tss
1182 else:
1183 return 1 - self.ssr/self.uncentered_tss
/Users/NiklasLindeke/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/statsmodels-0.6.1-py2.7-macosx-10.6-x86_64.egg/statsmodels/tools/decorators.pyc in __get__(self, obj, type)
92 if _cachedval is None:
93 # Call the "fget" function
---> 94 _cachedval = self.fget(obj)
95 # Set the attribute in obj
96 # print("Setting %s in cache to %s" % (name, _cachedval))
/Users/NiklasLindeke/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/statsmodels-0.6.1-py2.7-macosx-10.6-x86_64.egg/statsmodels/regression/linear_model.pyc in centered_tss(self)
1159 if weights is not None:
1160 return np.sum(weights*(model.endog - np.average(model.endog,
-> 1161 weights=weights))**2)
1162 else: # this is probably broken for GLS
1163 centered_endog = model.wendog - model.wendog.mean()
/Users/NiklasLindeke/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/numpy/lib/function_base.pyc in average(a, axis, weights, returned)
522 if axis is None:
523 raise TypeError(
--> 524 "Axis must be specified when shapes of a and weights "
525 "differ.")
526 if wgt.ndim != 1:
TypeError: Axis must be specified when shapes of a and weights differ.
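I'm sorry, but I don't know what to do about this. Afterwards I would also like to correct for autocorrelation with a Newey-West method, which I saw can be done with the following line: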
mdl = Model.get_robustcov_results(cov_type='HAC',maxlags=1)
But when I try to run that with my model, it returns the error:
ValueError: operands could not be broadcast together with shapes (256,766) (256,1,256)
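I have gathered that statsmodels.formula is not compatible with the get_robustcov function, but if so, how do I test for autocorrelation?
My most pressing concern, however, is that I cannot produce a summary for my OLS.
As requested, here are the first thirty rows of my dataset in df.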
print df
RV1 RV22 RV5 RVFCAST
0 0.01553801 0.01309511 0.01081393 0.008421236
1 0.008881671 0.01301336 0.01134905 0.01553801
2 0.01042178 0.01326669 0.01189979 0.008881671
3 0.009809431 0.01334593 0.01170942 0.01042178
4 0.009418737 0.01358808 0.01152253 0.009809431
5 0.01821364 0.01362502 0.01269661 0.009418737
6 0.01163536 0.01331585 0.01147541 0.01821364
7 0.009469907 0.01329509 0.01172988 0.01163536
8 0.008875018 0.01361841 0.01202432 0.009469907
9 0.01528914 0.01430873 0.01233219 0.008875018
10 0.01210761 0.01412724 0.01238776 0.01528914
11 0.01290773 0.0144439 0.01432174 0.01210761
12 0.01094212 0.01425895 0.01493865 0.01290773
13 0.01041433 0.01430177 0.0156763 0.01094212
14 0.01556703 0.0142857 0.01986616 0.01041433
15 0.0217775 0.01430253 0.01864532 0.01556703
16 0.01599228 0.01390088 0.01579069 0.0217775
17 0.01463037 0.01384096 0.01416622 0.01599228
18 0.03136361 0.01395866 0.01398807 0.01463037
19 0.009462822 0.01295695 0.0106063 0.03136361
20 0.007504367 0.01295204 0.01114677 0.009462822
21 0.007869922 0.01300863 0.01267322 0.007504367
22 0.01373964 0.0129547 0.01314553 0.007869922
23 0.01445476 0.01271198 0.01268 0.01373964
24 0.01216517 0.01249902 0.01202476 0.01445476
25 0.0151366 0.01266783 0.0129083 0.01216517
26 0.01023149 0.01258627 0.0146934 0.0151366
27 0.01141199 0.01284094 0.01490637 0.01023149
28 0.01117856 0.01321258 0.01643881 0.01141199
29 0.01658287 0.01340074 0.01597086 0.01117856
Many thanks to user333800 for all the help!
For future reference, in case anyone else runs into the same problem.
The following code:
df = pd.DataFrame({'RVFCAST':rv1fcast, 'RV1':rv1, 'RV5':rv5, 'RV22':rv22,})
df = df[df.RVFCAST != ""]
df = df.astype(float)
Model = smf.ols(formula='RVFCAST ~ RV1 + RV5 + RV22', data = df).fit()
mdl = Model.get_robustcov_results(cov_type='HAC',maxlags=1)
gives me:
print mdl.summary()
OLS Regression Results
==============================================================================
Dep. Variable: RVFCAST R-squared: 0.681
Model: OLS Adj. R-squared: 0.677
Method: Least Squares F-statistic: 120.9
Date: Wed, 22 Apr 2015 Prob (F-statistic): 1.60e-48
Time: 17:19:19 Log-Likelihood: 1159.8
No. Observations: 256 AIC: -2312.
Df Residuals: 252 BIC: -2297.
Df Model: 3
Covariance Type: HAC
==============================================================================
coef std err t P>|t| [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept 0.0005 0.000 2.285 0.023 7.24e-05 0.001
RV1 0.2823 0.104 2.710 0.007 0.077 0.487
RV5 -0.0486 0.193 -0.252 0.802 -0.429 0.332
RV22 0.7450 0.232 3.212 0.001 0.288 1.202
==============================================================================
Omnibus: 174.186 Durbin-Watson: 2.045
Prob(Omnibus): 0.000 Jarque-Bera (JB): 2152.634
Skew: 2.546 Prob(JB): 0.00
Kurtosis: 16.262 Cond. No. 1.19e+03
==============================================================================
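On the autocorrelation question raised above: once the model fits on numeric data, the residuals can be tested directly. The following is only a sketch on synthetic data (the frame `demo` and its values are made up for illustration, not the asker's dataset); it uses `durbin_watson` and `acorr_breusch_godfrey` from statsmodels together with the same HAC call shown above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

# Synthetic stand-in for the real data (hypothetical values).
rng = np.random.default_rng(0)
n = 256
demo = pd.DataFrame({
    "RV1": rng.normal(size=n),
    "RV5": rng.normal(size=n),
    "RV22": rng.normal(size=n),
})
demo["RVFCAST"] = 0.3 * demo["RV1"] + 0.7 * demo["RV22"] + rng.normal(scale=0.5, size=n)

res = smf.ols("RVFCAST ~ RV1 + RV5 + RV22", data=demo).fit()

# Durbin-Watson statistic: values near 2 suggest little first-order autocorrelation.
dw = durbin_watson(res.resid)

# Breusch-Godfrey LM test for serial correlation up to one lag.
lm_stat, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(res, nlags=1)

# Newey-West (HAC) standard errors; the coefficients themselves are unchanged.
hac = res.get_robustcov_results(cov_type="HAC", maxlags=1)

print("Durbin-Watson:", dw)
print("Breusch-Godfrey p-value:", lm_pvalue)
print(hac.summary())
```

The HAC results only change the standard errors and the statistics derived from them, so the point estimates match the plain OLS fit.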
Now I can continue with my thesis :)
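For anyone wondering why `df.astype(float)` makes the difference: a column that mixes numbers with empty strings is stored by pandas with object dtype, and filtering out the empty strings does not change that. The formula interface most likely then treats the object column as categorical, which would explain the odd shapes in the errors above. A small sketch with made-up values (the frame `raw` is hypothetical) shows how to check and fix the dtypes:

```python
import pandas as pd

# A column mixing floats and "" ends up with object dtype (hypothetical values).
raw = pd.DataFrame({
    "RVFCAST": [0.008421236, 0.01553801, ""],
    "RV1": [0.01553801, 0.008881671, 0.01042178],
})
print(raw.dtypes)

# Dropping the empty-string rows keeps the object dtype.
filtered = raw[raw.RVFCAST != ""]
print(filtered.dtypes)

# An explicit conversion gives numeric columns that the formula API handles as numbers.
clean = filtered.astype(float)
print(clean.dtypes)
```

Printing `df.dtypes` before fitting is a quick way to catch this, since the printed frame looks numeric either way.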
I ran into the same problem and found that it was an issue with the input data. I solved it by changing the decimal separator from "," to ".".
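If the data comes from a text file with decimal commas, the conversion can also be done in pandas rather than by editing the file. This is a sketch under assumptions: the semicolon-separated sample text and the column names are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated data with decimal commas.
csv_text = "RV1;RV5\n0,01553801;0,01081393\n0,008881671;0,01134905\n"

# Read naively: the values are parsed as text, not numbers.
as_text = pd.read_csv(io.StringIO(csv_text), sep=";")

# Option 1: tell the parser about the decimal separator.
as_float = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",")

# Option 2: replace the comma after loading, then convert to numeric.
fixed = as_text.apply(
    lambda col: pd.to_numeric(col.str.replace(",", ".", regex=False))
)

print(as_float.dtypes)
print(fixed.dtypes)
```

Either way, the columns should show a float dtype before they are passed to the model.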