How to output Regression Analysis summary from polynomial regression with scikit-learn?
I currently have the following code, which runs a polynomial regression on a dataset with 4 variables:
from numpy import genfromtxt, savetxt
from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures

def polyreg():
    # [1:] skips the header row of each CSV
    dataset = genfromtxt(open('train.csv', 'r'), delimiter=',', dtype='f8')[1:]
    target = [x[0] for x in dataset]
    train = [x[1:] for x in dataset]
    test = genfromtxt(open('test.csv', 'r'), delimiter=',', dtype='f8')[1:]
    poly = PolynomialFeatures(degree=2)
    train_poly = poly.fit_transform(train)
    test_poly = poly.transform(test)  # reuse the transformer fitted on train
    clf = linear_model.LinearRegression()
    clf.fit(train_poly, target)
    savetxt('polyreg_test1.csv', clf.predict(test_poly), delimiter=',', fmt='%f')
I was wondering whether there is a way to output a regression summary the way Excel does. I explored the attributes/methods of linear_model.LinearRegression() but couldn't find anything.
This isn't implemented in scikit-learn; the scikit-learn ecosystem is strongly biased toward evaluating models with cross-validation (which I think is a good thing; most of those test statistics were developed before computers were powerful enough for cross-validation to be feasible).
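To illustrate the cross-validation route in scikit-learn terms, here is a minimal sketch of the same kind of pipeline (PolynomialFeatures followed by LinearRegression) scored by 5-fold cross-validation. The synthetic data stands in for the asker's train.csv and is an assumption, not part of the original question:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for train.csv: 100 rows, 4 predictor columns
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(100, 4))
y = 1 + 0.1 * X[:, 0] + 10 * X[:, 1] ** 2 + rng.normal(size=100)

# Pipeline keeps the polynomial expansion inside each CV fold,
# so test folds are transformed with parameters fit on train folds only
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
scores = cross_val_score(model, X, y, cv=5, scoring='r2')
print(scores.mean())
```

This gives an out-of-sample R² estimate rather than the in-sample test statistics an Excel-style summary reports.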
For more traditional kinds of statistical analysis, you can use statsmodels. Here is an example taken from their documentation:
import numpy as np
import statsmodels.api as sm

nsample = 100
x = np.linspace(0, 10, 100)
X = np.column_stack((x, x**2))     # linear and quadratic terms
beta = np.array([1, 0.1, 10])      # true coefficients
e = np.random.normal(size=nsample) # noise
X = sm.add_constant(X)             # prepend an intercept column
y = np.dot(X, beta) + e

model = sm.OLS(y, X)
results = model.fit()
print(results.summary())
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 1.000
Model: OLS Adj. R-squared: 1.000
Method: Least Squares F-statistic: 4.020e+06
Date: Sun, 01 Feb 2015 Prob (F-statistic): 2.83e-239
Time: 09:32:32 Log-Likelihood: -146.51
No. Observations: 100 AIC: 299.0
Df Residuals: 97 BIC: 306.8
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [95.0% Conf. Int.]
------------------------------------------------------------------------------
const 1.3423 0.313 4.292 0.000 0.722 1.963
x1 -0.0402 0.145 -0.278 0.781 -0.327 0.247
x2 10.0103 0.014 715.745 0.000 9.982 10.038
==============================================================================
Omnibus: 2.042 Durbin-Watson: 2.274
Prob(Omnibus): 0.360 Jarque-Bera (JB): 1.875
Skew: 0.234 Prob(JB): 0.392
Kurtosis: 2.519 Cond. No. 144.
==============================================================================