sklearn.linear_model.ridge 中的统计摘要 table？

Question

在 OLS 形式的 StatsModels 中，results.summary 显示回归结果的摘要（例如 AIC、BIC、R 平方...）

有什么方法可以在 sklearn.linear_model.ridge 中获得此摘要 table？

如果有人能指导我，我将不胜感激。谢谢。

Answer 1

据我所知，sklearn 中没有类似 R（或 Statsmodels）的摘要 table。（请勾选this answer）

相反，如果您需要它，statsmodels.regression.linear_model.OLS.fit_regularized class。（L1_wt=0 用于岭回归。）

目前看来，model.fit_regularized(~).summary() returns None 尽管有下面的文档字符串。但是对象有params，summary()不知何故可以使用。

Returns: A RegressionResults object, of the same type returned by fit.

示例。

示例数据不适用于岭回归，但无论如何我都会尝试。

在.

import numpy as np
import pandas as pd
import statsmodels
import statsmodels.api as sm
import matplotlib.pyplot as plt

statsmodels.__version__

出局。

'0.8.0rc1'

在.

data = sm.datasets.ccard.load()

print "endog: " + data.endog_name
print "exog: " + ', '.join(data.exog_name)

data.exog[:5, :]

出局。

endog: AVGEXP
exog: AGE, INCOME, INCOMESQ, OWNRENT
Out[2]:
array([[ 38.    ,   4.52  ,  20.4304,   1.    ],
       [ 33.    ,   2.42  ,   5.8564,   0.    ],
       [ 34.    ,   4.5   ,  20.25  ,   1.    ],
       [ 31.    ,   2.54  ,   6.4516,   0.    ],
       [ 32.    ,   9.79  ,  95.8441,   1.    ]])

在.

y, X = data.endog, data.exog

model = sm.OLS(y, X)
results_fu = model.fit()

print results_fu.summary()

出局。

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.543
Model:                            OLS   Adj. R-squared:                  0.516
Method:                 Least Squares   F-statistic:                     20.22
Date:                Wed, 19 Oct 2016   Prob (F-statistic):           5.24e-11
Time:                        17:22:48   Log-Likelihood:                -507.24
No. Observations:                  72   AIC:                             1022.
Df Residuals:                      68   BIC:                             1032.
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1            -6.8112      4.551     -1.497      0.139     -15.892       2.270
x2           175.8245     63.743      2.758      0.007      48.628     303.021
x3            -9.7235      6.030     -1.613      0.111     -21.756       2.309
x4            54.7496     80.044      0.684      0.496    -104.977     214.476
==============================================================================
Omnibus:                       76.325   Durbin-Watson:                   1.692
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              649.447
Skew:                           3.194   Prob(JB):                    9.42e-142
Kurtosis:                      16.255   Cond. No.                         87.5
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

在.

frames = []
for n in np.arange(0, 0.25, 0.05).tolist():
    results_fr = model.fit_regularized(L1_wt=0, alpha=n, start_params=results_fu.params)

    results_fr_fit = sm.regression.linear_model.OLSResults(model, 
                                                           results_fr.params, 
                                                           model.normalized_cov_params)
    frames.append(np.append(results_fr.params, results_fr_fit.ssr))

    df = pd.DataFrame(frames, columns=data.exog_name + ['ssr*'])
df.index=np.arange(0, 0.25, 0.05).tolist()
df.index.name = 'alpha*'
df.T

出局。

在.

%matplotlib inline

fig, ax = plt.subplots(1, 2, figsize=(14, 4))

ax[0] = df.iloc[:, :-1].plot(ax=ax[0])
ax[0].set_title('Coefficient')

ax[1] = df.iloc[:, -1].plot(ax=ax[1])
ax[1].set_title('SSR')

出局。

在.

results_fr = model.fit_regularized(L1_wt=0, alpha=0.04, start_params=results_fu.params)
final = sm.regression.linear_model.OLSResults(model, 
                                              results_fr.params, 
                                              model.normalized_cov_params)

print final.summary()

出局。

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.543
Model:                            OLS   Adj. R-squared:                  0.516
Method:                 Least Squares   F-statistic:                     20.17
Date:                Wed, 19 Oct 2016   Prob (F-statistic):           5.46e-11
Time:                        17:22:49   Log-Likelihood:                -507.28
No. Observations:                  72   AIC:                             1023.
Df Residuals:                      68   BIC:                             1032.
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1            -5.6375      4.554     -1.238      0.220     -14.724       3.449
x2           159.1412     63.781      2.495      0.015      31.867     286.415
x3            -8.1360      6.034     -1.348      0.182     -20.176       3.904
x4            44.2597     80.093      0.553      0.582    -115.564     204.083
==============================================================================
Omnibus:                       76.819   Durbin-Watson:                   1.694
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              658.948
Skew:                           3.220   Prob(JB):                    8.15e-144
Kurtosis:                      16.348   Cond. No.                         87.5
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Answer 2

from statsmodels.tools.tools import pinv_extended
X_train = sm.add_constant(X_train)
model = sm.OLS(y_train, X_train)

# If True, the model is refit using only the variables that have non-zero 
# coefficients in the regularized fit. The refitted model is not regularized.
result = model.fit_regularized(
                    method = 'elastic_net',
                    alpha = alp,
                    L1_wt = l1,
                    start_params = None,
                    profile_scale = False,
                    #refit = True,
                    refit = False,
                    maxiter = 9000,
                    zero_tol = 1e-5,
                    )

pinv_wexog,_ = pinv_extended(model.wexog)
normalized_cov_params = np.dot(pinv_wexog, np.transpose(pinv_wexog))


final = sm.regression.linear_model.OLSResults(model, 
                                  result.params, 
                                  normalized_cov_params)
#print(final.summary())
x = sm.add_constant(X_test)
#x = X_test
R2 = r2_score(y_test, final.predict(x))

p = final.pvalues
t = final.tvalues

Answer 3

感谢@Bugee，我创建了以下函数，returns statsmodels 指标，例如 rsquared 和 rsquared_adj，以及 summary()。

您也可以使用 sklearn 线性模型（LinearRegression、Lasso、Ridge...）和 statsmodels OLS 以及正则化 OLS。

from statsmodels.tools.tools import pinv_extended
import statsmodels.api as sm
import sklearn, statsmodels

def regression_analysis(X, y, model):
    
    is_statsmodels = False
    is_sklearn = False
    
    # check for accepted linear models
    if type(model) in [sklearn.linear_model._base.LinearRegression,
                       sklearn.linear_model._ridge.Ridge,
                       sklearn.linear_model._ridge.RidgeCV,
                       sklearn.linear_model._coordinate_descent.Lasso,
                       sklearn.linear_model._coordinate_descent.LassoCV,
                       sklearn.linear_model._coordinate_descent.ElasticNet,
                       sklearn.linear_model._coordinate_descent.ElasticNetCV,
                      ]:
        is_sklearn = True
    elif type(model) in [statsmodels.regression.linear_model.OLS, 
                         statsmodels.base.elastic_net.RegularizedResults,
                        ]:
        is_statsmodels = True
    else:
        print("Only linear models are supported!")
        return None
    
    
    
    has_intercept = False
    
    if is_statsmodels and all(np.array(X)[:,0]==1):
        # statsmodels add_constant has been used already
        has_intercept = True  
    elif is_sklearn and model.intercept_:
        has_intercept = True
        

    
    if is_statsmodels:
        # add_constant has been used already
        x = X
        model_params = model.params
    else: # sklearn model
        if has_intercept:
            x = sm.add_constant(X)
            model_params = np.hstack([np.array([model.intercept_]), model.coef_])
        else:
            x = X
            model_params = model.coef_
        
    #y = np.array(y).ravel()
    
    # define the OLS model
    olsModel = sm.OLS(y, x)
    
    pinv_wexog,_ = pinv_extended(x)
    normalized_cov_params = np.dot(pinv_wexog, np.transpose(pinv_wexog))
    
    
    return sm.regression.linear_model.OLSResults(olsModel, model_params, normalized_cov_params)

如何使用？

from sklearn.linear_model import Ridge


skridge = Ridge(alpha=0.2, max_iter=9000, tol=1e-5, fit_intercept=True)
skridge.fit(X,y)

result = regression_analysis(X, y, skridge)
result.summary()

sklearn.linear_model.ridge 中的统计摘要 table？

statistical summary table in sklearn.linear_model.ridge?

python

statsmodels

sklearn-pandas