Creating a DF inside of the lasso model

In this project I am running a lasso model:

def build_and_fit_lasso_model(X, y):
    """Creates and returns a LASSO model that is fitted to the values of the
    given predictor and target X, and y.
    """
    model = LassoLarsCV(cv=10, precompute=False)
    # Fit on the arguments passed in, not on global variables
    model = model.fit(X.values, y.values)
    return model

lasso_model = build_and_fit_lasso_model(X_train, y_train)
lasso_model

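After running it, I want to create a function that returns a DataFrame containing the variable names and coefficients of the fitted lasso model. Here is my code: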

def get_coefficients(model, X):
    """Returns a DataFrame containing the columns `label` and `coeff` which are
    the coefficients by column name.
    """
    predictors_model = pd.DataFrame(filtered_data)#filtered_data is the name of the df used in the model
    predictors_model.columns = ['label']
    predictors_model['coeff'] =  model.coef_ 
    return predictors_model

When I run this code:

coefficients = get_coefficients(lasso_model, X)

I get the error "ValueError: Length mismatch: Expected axis has 19 elements, new values have 1 elements".

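You get that error for two reasons: 1. X is passed to the function but never used, and 2. the dimensions are wrong: pd.DataFrame(filtered_data) has as many columns as your input data (19), so assigning the single name ['label'] to its columns fails. Suppose your data looks like this: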

import numpy as np
import pandas as pd
from sklearn.linear_model import LassoLarsCV

def build_and_fit_lasso_model(X, y):
    # Fit on the arguments passed in, not on global variables
    model = LassoLarsCV(cv=10, precompute=False)
    model = model.fit(X.values, y.values)
    return model

df = pd.DataFrame(np.random.normal(0,1,(50,5)),columns=['x1','x2','x3','x4','x5'])
df['y']  = np.random.normal(0,1,50)

X_train = df[['x1','x2','x3','x4','x5']]
y_train = df['y']
lasso_model = build_and_fit_lasso_model(X_train, y_train)
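
To see why the original get_coefficients fails, here is a minimal reproduction of the length mismatch. The 50 x 19 frame of zeros is only a hypothetical stand-in for filtered_data, whose real contents are not shown in the question:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's `filtered_data`: 50 rows, 19 columns.
filtered_data = pd.DataFrame(np.zeros((50, 19)))

# Wrapping a DataFrame in pd.DataFrame keeps all 19 columns.
predictors_model = pd.DataFrame(filtered_data)

try:
    # One name is supplied for 19 columns, so pandas rejects the assignment.
    predictors_model.columns = ['label']
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The frame built from the input data has one column per predictor, whereas the desired result needs one row per predictor, which is what the version below constructs.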

A quick way is to put the coefficients into a DataFrame and add the names as another column:

def get_coefficients(model,X):

    predictors_model = pd.DataFrame({'label':X.columns,'coeff':model.coef_})
    return predictors_model

get_coefficients(lasso_model,X_train)

    label   coeff
0   x1  0.0
1   x2  0.0
2   x3  0.0
3   x4  0.0
4   x5  0.0
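
The all-zero coefficients above are most likely because y in the example is pure noise, unrelated to the predictors, so the lasso shrinks everything away. As a rough sketch of how the same idea looks when there is real signal, the variant below builds synthetic data where y depends on x1 and x3 only and keeps just the non-zero coefficients. The data, the seed, and the helper name get_nonzero_coefficients are assumptions made for illustration and are not part of the original question:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoLarsCV

# Synthetic data (assumed for illustration): y depends on x1 and x3 only.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(0, 1, (200, 5)),
                 columns=['x1', 'x2', 'x3', 'x4', 'x5'])
y = 3.0 * X['x1'] - 2.0 * X['x3'] + rng.normal(0, 0.1, 200)

model = LassoLarsCV(cv=10).fit(X.values, y.values)

def get_nonzero_coefficients(model, X):
    """Return `label`/`coeff` rows for predictors the lasso kept (coeff != 0)."""
    coefficients = pd.DataFrame({'label': X.columns, 'coeff': model.coef_})
    return coefficients[coefficients['coeff'] != 0].reset_index(drop=True)

selected = get_nonzero_coefficients(model, X)
print(selected)
```

Filtering on coeff != 0 is one simple way to read off which variables the lasso selected; the unfiltered get_coefficients above remains the better choice if you want every predictor listed.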