How to build a predict function with Lasso and RobustScaler?
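I'm trying to figure out how to predict values with LASSO regression without using the .predict function that Sklearn provides. This is basically just to broaden my understanding of how LASSO works internally. I asked a question on Cross Validated about how LASSO regression works, and one of the comments mentioned that the predict function works the same way as in linear regression. Because of this, I wanted to try writing my own function to do it.
I was able to recreate the predict function in simpler examples, but when I try to use it together with RobustScaler, I keep getting different outputs. In this example, I get a prediction of 4.33 with Sklearn and 6.18 with my own function. What am I missing here? Am I not inverse transforming the prediction correctly at the end?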
import pandas as pd
from sklearn.preprocessing import RobustScaler
from sklearn.linear_model import Lasso
import numpy as np
df = pd.DataFrame({'Y':[5, -10, 10, .5, 2.5, 15], 'X1':[1., -2., 2., .1, .5, 3], 'X2':[1, 1, 2, 1, 1, 1],
'X3':[6, 6, 6, 5, 6, 4], 'X4':[6, 5, 4, 3, 2, 1]})
X = df[['X1','X2','X3','X4']]
y = df[['Y']]
#Scaling
transformer_x = RobustScaler().fit(X)
transformer_y = RobustScaler().fit(y)
X_scal = transformer_x.transform(X)
y_scal = transformer_y.transform(y)
#LASSO
lasso = Lasso()
lasso = lasso.fit(X_scal, y_scal)
#LASSO info
print('Score: ', lasso.score(X_scal,y_scal))
print('Raw Intercept: ', lasso.intercept_.round(2)[0])
intercept = transformer_y.inverse_transform([lasso.intercept_])[0][0]
print('Unscaled Intercept: ', intercept)
print('\nCoefficients Used: ')
coeff_array = lasso.coef_
inverse_coeff_array = transformer_x.inverse_transform(lasso.coef_.reshape(1,-1))[0]
for i,j,k in zip(X.columns, coeff_array, inverse_coeff_array):
    if j != 0:
        print(i, j.round(2), k.round(2))
#Predictions
example = [[3,1,1,1]]
pred = lasso.predict(example)
pred_scal = transformer_y.inverse_transform(pred.reshape(-1, 1))
print('\nRaw Prediction where X1 = 3: ', pred[0])
print('Unscaled Prediction where X1 = 3: ', pred_scal[0][0])
#Predictions without using the .predict function
def lasso_predict_value_(X1,X2,X3,X4):
    print('intercept: ', intercept)
    print('coef: ', inverse_coeff_array[0])
    print('X1: ', X1)
    preds = intercept + inverse_coeff_array[0]*X1
    print('Your predicted value is: ', preds)
lasso_predict_value_(3,1,1,1)
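The trained Lasso has no information about whether a given data point has been scaled. Your manual prediction therefore should not apply any scaling to the model itself: calling transformer_x.inverse_transform on lasso.coef_ multiplies each weight by the feature's scale and adds the feature's center, which does not produce a valid weight.
If I remove your processing of the model coefficients, we get the same result as the sklearn model: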
example = [[3,1,1,1]]
lasso.predict(example)
# array([0.07533937])
#Predictions without using the .predict function
def lasso_predict_value_(X1,X2,X3,X4):
    x_test = np.array([X1,X2, X3, X4])
    preds = lasso.intercept_ + sum(x_test*lasso.coef_)
    print('Your predicted value is: ', preds)
lasso_predict_value_(3,1,1,1)
# Your predicted value is: [0.07533937]
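The same identity can be checked for every training row at once rather than for a single example. The following is a minimal sketch that refits the model from the question's data and compares intercept plus dot product against lasso.predict; the variable names manual and sklearn_pred are only illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.preprocessing import RobustScaler

# Data copied from the question
df = pd.DataFrame({
    'Y': [5, -10, 10, .5, 2.5, 15],
    'X1': [1., -2., 2., .1, .5, 3],
    'X2': [1, 1, 2, 1, 1, 1],
    'X3': [6, 6, 6, 5, 6, 4],
    'X4': [6, 5, 4, 3, 2, 1],
})
X = df[['X1', 'X2', 'X3', 'X4']]
y = df[['Y']]

X_scal = RobustScaler().fit_transform(X)
y_scal = RobustScaler().fit_transform(y)
lasso = Lasso().fit(X_scal, y_scal)

# A fitted Lasso predicts like any linear model: intercept + X @ coef
manual = np.ravel(lasso.intercept_ + X_scal @ lasso.coef_)
sklearn_pred = np.ravel(lasso.predict(X_scal))

print(np.allclose(manual, sklearn_pred))
```

The L1 penalty only affects how the coefficients are found during fitting, so nothing Lasso-specific is needed at prediction time.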
Update 2:
Once I use LASSO, I then need to see what my predictions were in their
original units. My dependent variable is in dollar amounts, and if I
don't inverse transform it back, I'm unable to see how many dollars I
need for the prediction.
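That is a very valid scenario. You need to apply transformer_y.inverse_transform to the prediction to get the unscaled dollar amount. The input must also be scaled with transformer_x.transform before predicting, since the model was fit on scaled features. There is no need to touch the model weights.
Updated example: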
example = [[3,1,1,1]]
scaled_pred = lasso.predict(transformer_x.transform(example))
transformer_y.inverse_transform([scaled_pred])
# array([[4.07460407]])
#Predictions without using the .predict function
def lasso_predict_value_(X1,X2,X3,X4):
    x_test = transformer_x.transform(np.array([X1,X2, X3, X4]).reshape(1,-1))[0]
    preds = lasso.intercept_ + sum(x_test*lasso.coef_)
    print('Your predicted value is: ', preds)
    # Unscale the manual prediction itself rather than the global scaled_pred
    print('Your unscaled predicted value is: ',
          transformer_y.inverse_transform([preds]))
lasso_predict_value_(3,1,1,1)
# Your predicted value is: [0.0418844]
# Your unscaled predicted value is: [[4.07460407]]
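To go one step further and avoid transform and inverse_transform as well, the scaling can be written out by hand. With its default settings, RobustScaler computes (x - center_) / scale_ when transforming and y * scale_ + center_ when inverting, so the whole pipeline is y = c_y + s_y * (b + sum_j w_j * (x_j - c_j) / s_j). Expanding that expression gives weights in the original units: w_j * s_y / s_j for each feature, with intercept c_y + s_y * (b - sum_j w_j * c_j / s_j). The sketch below assumes the same data and fitting steps as the question; predict_by_hand, coef_orig and intercept_orig are hypothetical names introduced only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.preprocessing import RobustScaler

# Data copied from the question
df = pd.DataFrame({
    'Y': [5, -10, 10, .5, 2.5, 15],
    'X1': [1., -2., 2., .1, .5, 3],
    'X2': [1, 1, 2, 1, 1, 1],
    'X3': [6, 6, 6, 5, 6, 4],
    'X4': [6, 5, 4, 3, 2, 1],
})
X = df[['X1', 'X2', 'X3', 'X4']]
y = df[['Y']]

transformer_x = RobustScaler().fit(X)
transformer_y = RobustScaler().fit(y)
lasso = Lasso().fit(transformer_x.transform(X), transformer_y.transform(y))

b = np.ravel(lasso.intercept_)[0]                        # scaled-space intercept
w = lasso.coef_                                          # scaled-space weights
c_x, s_x = transformer_x.center_, transformer_x.scale_   # per-feature median and IQR
c_y, s_y = transformer_y.center_[0], transformer_y.scale_[0]

def predict_by_hand(x_row):
    """Scale the inputs, apply the linear model, then unscale the output."""
    x_scaled = (np.asarray(x_row, dtype=float) - c_x) / s_x
    y_scaled = b + x_scaled @ w
    return y_scaled * s_y + c_y

# Equivalent weights expressed in the original units of X and Y
coef_orig = w * s_y / s_x
intercept_orig = c_y + s_y * (b - np.sum(w * c_x / s_x))

example = [3, 1, 1, 1]
by_hand = predict_by_hand(example)
folded = intercept_orig + np.dot(coef_orig, example)

# Reference value using the sklearn objects end to end
example_df = pd.DataFrame([example], columns=X.columns)
scaled_pred = lasso.predict(transformer_x.transform(example_df))
reference = transformer_y.inverse_transform(scaled_pred.reshape(-1, 1))[0, 0]

print(by_hand, folded, reference)
```

All three printed values should agree with the unscaled prediction shown above. The folded weights are what the question was reaching for when it inverse transformed the coefficients: each weight is rescaled by a ratio of scales, and the centers only enter through the intercept.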