How to instantiate a Scikit-Learn linear model with known coefficients without fitting it
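Background
As part of an experiment, I am testing various saved models, but one of them comes from an algorithm I wrote myself rather than from fitting an sklearn model.
However, my custom model is still a linear model, so I want to instantiate a LinearModel instance and set its coef_ and intercept_ attributes to the values produced by my custom fitting algorithm, so that I can use it for predictions.
What I have tried so far: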
import numpy as np
from sklearn.linear_model import LinearRegression
my_intercepts = np.ones(2)
my_coefficients = np.random.randn(2, 3)
new_model = LinearRegression()
new_model.intercept_ = my_intercepts
new_model.coef_ = my_coefficients
The predictions seem fine:
X_test = np.random.randn(5, 3)
new_model.predict(X_test)
It also passes this check:
from sklearn.utils.validation import check_is_fitted
check_is_fitted(new_model)
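To make "seems fine" concrete, here is a minimal sketch that compares predict against the linear formula computed by hand, X @ coef_.T + intercept_. The seeded generator and the variable name manual are my own additions for reproducibility, not part of the original attempt.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Seeded generator so the comparison is reproducible (an addition for this sketch).
rng = np.random.default_rng(0)
my_intercepts = np.ones(2)                      # one intercept per target
my_coefficients = rng.standard_normal((2, 3))   # shape (n_targets, n_features)

# Same technique as above: set the fitted attributes by hand.
new_model = LinearRegression()
new_model.intercept_ = my_intercepts
new_model.coef_ = my_coefficients

X_test = rng.standard_normal((5, 3))
predictions = new_model.predict(X_test)

# The prediction of a linear model, computed directly with NumPy.
manual = X_test @ my_coefficients.T + my_intercepts

print(predictions.shape)
print(np.allclose(predictions, manual))
```

If the two agree, the hand-set attributes are being used exactly as a fitted model would use them.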
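Question
Is this approach OK? It feels like a hack, and I suspect there is a 'proper' way to do this.
Although the simple technique in the question works, the danger is that you might later call the object's fit method and overwrite your coefficients.
If the model is only used for prediction, a slightly more 'proper' approach is to inherit from sklearn's class and override the fit method, as follows: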
class LinearPredictionModel(LinearRegression):
    """
    This model is for prediction only. Calling fit raises an error.
    You can initialize it with fixed values for coefficients
    and intercepts.

    Parameters
    ----------
    coef, intercept : arrays
        See attribute descriptions below.

    Attributes
    ----------
    coef_ : array of shape (n_features, ) or (n_targets, n_features)
        Coefficients of the linear model. If there are multiple targets
        (y 2D), this is a 2D array of shape (n_targets, n_features),
        whereas if there is only one target, this is a 1D array of
        length n_features.
    intercept_ : float or array of shape of (n_targets,)
        Independent term in the linear model.
    """

    def __init__(self, coef=None, intercept=None):
        # Set the parent's hyperparameters (fit_intercept, positive, ...)
        # so inherited methods that read them keep working.
        super().__init__()
        if coef is not None:
            coef = np.asarray(coef)
            # 2D coef: one intercept per target. 1D coef: a scalar intercept.
            n_targets = coef.shape[0] if coef.ndim == 2 else None
            if intercept is None:
                intercept = 0.0 if n_targets is None else np.zeros(n_targets)
            else:
                intercept = np.asarray(intercept)
                if n_targets is not None and intercept.shape != (n_targets,):
                    raise ValueError("intercept must have shape (n_targets,)")
        elif intercept is not None:
            raise ValueError("Provide coef only or both coef and intercept")
        self.intercept_ = intercept
        self.coef_ = coef

    def fit(self, X, y):
        """This model does not have a fit method."""
        raise NotImplementedError("model is only for prediction")
Then instantiate the model as follows:
new_model = LinearPredictionModel(coef=my_coefficients, intercept=my_intercepts)
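As a quick sanity check, the sketch below shows the two behaviours this subclass is meant to give: predict uses the supplied values, and fit refuses to run. To keep the snippet runnable on its own it uses a condensed copy of the class without the input checks. The call to super().__init__() and the small hand-picked arrays are assumptions made for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


class LinearPredictionModel(LinearRegression):
    """Condensed copy of the class above so this snippet runs on its own."""

    def __init__(self, coef=None, intercept=None):
        # Assumption: keep the parent's default hyperparameters.
        super().__init__()
        self.coef_ = np.asarray(coef)
        self.intercept_ = np.asarray(intercept)

    def fit(self, X, y):
        raise NotImplementedError("model is only for prediction")


# Hand-picked values so the result is easy to verify by hand.
my_coefficients = np.array([[1.0, 0.0, -1.0],
                            [0.5, 0.5, 0.5]])   # (n_targets, n_features)
my_intercepts = np.ones(2)
new_model = LinearPredictionModel(coef=my_coefficients, intercept=my_intercepts)

X_test = np.array([[1.0, 2.0, 3.0],
                   [0.0, 0.0, 0.0]])
predictions = new_model.predict(X_test)
print(predictions)

# Fitting is blocked, so the coefficients cannot be overwritten by accident.
try:
    new_model.fit(X_test, predictions)
    fit_blocked = False
except NotImplementedError:
    fit_blocked = True
print(fit_blocked)
```

For the first row the targets are 1*1 + 0*2 - 1*3 + 1 = -1 and 0.5*(1 + 2 + 3) + 1 = 4. The all-zero second row returns just the intercepts.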
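I think the only truly 'proper' way would be to fully implement a new class with my custom algorithm in its fit method. But for the simple need of testing coefficients in a scikit-learn environment, this approach seems fine.
This approach works well for basic estimators (e.g. linear regression), but how can it be adapted to more complex models (e.g. Lasso, ElasticNet, ...)? It seems a linear regressor can be modified this way, but a Lasso regressor still throws an error (complaining that it is not fitted), as in this question, which is marked as a duplicate of the above.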
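One possible way around estimator-specific "not fitted" checks is to not inherit from Lasso or ElasticNet at all and compute the linear prediction directly. The sketch below is only an illustration: FixedCoefficientRegressor is a hypothetical class name, not something scikit-learn provides, and I have not checked which scikit-learn versions reproduce the Lasso error. It relies only on BaseEstimator and RegressorMixin from sklearn.base.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin


class FixedCoefficientRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical prediction-only regressor; not part of scikit-learn."""

    def __init__(self, coef, intercept=0.0):
        # Stored under the same names as the arguments, following the
        # scikit-learn convention for constructor parameters.
        self.coef = coef
        self.intercept = intercept

    def fit(self, X, y=None):
        raise NotImplementedError("model is only for prediction")

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        coef = np.asarray(self.coef, dtype=float)
        intercept = np.asarray(self.intercept, dtype=float)
        # coef is (n_features,) or (n_targets, n_features);
        # .T leaves a 1D array unchanged.
        return X @ coef.T + intercept


# Sparse coefficients such as a Lasso-style algorithm might produce
# (values made up for the example).
lasso_like = FixedCoefficientRegressor(coef=[0.0, 2.0, 0.0], intercept=1.0)
X_test = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
predictions = lasso_like.predict(X_test)
print(predictions)
```

Because predict never consults any fitted-state attributes, the same class works no matter which algorithm produced the coefficients. The trade-off is that it skips scikit-learn's input validation.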