How to update a Logistic Regression model?

I trained a logistic regression model. Now I have to update (partially fit) the model with a new training dataset. Is that possible?

You cannot use partial_fit on LogisticRegression.

But you can:

  • Use warm_start=True to reuse the solution of the previous call to fit as initialisation, which speeds up convergence.
  • Use SGDClassifier with loss='log' (spelled loss='log_loss' in scikit-learn 1.1+), which optimises the same objective as LogisticRegression and does support partial_fit.

Note the difference between partial_fit and warm_start. Both approaches start from the previous model and update it, but partial_fit only nudges the model a little with each mini-batch, whereas warm_start fits to convergence on the new training data, forgetting the previous model. warm_start is only there to speed up convergence.
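A small sketch of that warm_start behaviour, again on made-up data: the second fit starts from the coefficients of the first, but it converges on the new data alone, so the old data's influence disappears.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X_old = rng.randn(200, 4)
y_old = (X_old[:, 0] > 0).astype(int)

# warm_start=True: the next call to fit starts from the current coef_
# instead of from scratch (supported by the default lbfgs solver,
# but not by liblinear).
clf = LogisticRegression(warm_start=True, max_iter=1000)
clf.fit(X_old, y_old)
coef_after_old = clf.coef_.copy()

# Fitting again converges fully on the new data; the old coefficients
# only served as a starting point, not as extra training data.
X_new = rng.randn(200, 4)
y_new = (X_new[:, 1] > 0).astype(int)
clf.fit(X_new, y_new)
```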

See also the glossary:

warm_start

When fitting an estimator repeatedly on the same dataset, but for multiple parameter values (such as to find the value maximizing performance as in grid search), it may be possible to reuse aspects of the model learnt from the previous parameter value, saving time. When warm_start is true, the existing fitted model attributes are used to initialise the new model in a subsequent call to fit.

Note that this is only applicable for some models and some parameters, and even some orders of parameter values. For example, warm_start may be used when building random forests to add more trees to the forest (increasing n_estimators) but not to reduce their number.
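The random-forest case mentioned above can be sketched as follows (toy data; the point is that the second fit keeps the existing trees and only grows the additional ones):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, random_state=0)

forest = RandomForestClassifier(n_estimators=50, warm_start=True,
                                random_state=0)
forest.fit(X, y)

# Raising n_estimators with warm_start=True keeps the 50 trees
# already built and only fits 50 new ones.
forest.set_params(n_estimators=100)
forest.fit(X, y)

print(len(forest.estimators_))  # 100
```

Lowering n_estimators afterwards would raise an error, matching the "but not to reduce their number" caveat in the glossary.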

partial_fit also retains the model between calls, but differs: with warm_start the parameters change and the data is (more-or-less) constant across calls to fit; with partial_fit, the mini-batch of data changes and model parameters stay fixed.

There are cases where you want to use warm_start to fit on different, but closely related data. For example, one may initially fit to a subset of the data, then fine-tune the parameter search on the full dataset. For classification, all data in a sequence of warm_start calls to fit must include samples from each class.


partial_fit

Facilitates fitting an estimator in an online fashion. Unlike fit, repeatedly calling partial_fit does not clear the model, but updates it with respect to the data provided. The portion of data provided to partial_fit may be called a mini-batch. Each mini-batch must be of consistent shape, etc.

partial_fit may also be used for out-of-core learning, although usually limited to the case where learning can be performed online, i.e. the model is usable after each partial_fit and there is no separate processing needed to finalize the model. cluster.Birch introduces the convention that calling partial_fit(X) will produce a model that is not finalized, but the model can be finalized by calling partial_fit() i.e. without passing a further mini-batch.

Generally, estimator parameters should not be modified between calls to partial_fit, although partial_fit should validate them as well as the new mini-batch of data. In contrast, warm_start is used to repeatedly fit the same estimator with the same data but varying parameters.