Getting model attributes from pipeline

Question

I typically get PCA loadings like this:

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_t = pca.fit(X).transform(X)
loadings = pca.components_

If I run PCA using a scikit-learn pipeline:

from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline(steps=[
    ('scaling', StandardScaler()),
    ('pca', PCA(n_components=2))
])
X_t = pipeline.fit_transform(X)

Is it possible to get the loadings?

Simply trying loadings = pipeline.components_ fails:

AttributeError: 'Pipeline' object has no attribute 'components_'

(I am also interested in extracting attributes like coef_ from pipelines.)

Answer 1

Did you look at the documentation? http://scikit-learn.org/dev/modules/pipeline.html I think it is pretty clear.

Update: in 0.21 you can just use square brackets:

pipeline['pca']

or an index:

pipeline[1]
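As a minimal sketch of the bracket access above, the following fits the pipeline from the question on synthetic data and reads the loadings from the PCA step. The random data and the variable names are assumptions for illustration only; bracket indexing needs scikit-learn 0.21 or later.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for this sketch): 100 samples, 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))

pipeline = Pipeline(steps=[
    ('scaling', StandardScaler()),
    ('pca', PCA(n_components=2)),
])
X_t = pipeline.fit_transform(X)

# Look up the fitted PCA step by name or by position.
pca_by_name = pipeline['pca']
pca_by_index = pipeline[1]

# components_ has shape (n_components, n_features).
loadings = pca_by_name.components_

# Slicing returns a new Pipeline holding only the selected steps.
preprocessing = pipeline[:-1]
```

Both lookups return the same fitted estimator object, so attributes set during fitting are available on either one.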

There are two ways to get to the steps in a pipeline: by index, or by the string names you gave them:

pipeline.named_steps['pca']
pipeline.steps[1][1]

This gives you the PCA object, on which you can get components_. With named_steps you can also use attribute access with a dot (.), which allows autocompletion:

pipeline.named_steps.pca.<tab here gives autocomplete>
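The same lookups work for attributes such as coef_, which the question also asks about. Here is a small sketch, assuming a pipeline that ends in a LinearRegression step; the step names, the synthetic data and the variable names are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (an assumption for this sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

model = Pipeline(steps=[
    ('scaling', StandardScaler()),
    ('regression', LinearRegression()),
])
model.fit(X, y)

# Three equivalent ways to reach the fitted final step.
reg = model.named_steps['regression']
same_reg = model.steps[1][1]
also_reg = model.named_steps.regression

# One coefficient per input feature, expressed on the scaled features.
coefficients = reg.coef_
```

Note that the coefficients refer to the output of the scaling step, not to the raw columns of X.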

Answer 2

Using Neuraxle

Working with pipelines is simpler using Neuraxle. For instance, you can do this:

from neuraxle.pipeline import Pipeline

# Create and fit the pipeline: 
pipeline = Pipeline([
    StandardScaler(),
    PCA(n_components=2)
])
pipeline, X_t = pipeline.fit_transform(X)

# Get the components: 
pca = pipeline[-1]
components = pca.components_

You can access your PCA in these three different ways, as needed:

pipeline['PCA']
pipeline[-1]
pipeline[1]

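Neuraxle is a pipelining library built on top of scikit-learn to take pipelines to the next level. It allows easily managing spaces of hyperparameter distributions, nested pipelines, saving and reloading, REST API serving, and more. The whole thing is also made to use deep learning algorithms and to allow parallel computing.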

Nested pipelines:

You can have pipelines within pipelines, as below.

# Create and fit the pipeline: 
pipeline = Pipeline([
    StandardScaler(),
    Identity(),
    Pipeline([
        Identity(),  # Note: an Identity step is a step that does nothing. 
        Identity(),  # We use it here for demonstration purposes. 
        Identity(),
        Pipeline([
            Identity(),
            PCA(n_components=2)
        ])
    ])
])
pipeline, X_t = pipeline.fit_transform(X)

Then you would need to do this:

# Get the components: 
pca = pipeline["Pipeline"]["Pipeline"][-1]
components = pca.components_
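For comparison, plain scikit-learn pipelines can be nested as well, and chained bracket lookups reach the inner step in a similar way. This is a sketch using scikit-learn rather than Neuraxle; the step names 'inner' and 'pca' and the synthetic data are assumptions for illustration, and bracket indexing needs scikit-learn 0.21 or later.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for this sketch): 100 samples, 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))

# An outer pipeline whose last step is itself a pipeline.
outer = Pipeline(steps=[
    ('scaling', StandardScaler()),
    ('inner', Pipeline(steps=[
        ('pca', PCA(n_components=2)),
    ])),
])
X_t = outer.fit_transform(X)

# Chain the lookups: first the inner pipeline, then its PCA step.
pca = outer['inner']['pca']
components = pca.components_
```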
