How to get feature names from ELI5 when a transformer includes an embedded pipeline
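The ELI5 library provides the function transform_feature_names to retrieve the feature names for the output of an sklearn transformer. The documentation says the function works out of the box when the transformer includes nested Pipelines.
I'm trying to get the function to work on a simplified version of an existing example. My simplified example doesn't need a Pipeline, but in real life I will need one in order to add steps to categorical_transformer, and I will also want to add more transformers to the ColumnTransformer.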
import eli5
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

X_train = pd.DataFrame({'age': [23, 12, 12, 18],
                        'gender': ['M', 'F', 'F', 'F'],
                        'income': ['high', 'low', 'low', 'medium'],
                        'y': [0, 1, 1, 1]})

categorical_features = ['gender', 'income']
categorical_transformer = Pipeline(
    steps=[('onehot', OneHotEncoder(handle_unknown='ignore'))])

transformers = [('categorical', categorical_transformer, categorical_features)]
preprocessor = ColumnTransformer(transformers)
X_train_transformed = preprocessor.fit(X_train)

eli5.transform_feature_names(preprocessor, list(X_train.columns))
This fails with the message
AttributeError: Transformer categorical (type Pipeline) does not provide get_feature_names.
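Since the Pipeline is nested inside the ColumnTransformer, I understood from the ELI5 documentation that it would be handled.
Do I need to create a modified version of Pipeline with a get_feature_names method, or make some other custom modification, in order to take advantage of the ELI5 function?
I'm using python 3.7.6, eli5 0.10.1, pandas 0.25.3, and sklearn 0.22.1.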
I think the problem is that eli5 relies on the ColumnTransformer method get_feature_names, which in turn asks the Pipeline for get_feature_names, and that is not yet implemented in sklearn.
I've opened an Issue with eli5 with your example.
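To make the failure chain described above concrete, here is a minimal standard-library sketch. The classes ToyEncoder, ToyPipeline, and ToyColumnTransformer are hypothetical stand-ins, not the sklearn implementations; they only mimic the behaviour described here (a container that asks each child for get_feature_names, and a pipeline wrapper that does not define it).

```python
class ToyEncoder:
    """Hypothetical leaf transformer that can name its own outputs."""

    def get_feature_names(self):
        return ["gender_F", "gender_M"]


class ToyPipeline:
    """Hypothetical wrapper that defines no get_feature_names method."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, transformer) pairs


class ToyColumnTransformer:
    """Hypothetical container that asks every child for its feature names."""

    def __init__(self, transformers):
        self.transformers = transformers  # list of (name, transformer) pairs

    def get_feature_names(self):
        names = []
        for name, trans in self.transformers:
            if not hasattr(trans, "get_feature_names"):
                raise AttributeError(
                    "Transformer %s (type %s) does not provide get_feature_names."
                    % (name, type(trans).__name__))
            names.extend(name + "__" + f for f in trans.get_feature_names())
        return names


# The encoder used directly works; the same encoder wrapped in a pipeline fails.
direct = ToyColumnTransformer([("categorical", ToyEncoder())])
nested = ToyColumnTransformer(
    [("categorical", ToyPipeline([("onehot", ToyEncoder())]))])

direct_names = direct.get_feature_names()
try:
    nested.get_feature_names()
    nested_error = None
except AttributeError as exc:
    nested_error = str(exc)
```

In this toy version the wrapped encoder is perfectly able to name its outputs; the error comes only from the wrapper not forwarding the request.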
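One possible fix: add a transform_feature_names dispatch for ColumnTransformer. This can be just a modification of its existing get_feature_names that calls eli5's transform_feature_names for each of its component transformers (instead of sklearn's own get_feature_names). The code below seems to work, although I'm not sure how to handle the case where in_names differs from the training dataframe columns, which are available in the ColumnTransformer as _df_columns.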
import numpy as np
from eli5 import transform_feature_names
from sklearn.compose import ColumnTransformer

@transform_feature_names.register(ColumnTransformer)
def col_tfm_names(transformer, in_names=None):
    if in_names is None:
        from eli5.sklearn.utils import get_feature_names
        # generate default feature names
        in_names = get_feature_names(transformer, num_features=transformer._n_features)
    # return a list of strings derived from in_names
    feature_names = []
    for name, trans, column, _ in transformer._iter(fitted=True):
        # work out the input names seen by this component transformer
        if hasattr(transformer, '_df_columns'):
            if ((not isinstance(column, slice))
                    and all(isinstance(col, str) for col in column)):
                names = column
            else:
                names = transformer._df_columns[column]
        else:
            indices = np.arange(transformer._n_features)
            names = ['x%d' % i for i in indices[column]]
        # erm, want to be able to override with in_names maybe???
        if trans == 'drop' or (
                hasattr(column, '__len__') and not len(column)):
            continue
        if trans == 'passthrough':
            feature_names.extend(names)
            continue
        # delegate to eli5's dispatch so nested Pipelines are handled
        feature_names.extend([name + "__" + f for f in
                              transform_feature_names(trans, in_names=names)])
    return feature_names
I also needed to create a dispatch for OneHotEncoder, because its get_feature_names needs the parameter input_features:
@transform_feature_names.register(OneHotEncoder)
def _ohe_names(est, in_names=None):
    return est.get_feature_names(input_features=in_names)
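The two registrations above rely on transform_feature_names exposing a .register hook, which is the single-dispatch pattern. As a rough illustration of how such handlers compose through a nested pipeline, here is a standard-library sketch using functools.singledispatch. Everything in it (toy_feature_names and the Toy* classes) is hypothetical and only mirrors the structure of the fix; it is not eli5 or sklearn code.

```python
from functools import singledispatch


class ToyEncoder:
    def __init__(self, categories):
        self.categories = categories  # one list of category values per input column


class ToyPipeline:
    def __init__(self, steps):
        self.steps = steps  # list of (name, transformer) pairs


class ToyColumnTransformer:
    def __init__(self, transformers):
        self.transformers = transformers  # list of (name, transformer, columns)


@singledispatch
def toy_feature_names(transformer, in_names=None):
    # Fallback used when no handler is registered for the transformer's type.
    raise NotImplementedError("no handler for %s" % type(transformer).__name__)


@toy_feature_names.register(ToyEncoder)
def _encoder_names(est, in_names=None):
    # One output name per (input column, category) pair.
    return ["%s_%s" % (col, cat)
            for col, cats in zip(in_names, est.categories)
            for cat in cats]


@toy_feature_names.register(ToyPipeline)
def _pipeline_names(est, in_names=None):
    # Thread the names through each step in order.
    names = in_names
    for _, step in est.steps:
        names = toy_feature_names(step, in_names=names)
    return names


@toy_feature_names.register(ToyColumnTransformer)
def _column_names(est, in_names=None):
    # Delegate to the dispatcher for each child and prefix with the child's name.
    out = []
    for name, trans, columns in est.transformers:
        out.extend(name + "__" + f
                   for f in toy_feature_names(trans, in_names=columns))
    return out


encoder = ToyEncoder([["F", "M"], ["high", "low", "medium"]])
preprocessor = ToyColumnTransformer(
    [("categorical", ToyPipeline([("onehot", encoder)]), ["gender", "income"])])
names = toy_feature_names(preprocessor)
```

Because the container handler calls the dispatcher rather than a method on the child, any child type with a registered handler works, including a pipeline that has no naming method of its own.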
Related links:
https://eli5.readthedocs.io/en/latest/autodocs/eli5.html#eli5.transform_feature_names
https://github.com/TeamHG-Memex/eli5/blob/4839d1927c4a68aeff051935d1d4d8a4fb69b46d/eli5/sklearn/transform.py