Get the relevant features from the pipeline and build a DecisionTree
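I have a pipeline that trains a decision tree. After successful training I want to print the features that were used, and then display the decision tree. However, I get the following error: AttributeError: 'GridSearchCV' object has no attribute 'n_features_'
- How can I display the relevant features used during training?
- How can I create (plot) the decision tree?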
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2, random_state=30, stratify=y)
feature_selection=SelectFromModel(LogisticRegression(max_iter=1000000))
#scaler =
classifier=DecisionTreeClassifier()
steps = [('scaler', MinMaxScaler()), ('feature_selection', feature_selection), ('dec_tree', DecisionTreeClassifier())]
pipeline = Pipeline(steps)
# parameters
# each key may appear only once: a duplicated dict key (as 'dec_tree__max_depth' was)
# silently overrides the earlier entry
parameters = {'dec_tree__criterion': ['gini', 'entropy'],
              'dec_tree__max_depth': [2, 4, 6, 8, 10, 12]}
grid = GridSearchCV(pipeline, param_grid=parameters, cv=5)
grid.fit(X_train, y_train)
print("score = %3.2f" %(grid.score(X_test,y_test)))
print('Training set score: ' + str(grid.score(X_train,y_train)))
print('Test set score: ' + str(grid.score(X_test,y_test)))
print(grid.best_params_)
y_pred = grid.predict(X_test)
As you can see, I need the column names, i.e. the columns that feature_selection kept during training:
# I need the features selected by feature_selection
# get decision tree
dot_data = StringIO()
# the error occurs here
export_graphviz(grid, out_file=dot_data,
filled=True, rounded=True,
special_characters=True,feature_names = <GET_COLUMNS>,class_names=['0','1']) # I need the column names from feature_selection here
You ran GridSearchCV on the pipeline, so to apply your visualization you first need to extract the fitted classifier from best_estimator_, for example:
export_graphviz(grid.best_estimator_.named_steps['dec_tree'])
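A minimal sketch of why passing grid fails and where the fitted tree lives. It assumes synthetic data from make_classification with random_state=0 and a smaller grid than the question's; only the pipeline layout (scaler, feature_selection, dec_tree) is taken from the question.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the question's data (assumption: fixed random_state)
X, y = make_classification(random_state=0)
pipeline = Pipeline([
    ("scaler", MinMaxScaler()),
    ("feature_selection", SelectFromModel(LogisticRegression(max_iter=10000))),
    ("dec_tree", DecisionTreeClassifier(random_state=0)),
])
grid = GridSearchCV(pipeline,
                    param_grid={"dec_tree__criterion": ["gini", "entropy"],
                                "dec_tree__max_depth": [2, 4, 6]},
                    cv=5)
grid.fit(X, y)

# GridSearchCV is only a wrapper around the search; it has no tree attributes itself
print(hasattr(grid, "tree_"))

# best_estimator_ is the whole pipeline refit with the best parameters (refit=True default)
best_pipeline = grid.best_estimator_
# The fitted tree is the step named "dec_tree", which is also the last step
tree = best_pipeline.named_steps["dec_tree"]
print(type(tree).__name__, tree is best_pipeline[-1])
```

Indexing with best_pipeline[-1] avoids hard-coding the step name; named_steps is clearer when the pipeline has several estimators.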
An example:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier,export_graphviz
from sklearn.pipeline import Pipeline
X,y = make_classification()
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2, random_state=30, stratify=y)
feature_selection=SelectFromModel(LogisticRegression(max_iter=1000000))
classifier=DecisionTreeClassifier()
steps = [('scaler', MinMaxScaler()), ('feature_selection', feature_selection), ('dec_tree', DecisionTreeClassifier())]
pipeline = Pipeline(steps)
# each key may appear only once: a duplicated dict key silently overrides the earlier entry
parameters = {'dec_tree__criterion': ['gini', 'entropy'],
              'dec_tree__max_depth': [2, 4, 6, 8, 10, 12]}
grid = GridSearchCV(pipeline, param_grid=parameters, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_)
{'dec_tree__criterion': 'gini', 'dec_tree__max_depth': 2}
Now we can check that the best estimator corresponds to the best parameters:
grid.best_estimator_.named_steps['dec_tree'].get_params()
{'ccp_alpha': 0.0,
'class_weight': None,
'criterion': 'gini',
'max_depth': 2,
'max_features': None,
'max_leaf_nodes': None,
'min_impurity_decrease': 0.0,
'min_samples_leaf': 1,
'min_samples_split': 2,
'min_weight_fraction_leaf': 0.0,
'random_state': None,
'splitter': 'best'}
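The same check can be done programmatically instead of by eye. This sketch assumes the synthetic setup below (make_classification with random_state=0, a reduced grid); it relies on Pipeline.get_params() exposing nested parameters under the same "step__param" keys that best_params_ uses.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the question's data (assumption: fixed random_state)
X, y = make_classification(random_state=0)
pipeline = Pipeline([
    ("scaler", MinMaxScaler()),
    ("feature_selection", SelectFromModel(LogisticRegression(max_iter=10000))),
    ("dec_tree", DecisionTreeClassifier(random_state=0)),
])
grid = GridSearchCV(pipeline,
                    param_grid={"dec_tree__criterion": ["gini", "entropy"],
                                "dec_tree__max_depth": [2, 4, 6]},
                    cv=5)
grid.fit(X, y)

# Nested parameters of the refit pipeline, keyed as "<step>__<param>"
pipeline_params = grid.best_estimator_.get_params()
# Collect any tuned value that was not applied to the refit pipeline
mismatches = {key: (value, pipeline_params[key])
              for key, value in grid.best_params_.items()
              if pipeline_params[key] != value}
print(grid.best_params_)
print(mismatches)
```

An empty mismatches dict means every tuned value ended up on the refit pipeline.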
Then run your function on the extracted tree:
export_graphviz(grid.best_estimator_.named_steps['dec_tree'])
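To actually display the tree, a sketch like the following may help. With out_file=None (the default), export_graphviz returns the DOT source as a string, which can then be rendered with Graphviz; export_text gives a plain-text view that needs no Graphviz install. The data and grid are assumptions (synthetic, random_state=0).

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier, export_graphviz, export_text

# Synthetic stand-in for the question's data (assumption: fixed random_state)
X, y = make_classification(random_state=0)
pipeline = Pipeline([
    ("scaler", MinMaxScaler()),
    ("feature_selection", SelectFromModel(LogisticRegression(max_iter=10000))),
    ("dec_tree", DecisionTreeClassifier(random_state=0)),
])
grid = GridSearchCV(pipeline, param_grid={"dec_tree__max_depth": [2, 4, 6]}, cv=5)
grid.fit(X, y)

tree = grid.best_estimator_.named_steps["dec_tree"]

# out_file defaults to None, so the DOT source is returned as a string
dot_source = export_graphviz(tree, filled=True, rounded=True)
print(dot_source[:60])

# Indented plain-text rendering of the same tree
text_view = export_text(tree)
print(text_view)
```

The DOT string can be passed to any Graphviz front end (for example the dot command line tool) to produce an image.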
Get the selected features in a similar way:
grid.best_estimator_.named_steps['feature_selection'].get_feature_names_out()
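Because the selector is fitted on the scaler's NumPy output, get_feature_names_out() typically returns positional names such as x0, x3. To fill the question's feature_names placeholder with real column names, one option is to apply the selector's get_support() mask to the original column list. This works here because MinMaxScaler keeps the column order and count. The names below (feat_0, feat_1, ...) are hypothetical; with a pandas DataFrame you would use list(X.columns). Data and grid are assumptions (synthetic, random_state=0).

```python
from io import StringIO

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Synthetic stand-in for the question's data (assumption: fixed random_state)
X, y = make_classification(random_state=0)
# Hypothetical column names; with a DataFrame use list(X.columns)
column_names = [f"feat_{i}" for i in range(X.shape[1])]

pipeline = Pipeline([
    ("scaler", MinMaxScaler()),
    ("feature_selection", SelectFromModel(LogisticRegression(max_iter=10000))),
    ("dec_tree", DecisionTreeClassifier(random_state=0)),
])
grid = GridSearchCV(pipeline, param_grid={"dec_tree__max_depth": [2, 4, 6]}, cv=5)
grid.fit(X, y)

best_pipeline = grid.best_estimator_
# Boolean mask over the selector's input columns (= original columns, since
# MinMaxScaler neither reorders nor drops columns)
mask = best_pipeline.named_steps["feature_selection"].get_support()
selected_names = [name for name, keep in zip(column_names, mask) if keep]
print(selected_names)

# feature_names must have one entry per column the tree was trained on
tree = best_pipeline.named_steps["dec_tree"]
dot_data = StringIO()
export_graphviz(tree, out_file=dot_data,
                filled=True, rounded=True,
                special_characters=True, feature_names=selected_names,
                class_names=["0", "1"])
print(dot_data.getvalue()[:200])
```

Only the selected features are passed to the tree, so feature_names must be the selected subset, not all original columns; export_graphviz raises a ValueError if the lengths differ.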