Possible to modify/prune learned trees in scikit-learn?
The parameters of a tree in sklearn can be accessed through

    tree.tree_.children_left
    tree.tree_.children_right
    tree.tree_.threshold
    tree.tree_.feature

and so on.
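However, trying to write to these variables raises a "not writable" exception.
Is there any way to modify the learned tree, or to get around the AttributeError (not writable)?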
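The attributes are int arrays that cannot be overwritten, meaning the attribute itself cannot be reassigned. You can still modify the elements of these arrays in place. That won't make the stored data any smaller, though.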
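To make the distinction between reassigning an attribute and mutating its elements concrete, here is a minimal pure-Python sketch. `ToyTree` is a hypothetical stand-in that I made up for illustration; it is not scikit-learn's `Tree` class, it only mimics a read-only attribute that hands back a mutable array.

```python
class ToyTree:
    """Hypothetical stand-in for a fitted tree; not scikit-learn's Tree class."""

    def __init__(self):
        self._children_left = [1, -1, -1]

    @property
    def children_left(self):
        # Read-only attribute: no setter is defined, so rebinding fails.
        return self._children_left


toy = ToyTree()

try:
    toy.children_left = [2, -1, -1]  # rebinding the attribute itself
    rebind_failed = False
except AttributeError:
    rebind_failed = True

toy.children_left[0] = -1  # mutating one element in place is allowed
print(rebind_failed, toy.children_left)  # True [-1, -1, -1]
```

The same pattern is what the pruning approach in this answer relies on: elements are changed one by one, and the attribute is never reassigned.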
    children_left : array of int, shape [node_count]
        children_left[i] holds the node id of the left child of node i.
        For leaves, children_left[i] == TREE_LEAF. Otherwise,
        children_left[i] > i. This child handles the case where
        X[:, feature[i]] <= threshold[i].

    children_right : array of int, shape [node_count]
        children_right[i] holds the node id of the right child of node i.
        For leaves, children_right[i] == TREE_LEAF. Otherwise,
        children_right[i] > i. This child handles the case where
        X[:, feature[i]] > threshold[i].

    feature : array of int, shape [node_count]
        feature[i] holds the feature to split on, for the internal node i.

    threshold : array of double, shape [node_count]
        threshold[i] holds the threshold for the internal node i.
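The array layout described above can be read as a simple lookup loop. The sketch below walks a hand-written toy tree using plain Python lists; the array values and the helper name `find_leaf` are made up for illustration, and `-2` is just a placeholder I chose for the unused `feature` entries of leaves.

```python
TREE_LEAF = -1  # marker for "no child"

# Toy arrays for a 5-node tree (hypothetical values, not from a fitted model):
#        0
#       / \
#      1   2
#         / \
#        3   4
children_left = [1, TREE_LEAF, 3, TREE_LEAF, TREE_LEAF]
children_right = [2, TREE_LEAF, 4, TREE_LEAF, TREE_LEAF]
feature = [0, -2, 1, -2, -2]          # -2: placeholder, leaves do not split
threshold = [0.5, 0.0, 2.0, 0.0, 0.0]


def find_leaf(x):
    """Return the id of the leaf that sample x (a list of floats) ends up in."""
    node = 0
    while children_left[node] != TREE_LEAF:
        # Left child handles x[feature] <= threshold, right child the rest.
        if x[feature[node]] <= threshold[node]:
            node = children_left[node]
        else:
            node = children_right[node]
    return node


print(find_leaf([0.2, 9.0]))  # 1
print(find_leaf([0.9, 1.0]))  # 3
print(find_leaf([0.9, 3.0]))  # 4
```

Because a leaf is recognised only by its child entry being `TREE_LEAF`, overwriting the child entries of an internal node is enough to turn it into a leaf.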
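To prune a decision tree based on the number of observations in a node, I use this function. You need to know that the TREE_LEAF constant is equal to -1.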
    def prune(decisiontree, min_samples_leaf=1):
        # Nothing to do if the tree was already built with a stricter limit.
        if decisiontree.min_samples_leaf >= min_samples_leaf:
            raise Exception('Tree already more pruned')
        decisiontree.min_samples_leaf = min_samples_leaf
        tree = decisiontree.tree_
        for i in range(tree.node_count):
            n_samples = tree.n_node_samples[i]
            if n_samples <= min_samples_leaf:
                # -1 is TREE_LEAF: node i becomes a leaf. Its former
                # descendants stay in the arrays but are unreachable.
                tree.children_left[i] = -1
                tree.children_right[i] = -1
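The effect of this kind of pruning, including why it does not shrink the stored data, can be seen on toy arrays. This is a pure-Python sketch with made-up sample counts; `prune_arrays` and `reachable` are hypothetical helpers written for this illustration and are not part of scikit-learn.

```python
TREE_LEAF = -1

# Toy 5-node tree: node 0 splits into 1 and 2, node 1 splits into 3 and 4.
children_left = [1, 3, TREE_LEAF, TREE_LEAF, TREE_LEAF]
children_right = [2, 4, TREE_LEAF, TREE_LEAF, TREE_LEAF]
n_node_samples = [100, 60, 40, 35, 25]  # hypothetical sample counts


def prune_arrays(left, right, n_samples, min_samples_leaf):
    """Turn every node with at most min_samples_leaf samples into a leaf, in place."""
    for i in range(len(left)):
        if n_samples[i] <= min_samples_leaf:
            left[i] = TREE_LEAF
            right[i] = TREE_LEAF


def reachable(left, right, node=0):
    """Collect the ids of nodes still reachable from the given node."""
    if left[node] == TREE_LEAF:
        return [node]
    return ([node]
            + reachable(left, right, left[node])
            + reachable(left, right, right[node]))


prune_arrays(children_left, children_right, n_node_samples, min_samples_leaf=60)
print(reachable(children_left, children_right))  # [0, 1, 2]
print(len(children_left))                        # 5
```

Nodes 3 and 4 are orphaned rather than deleted: the arrays keep their original length, which matches the remark that modifying elements does not lighten the data.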
Here is an example that generates graphviz output before and after pruning:
    from sklearn.tree import DecisionTreeRegressor as DTR
    from sklearn.datasets import load_diabetes
    from sklearn.tree import export_graphviz as export

    bunch = load_diabetes()
    data = bunch.data
    target = bunch.target

    dtr = DTR(max_depth=4)
    dtr.fit(data, target)

    # Recent scikit-learn versions expect the fitted estimator here;
    # the original answer passed dtr.tree_, which older versions accepted.
    export(decision_tree=dtr, out_file='before.dot')
    prune(dtr, min_samples_leaf=100)
    export(decision_tree=dtr, out_file='after.dot')