Extracting rules to predict child nodes or probability scores in a Decision Tree
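I'm relatively new to implementing decision trees in Python. I'm trying to extract the rules so that I can predict just the child (leaf) nodes. I need this to predict probability scores for new data (not just the final classification), and possibly to hand the algorithm over to other users. Is there a simple way to do this? I found some solutions in "How to extract the decision rules from scikit-learn decision-tree?". However, when I tested them, I did not get all of the child nodes for some reason (my tree is large and deep). Any suggestions are appreciated. Thank you.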
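A minimal sketch, assuming scikit-learn 0.21 or later (where `export_text` exists): a fitted `DecisionTreeClassifier` can already report the leaf each row lands in (`apply`) and its class probabilities (`predict_proba`), so generated code is not strictly required to score new data. `new_data` below is made-up illustration data. Note that `export_text` defaults to `max_depth=10` and truncates branches deeper than that, so raising it may help if the printed rules for a deep tree look incomplete.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Same dummy data and tree settings as in the question
df = pd.DataFrame({'col1': [0, 1, 2, 3], 'col2': [3, 4, 5, 6], 'dv': [0, 1, 0, 1]})
features = ['col1', 'col2']
dt = DecisionTreeClassifier(random_state=0, max_depth=5, min_samples_leaf=1)
dt.fit(df[features], df.dv)

# Hypothetical "new" data with the same feature columns (illustration only)
new_data = pd.DataFrame({'col1': [0, 2], 'col2': [3, 5]})

leaf_ids = dt.apply(new_data[features])        # leaf node ID reached by each row
probas = dt.predict_proba(new_data[features])  # one column per class, in dt.classes_ order

# Human-readable rules; max_depth is raised above its default of 10 for deep trees
rules = export_text(dt, feature_names=features, max_depth=50)

print(rules)
print(leaf_ids)
print(probas)
```

Because `apply` and `predict_proba` use the fitted tree directly, they always cover every leaf, which makes them a useful cross-check for any hand-written rule extractor.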
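I have updated the first piece of code from the link above to generate the nodes, and it seems to work best for large trees. However, I'm having trouble getting it to work with pandas DataFrames. Here is an example:
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier
# dummy data: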
df = pd.DataFrame({'col1':[0,1,2,3],'col2':[3,4,5,6],'dv':[0,1,0,1]})
df
# create decision tree
dt = DecisionTreeClassifier(random_state=0, max_depth=5, min_samples_leaf=1)
dt.fit(df.loc[:,('col1','col2')], df.dv)

from sklearn.tree import _tree

def tree_to_code(tree, feature_names):
    tree_ = tree.tree_
    feature_name = [
        feature_names[i] if i != _tree.TREE_UNDEFINED else "undefined!"
        for i in tree_.feature
    ]
    print("def tree({}):".format(", ".join(feature_names)))

    def recurse(node, depth):
        indent = "  " * depth
        if tree_.feature[node] != _tree.TREE_UNDEFINED:
            name = feature_name[node]
            threshold = tree_.threshold[node]
            print("{}if {} <= {}:".format(indent, name, threshold))
            recurse(tree_.children_left[node], depth + 1)
            print("{}else: # if {} > {}".format(indent, name, threshold))
            recurse(tree_.children_right[node], depth + 1)
        else:
            print("{}return {}".format(indent, node))

    recurse(0, 1)

tree_to_code(dt, df.columns)
The call above produces the following code:
def tree(col1, col2, dv):
  if col2 <= 3.5:
    return 1
  else: # if col2 > 3.5
    if col1 <= 1.5:
      return 3
    else: # if col1 > 1.5
      if col1 <= 2.5:
        return 5
      else: # if col1 > 2.5
        return 6
Also, when I call the code above as follows, I get an error saying that I'm missing an argument. How can I modify the code so that it works on a pandas DataFrame?
tree('col1', 'col2', 'dv_pred')
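Here is a working solution. It passes only the feature columns to tree_to_code and generates a function that takes a single row x and looks each feature up by name: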
import pandas as pd
from sklearn.tree import _tree
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({'col1':[0,1,2,3],'col2':[3,4,5,6],'dv':[0,1,0,1]})

# create decision tree
dt = DecisionTreeClassifier(random_state=0, max_depth=5, min_samples_leaf=1)
features = ['col1','col2']
dt.fit(df.loc[:,features], df.dv)

def tree_to_code(tree, feature_names):
    tree_ = tree.tree_
    feature_name = [
        feature_names[i] if i != _tree.TREE_UNDEFINED else "undefined!"
        for i in tree_.feature
    ]
    print("def tree(x):")

    def recurse(node, depth):
        indent = "  " * depth
        if tree_.feature[node] != _tree.TREE_UNDEFINED:
            name = feature_name[node]
            threshold = tree_.threshold[node]
            print("{}if x['{}'] <= {}:".format(indent, name, threshold))
            recurse(tree_.children_left[node], depth + 1)
            print("{}else: # if x['{}'] > {}".format(indent, name, threshold))
            recurse(tree_.children_right[node], depth + 1)
        else:
            print("{}return {}".format(indent, node))

    recurse(0, 1)

tree_to_code(dt, df[features].columns)
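tree_to_code only prints the function, so `tree` does not exist until the printed text is pasted into the session. A minimal sketch of one way to skip the copy-paste step, assuming it is acceptable to run generated code with `exec`: `tree_to_source` is a hypothetical variant that performs the same traversal but returns the source as a string.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, _tree

df = pd.DataFrame({'col1': [0, 1, 2, 3], 'col2': [3, 4, 5, 6], 'dv': [0, 1, 0, 1]})
features = ['col1', 'col2']
dt = DecisionTreeClassifier(random_state=0, max_depth=5, min_samples_leaf=1)
dt.fit(df[features], df.dv)

def tree_to_source(tree, feature_names):
    """Hypothetical helper: same traversal as tree_to_code, but returns a string."""
    tree_ = tree.tree_
    lines = ["def tree(x):"]

    def recurse(node, depth):
        indent = "    " * depth
        if tree_.feature[node] != _tree.TREE_UNDEFINED:
            name = feature_names[tree_.feature[node]]
            threshold = tree_.threshold[node]
            lines.append(f"{indent}if x[{name!r}] <= {threshold}:")
            recurse(tree_.children_left[node], depth + 1)
            lines.append(f"{indent}else:  # x[{name!r}] > {threshold}")
            recurse(tree_.children_right[node], depth + 1)
        else:
            # Leaf: return its node ID, as tree_to_code does
            lines.append(f"{indent}return {node}")

    recurse(0, 1)
    return "\n".join(lines)

source = tree_to_source(dt, features)
namespace = {}
exec(source, namespace)   # defines tree(x) inside namespace
tree = namespace["tree"]

df["leaf"] = df[features].apply(tree, axis=1)
print(source)
print(df)
```

The string in `source` is also what could be saved to a .py file and shared with other users; the leaf IDs it returns should match `dt.apply(df[features])`.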
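Then paste the printed def tree(x): function into your session (tree_to_code only prints it) and apply it row by row to get each row's leaf node ID: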
df.apply(tree, axis=1)
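The generated function returns leaf node IDs rather than probability scores. A minimal sketch of turning those IDs into probabilities, assuming it is fine to read the fitted tree's internal `tree_` arrays: `leaf_rule_table` is a hypothetical helper that walks the tree with an explicit stack (no recursion) and builds one row per leaf with its rule and class probabilities. Normalising `tree_.value` per node gives the same numbers `predict_proba` reports, whether that array holds raw counts or fractions in your scikit-learn version.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({'col1': [0, 1, 2, 3], 'col2': [3, 4, 5, 6], 'dv': [0, 1, 0, 1]})
features = ['col1', 'col2']
dt = DecisionTreeClassifier(random_state=0, max_depth=5, min_samples_leaf=1)
dt.fit(df[features], df.dv)

def leaf_rule_table(model, feature_names):
    """Hypothetical helper: one row per leaf with its node ID, rule and class probabilities."""
    t = model.tree_
    rows = []
    stack = [(0, [])]  # (node ID, conditions collected on the way down)
    while stack:
        node, conds = stack.pop()
        left, right = t.children_left[node], t.children_right[node]
        if left == right:  # both children are -1 at a leaf
            counts = t.value[node, 0, :]
            row = {"leaf": int(node), "rule": " and ".join(conds) or "True"}
            for cls, p in zip(model.classes_, counts / counts.sum()):
                row[f"p_{cls}"] = p
            rows.append(row)
        else:
            name = feature_names[t.feature[node]]
            thr = t.threshold[node]
            stack.append((right, conds + [f"{name} > {thr}"]))
            stack.append((left, conds + [f"{name} <= {thr}"]))
    return pd.DataFrame(rows).set_index("leaf").sort_index()

table = leaf_rule_table(dt, features)
print(table)

# Leaf IDs from the generated tree(x), or from dt.apply, index straight into the table
leaf_ids = dt.apply(df[features])
prob_cols = [c for c in table.columns if c.startswith("p_")]
scores = table.loc[leaf_ids, prob_cols]
print(scores)
```

Comparing `len(table)` with `dt.get_n_leaves()` is a quick way to confirm that no leaf was missed on a large, deep tree, and the table itself (rule plus probabilities per leaf) is a portable artifact to hand to other users.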