Traverse decision tree based on values; iteratively going into sub-dictionaries?

Question

I have a dictionary that represents a decision tree:

{'Outlook': {'Overcast': 'Yes', 'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}}, 'Sunny': {'Temperature': {'Cool': 'Yes', 'Hot': 'No', 'Mild': 'No'}}}}

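Visualized, it looks like this (image not available):

The tree was built from some training data with the ID3 algorithm; I want to predict the decision for the examples in my testing data: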

Outlook   Temperature Humidity Wind    Decision
Sunny     Mild        Normal   Strong  Yes
Overcast  Mild        High     Strong  Yes
Overcast  Hot         Normal   Weak    Yes
Rain      Mild        High     Strong  No

Using the first example, here is a rough idea of the order in which things are checked:

Current dict 'outlook'
Examine 'outlook', found 'sunny':
  'sunny' is a dict, make current dict the 'sunny' subdict
  Examine 'temperature', found 'mild':
     'mild' is not a dict, return value 'no'

However, I'm not sure how to traverse the dictionary like this. I have some code to start with:

def fun(d, t):
    """
    d -- decision tree dictionary
    t -- testing examples in form of pandas dataframe
    """
    for _, e in t.iterrows():
        predict(d, e)

def predict(d, e):
    """
    d -- decision tree dictionary
    e -- a testing example in form of pandas series
    """
    # ?

Inside predict(), e can be accessed like a dictionary:

print(e.to_dict())
# {'Outlook': 'Rain', 'Temperature': 'Cool', 'Humidity': 'Normal', 'Wind': 'Weak', 'Decision': 'Yes'}
print(e['Outlook'])
# 'Rain'
print(e['Decision'])
# 'Yes'
# etc

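I'm just not sure how to traverse the dictionary. I need to iterate over the testing example in the order the attributes appear in the decision tree, not in the order they appear in the testing example.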

Answer 1

You need a recursive solution that keeps searching until it reaches a node with a string value (that is your leaf node, holding the decision "Yes" or "No").

import pandas as pd

dt = {'Outlook': {'Overcast': 'Yes', 'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}}, 'Sunny': {'Temperature': {'Cool': 'Yes', 'Hot': 'No', 'Mild': 'No'}}}}

df = pd.DataFrame(
    data=[['Sunny', 'Mild', 'Normal', 'Strong', 'Yes']],
    columns=['Outlook', 'Temperature', 'Humidity', 'Wind', 'Decision'],
)

def fun(d, t):
    """
    d -- decision tree dictionary
    t -- testing examples in form of pandas dataframe
    """
    res = []
    for _, e in t.iterrows():
        res.append(predict(d, e))
    return res

def predict(d, e):
    """
    d -- decision tree dictionary
    e -- a testing example in form of pandas series
    """
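    # each (sub)tree has exactly one key: the attribute tested at this node
    current_node = next(iter(d))
    # follow the branch matching this example's value for that attribute
    current_branch = d[current_node][e[current_node]]
    # if the branch is a string, it's a leaf node holding the decision
    if isinstance(current_branch, str):
        return current_branch
    # otherwise use that branch as the new subtree to search
    return predict(current_branch, e)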

print(fun(dt, df))

Output:

['No']
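
One thing to be aware of: the recursive predict() above raises a KeyError when a testing example contains a value that the tree has no branch for. Here is a minimal sketch of one way to handle that, assuming a fallback value is acceptable for your use case. The names predict_with_default and default are hypothetical (not part of the code above), and plain dicts stand in for the pandas Series since both support e[attribute].

```python
dt = {'Outlook': {'Overcast': 'Yes',
                  'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}},
                  'Sunny': {'Temperature': {'Cool': 'Yes', 'Hot': 'No', 'Mild': 'No'}}}}

def predict_with_default(d, e, default=None):
    """
    d -- decision tree dictionary
    e -- a testing example indexable by attribute name (dict or pandas Series)
    default -- hypothetical fallback returned when the tree has no matching branch
    """
    attribute = next(iter(d))      # the single attribute tested at this node
    branches = d[attribute]
    value = e[attribute]
    if value not in branches:      # value never seen while building the tree
        return default
    subtree = branches[value]
    if isinstance(subtree, dict):  # internal node: keep descending
        return predict_with_default(subtree, e, default)
    return subtree                 # leaf: the decision

rainy = {'Outlook': 'Rain', 'Temperature': 'Mild', 'Humidity': 'High', 'Wind': 'Strong'}
foggy = {'Outlook': 'Foggy', 'Temperature': 'Mild', 'Humidity': 'High', 'Wind': 'Weak'}

print(predict_with_default(dt, rainy))                     # No
print(predict_with_default(dt, foggy, default='Unknown'))  # Unknown
```

Whether a fixed default is the right choice depends on your data; a majority-class fallback is another common option.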

Answer 2

You can also implement it iteratively; you just need to keep track of the current dictionary:

def predict(d, e):
    """
    d -- decision tree dictionary
    e -- a testing example in form of pandas series
    """
    c = d
    # Series.iteritems() was removed in pandas 2.0; items() is the replacement
    for k, v in e.items():
        print(f"Current dict '{k}'")
        try:
            c = c[k][v]
        except KeyError:
            # k is not the attribute tested at the current node (or v has no
            # branch); do something sensible here, for now just skip it
            continue
        print(f"Examine '{k}', found '{v}': ")
        if isinstance(c, dict):
            print(f"'{v}' is a dict, make current dict the '{v}' subdict")
        else:
            print(f"'{v}' is not a dict, return {c}\n")
            return c

# data is the decision tree dictionary, test is the DataFrame of testing examples
fun(data, test)

Result:

Current dict 'Outlook'
Examine 'Outlook', found 'Sunny': 
'Sunny' is a dict, make current dict the 'Sunny' subdict
Current dict 'Temperature'
Examine 'Temperature', found 'Mild': 
'Mild' is not a dict, return No

Current dict 'Outlook'
Examine 'Outlook', found 'Overcast': 
'Overcast' is not a dict, return Yes

Current dict 'Outlook'
Examine 'Outlook', found 'Overcast': 
'Overcast' is not a dict, return Yes

Current dict 'Outlook'
Examine 'Outlook', found 'Rain': 
'Rain' is a dict, make current dict the 'Rain' subdict
Current dict 'Temperature'
Current dict 'Humidity'
Current dict 'Wind'
Examine 'Wind', found 'Strong': 
'Strong' is not a dict, return No
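
Note that the loop above walks the example's columns in order, so it only reaches a leaf when the attributes appear in the Series in the same order as they are nested in the tree. If that is not guaranteed, you can let the tree drive the loop instead. This is a rough sketch, not a drop-in replacement: predict_iterative is a hypothetical name, and plain dicts stand in for the pandas Series (both support e[attribute]).

```python
dt = {'Outlook': {'Overcast': 'Yes',
                  'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}},
                  'Sunny': {'Temperature': {'Cool': 'Yes', 'Hot': 'No', 'Mild': 'No'}}}}

def predict_iterative(d, e):
    """
    d -- decision tree dictionary
    e -- a testing example indexable by attribute name (dict or pandas Series)
    """
    node = d
    # keep descending while the current node is still a subtree
    while isinstance(node, dict):
        attribute = next(iter(node))          # attribute tested at this node
        node = node[attribute][e[attribute]]  # follow the matching branch
    return node                               # a leaf, i.e. the decision

# The four testing examples from the question; the last one has its keys in a
# different order on purpose, since the lookup no longer depends on it.
examples = [
    {'Outlook': 'Sunny', 'Temperature': 'Mild', 'Humidity': 'Normal', 'Wind': 'Strong'},
    {'Outlook': 'Overcast', 'Temperature': 'Mild', 'Humidity': 'High', 'Wind': 'Strong'},
    {'Outlook': 'Overcast', 'Temperature': 'Hot', 'Humidity': 'Normal', 'Wind': 'Weak'},
    {'Wind': 'Strong', 'Humidity': 'High', 'Temperature': 'Mild', 'Outlook': 'Rain'},
]

predictions = [predict_iterative(dt, e) for e in examples]
print(predictions)  # ['No', 'Yes', 'Yes', 'No']
```

Like the recursive version, this still raises a KeyError for a value the tree has no branch for.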

Tags: python, dictionary, tree, decision-tree, tree-traversal