解析一棵二叉决策树,通过反转决策树向右遍历的规则找到所有路径

parse a binary decision tree and find all path by inversing the rule of the decision tree traversing to the right

我现在拥有的决策树规则

{
  "id": -1,
  "rule": "TOTAL_REVENUE <= 300",
  "left": {
        "id": "0",
        "rule": "TOTAL_DATA_DUR <= 39.5794",
        
        "left": {
          "id": "1",
          "rule": "TOTAL_DATA_DUR <= 0.7408",
         
          "left": null,
          "right": {
            "id": "3",
            "rule": "TOTAL_PACKAGE_REVENUE <= 15.1350",
           
            "left": {
              "id": "4",
              "rule": "TOTAL_PACKAGE_REVENUE_14DAYS <= 12.5000",
             
              "left": null,
              "right": {
                "id": "6",
              
                "value": 84.62
                
              }
            },
            "right": null
          }
        },
        "right": {
          "id": "8",
          "rule": "TOTAL_DATA_DUR <= 301.6211",
        
          "left": null,
          "right": {
            "id": "10",
            "rule": "TOTAL_DATA_DUR <= 6898.9146",
           
            "left": {
              "id": "11",
              "rule": "TOTAL_PACKAGE_REVENUE <= 14.5000",
             
              "left": null,
              "right": {
                "id": "13",
                "rule": "TOTAL_PACKAGE_REVENUE <= 16.0000",
             
                "left": {
                  "id": "14",
                 
                 
                  "value": 84.96
                  
                },
                "right": {
                  "id": "15",
                  "rule": "TOTAL_PACKAGE_REVENUE <= 19.5000",
                  "left": null,
                  "right": {
                    "id": "17",
                   
                   
                    "value": 70.8
                    
                  }
                }
              }
            },
            "right": null
          }
        }
      }
}

我想要得到的输出是

path1 -> TOTAL_REVENUE <= 300 and TOTAL_DATA_DUR <= 39.5794 TOTAL_DATA_DUR > 0.7408 and TOTAL_PACKAGE_REVENUE <= 15.1350 and TOTAL_PACKAGE_REVENUE_14DAYS > 12.5000

在这里你可以看到 TOTAL_DATA_DUR > 0.7408 和 TOTAL_PACKAGE_REVENUE_14DAYS > 12.5000 被反转,因为它穿过右边,其余条件是 <= 因为它穿过左边

path2 -> TOTAL_REVENUE <= 300 and TOTAL_DATA_DUR > 39.5794 and TOTAL_DATA_DUR >  301.6211 and TOTAL_DATA_DUR <= 6898.9146 and TOTAL_PACKAGE_REVENUE > 14.5000 and TOTAL_PACKAGE_REVENUE <= 16.0000

path3 -> TOTAL_REVENUE <= 300 and TOTAL_DATA_DUR > 39.5794 and TOTAL_DATA_DUR >  301.6211 and TOTAL_DATA_DUR <= 6898.9146 and TOTAL_PACKAGE_REVENUE > 14.5000 and TOTAL_PACKAGE_REVENUE > 16.0000 and TOTAL_PACKAGE_REVENUE > 19.5000

我对编码还很陌生 我将如何使用递归获得所需的输出

我现在正在处理的代码

from collections import deque
import json
def isLeaf1(node):
    return node.get('left') is None and node.get('right') is None



all_paths = []


def printRootToLeafPaths1(node, path, node_type=None):
    # base case
    if node is None:

        return

    # include the current node to the path
    if node.get('rule') is not None:
        path.append(node.get('rule'))

    # if a leaf node is found, print the path
    if isLeaf1(node):
        all_paths.append(list(path))


    # recur for the left and right subtree
    printRootToLeafPaths1(node.get('left'), path, 'left')
    printRootToLeafPaths1(node.get('right'), path, 'right')

    # backtrack: remove the current node after the left, and right subtree are done
    path.pop()



# The main function to print paths from the root node to every leaf node
def printRootToLeafPath1(root):
    # list to store root-to-leaf path
    path = deque()
    printRootToLeafPaths1(root, path)

json_rule ='{"id":-1,"rule":"TOTAL_REVENUE <= 300","left":{"id":"0","rule":"TOTAL_DATA_DUR <= 39.5794","left":{"id":"1","rule":"TOTAL_DATA_DUR <= 0.7408","left":null,"right":{"id":"3","rule":"TOTAL_PACKAGE_REVENUE <= 15.1350","left":{"id":"4","rule":"TOTAL_PACKAGE_REVENUE_14DAYS <= 12.5000","left":null,"right":{"id":"6","value":84.62}},"right":null}},"right":{"id":"8","rule":"TOTAL_DATA_DUR <= 301.6211","left":null,"right":{"id":"10","rule":"TOTAL_DATA_DUR <= 6898.9146","left":{"id":"11","rule":"TOTAL_PACKAGE_REVENUE <= 14.5000","left":null,"right":{"id":"13","rule":"TOTAL_PACKAGE_REVENUE <= 16.0000","left":{"id":"14","value":84.96},"right":{"id":"15","rule":"TOTAL_PACKAGE_REVENUE <= 19.5000","impurity":"0.47643265235457066","samples":"304","left":null,"right":{"id":"17","value":70.8}}}},"right":null}}}}'

printRootToLeafPath1(json.loads(json_rule))

任何人都可以告诉我应该在此代码中进行哪些更改才能获得输出路径吗?

一些备注:

  • 而不是在递归函数中打印yield。这样,调用者可以决定他们对路径做什么:打印它们,或其他。

  • 使用全局变量并让函数改变该变量是不好的做法all_paths。相反,旨在使函数“纯”,因此它不需要它并且可以通过产生它们(或返回它们)来产生路径。

  • pop()调用是无条件的,而append()调用是有条件的。这看起来有可能在未附加在同一函数中时弹出某些内容,因此很容易出错。有更好的递归方法——下一点:

  • 不是将部分路径传递给递归调用,而是让递归调用给出它找到的路径,“就好像”它是根,然后 prefix 当前节点对它的规则。这是递归的更好做法。

  • 显然,当节点中没有“规则”键时,它是一个leaf-node。在那种情况下,不再需要寻找 left/right,因为据了解,节点具有 或者 一个“规则”键和子节点,或者 没有“规则”键,也没有子项。在叶子的情况下,只产生调用者可以扩展的空路径。

  • 您的代码中没有否定规则的逻辑。为此,您可以使用将每个运算符映射到其对立运算符的对列表。如果其中 none 符合规则,则只需应用默认的 not (rule) 格式。此逻辑假定规则仅使用一个运算符。

  • 您的算法没有使用树叶中的“值”属性。我认为您在某些时候需要使用该信息,但我暂时忽略它。

考虑到以上几点,这是一个可能的实现:

def negate(rule):
    mapper = (("<=", ">"), (">=", "<"), (">", "<="), ("<", ">="))
    for operator, negated in mapper:
        if operator in rule:
            return rule.replace(operator, negated)
    return "not (" + rule + ")"  # Default when operator not recognised

def nodeToLeafPaths(node):
    if not node:  # base case: nothing in this direction
        return
    rule = node.get('rule')
    if rule is None:  # base case: a leaf with a value
        yield []  # empty path
        return

    negated_rule = negate(rule)
    for path in nodeToLeafPaths(node.get('left')):
        yield [rule, *path]  # Extend path with current rule
    for path in nodeToLeafPaths(node.get('right')):
        yield [negated_rule, *path]

# Transform paths (lists) to AND-rules (strings):
def rootToLeafConjugations(root):
    return [" AND ".join(path) for path in nodeToLeafPaths(root)]

主要驱动程序代码可能如下所示:

import json

json_rule ='{"id":-1,"rule":"TOTAL_REVENUE <= 300","left":{"id":"0","rule":"TOTAL_DATA_DUR <= 39.5794","left":{"id":"1","rule":"TOTAL_DATA_DUR <= 0.7408","left":null,"right":{"id":"3","rule":"TOTAL_PACKAGE_REVENUE <= 15.1350","left":{"id":"4","rule":"TOTAL_PACKAGE_REVENUE_14DAYS <= 12.5000","left":null,"right":{"id":"6","value":84.62}},"right":null}},"right":{"id":"8","rule":"TOTAL_DATA_DUR <= 301.6211","left":null,"right":{"id":"10","rule":"TOTAL_DATA_DUR <= 6898.9146","left":{"id":"11","rule":"TOTAL_PACKAGE_REVENUE <= 14.5000","left":null,"right":{"id":"13","rule":"TOTAL_PACKAGE_REVENUE <= 16.0000","left":{"id":"14","value":84.96},"right":{"id":"15","rule":"TOTAL_PACKAGE_REVENUE <= 19.5000","impurity":"0.47643265235457066","samples":"304","left":null,"right":{"id":"17","value":70.8}}}},"right":null}}}}'

for rule in rootToLeafConjugations(json.loads(json_rule)):
    print(rule)