基于阈值的聚类
clustering based on thresholds
编辑(简体)
我很确定,我错过了 "google" 这个问题的正确术语,如果之前有人问过,请指点一下:
我有一个树结构,可以说如下
(0)->(0,0:7)
(0,1:9)
(1)->(1,0:6)
(1,1:2)
(1,2:1)
为简单起见,让我们将其转换为平面结构
l1, l2, v1
0, 0, 7
0, 1, 9
1, 0, 6
1, 1, 2
1, 2, 1
现在让我们在这棵树上设置 3
的阈值。这意味着我们要保留高于阈值的节点,并合并分支中低于阈值的所有节点。
所以我们最终得到的是最后两行(因为它们低于阈值)以注释 'yielded' 作为两个的总和结束:
l1, l2, v1
0, 0, 7
0, 1, 9
1, 0, 6
1, (1,2), 3
理想情况下,python 中的解决方案将不胜感激。显然会有我很乐意处理的边缘条件。请注意,实际上我最终可以得到 6 深的树。
所以我终于按照我之前暗示过的繁琐方式完成了。
我开始向我的节点定义添加一些标志(dodelete=False
和 visited=False
)。
并将 add_node
方法更新为
def add_child(self, node):
node.parent = self
node.level = self.level + 1
self.children.append(node)
return node
其中 self.children
是节点列表
然后两种方法
def collapse_nodes(tree, thresh=3):
for n in tree:
if n.dodelete:
continue
sub_tree = tree.get_by_path (n.id)
sub_tree_stack = []
for child in sub_tree.children:
if child.val is not None and thresh > child.val:
sub_tree_stack.append(child)
tree.get_by_path(child.id).dodelete = True
if sub_tree_stack:
sub_tree.add_child(Node(",".join([subnode.name for subnode in sub_tree_stack]),
val = sum([subnode.val for subnode in sub_tree_stack]),
id= sorted([subnode.id for subnode in sub_tree_stack])[0]))
return tree
和
def roll_up(tree, thresh = 2, level=5):
for n in tree:
if n.dodelete or n.visited:
continue
if n.level != level:
continue
sub_tree = tree.get_by_path(n.id)
if sub_tree is None:
continue
sub_tree_stack = []
for child in tree.get_by_path(n.id).children:
if child.val is not None and child.val <= thresh:
sub_tree_stack.append((child.name, child.id, child.val))
# Also mark this for deletion
tree.get_by_path(child.id).dodelete = True
if sub_tree_stack:
# Get the parent for these nodes
node_name = n.name + ": [" + ",".join([subnode[0] for subnode in sub_tree_stack]) + "]"
node_val = sum([subnode[2]for subnode in sub_tree_stack])
node_id = sorted([subnode[1] for subnode in sub_tree_stack])[0]
parent_name = n.parent.name
parent_level = n.parent.level
parent_id= n.parent.id
# Now ensure that you delete the old node before adding new
tree.get_by_path(n.id).dodelete = True
tree.get_by_path(parent_id).add_child(Node(node_name, val=node_val, id = n.id, visited=True) )
return tree
这样做有点复杂,但是有效。我已经为任何试图测试它的任性灵魂创建了一个要点 https://gist.github.com/fahaddaniyal/0dc86c80f266fd9f8cdb
编辑(简体)
我很确定,我错过了 "google" 这个问题的正确术语,如果之前有人问过,请指点一下: 我有一个树结构,可以说如下
(0)->(0,0:7)
(0,1:9)
(1)->(1,0:6)
(1,1:2)
(1,2:1)
为简单起见,让我们将其转换为平面结构
l1, l2, v1
0, 0, 7
0, 1, 9
1, 0, 6
1, 1, 2
1, 2, 1
现在让我们在这棵树上设置 3
的阈值。这意味着我们要保留高于阈值的节点,并合并分支中低于阈值的所有节点。
所以我们最终得到的是最后两行(因为它们低于阈值)以注释 'yielded' 作为两个的总和结束:
l1, l2, v1
0, 0, 7
0, 1, 9
1, 0, 6
1, (1,2), 3
理想情况下,python 中的解决方案将不胜感激。显然会有我很乐意处理的边缘条件。请注意,实际上我最终可以得到 6 深的树。
所以我终于按照我之前暗示过的繁琐方式完成了。
我开始向我的节点定义添加一些标志(dodelete=False
和 visited=False
)。
并将 add_node
方法更新为
def add_child(self, node):
node.parent = self
node.level = self.level + 1
self.children.append(node)
return node
其中 self.children
是节点列表
然后两种方法
def collapse_nodes(tree, thresh=3):
for n in tree:
if n.dodelete:
continue
sub_tree = tree.get_by_path (n.id)
sub_tree_stack = []
for child in sub_tree.children:
if child.val is not None and thresh > child.val:
sub_tree_stack.append(child)
tree.get_by_path(child.id).dodelete = True
if sub_tree_stack:
sub_tree.add_child(Node(",".join([subnode.name for subnode in sub_tree_stack]),
val = sum([subnode.val for subnode in sub_tree_stack]),
id= sorted([subnode.id for subnode in sub_tree_stack])[0]))
return tree
和
def roll_up(tree, thresh = 2, level=5):
for n in tree:
if n.dodelete or n.visited:
continue
if n.level != level:
continue
sub_tree = tree.get_by_path(n.id)
if sub_tree is None:
continue
sub_tree_stack = []
for child in tree.get_by_path(n.id).children:
if child.val is not None and child.val <= thresh:
sub_tree_stack.append((child.name, child.id, child.val))
# Also mark this for deletion
tree.get_by_path(child.id).dodelete = True
if sub_tree_stack:
# Get the parent for these nodes
node_name = n.name + ": [" + ",".join([subnode[0] for subnode in sub_tree_stack]) + "]"
node_val = sum([subnode[2]for subnode in sub_tree_stack])
node_id = sorted([subnode[1] for subnode in sub_tree_stack])[0]
parent_name = n.parent.name
parent_level = n.parent.level
parent_id= n.parent.id
# Now ensure that you delete the old node before adding new
tree.get_by_path(n.id).dodelete = True
tree.get_by_path(parent_id).add_child(Node(node_name, val=node_val, id = n.id, visited=True) )
return tree
这样做有点复杂,但是有效。我已经为任何试图测试它的任性灵魂创建了一个要点 https://gist.github.com/fahaddaniyal/0dc86c80f266fd9f8cdb