scipy 中树状图坐标与 ClusterNodes 之间的关系
Relation between dendrogram plot coordinates and ClusterNodes in scipy
我正在寻找一种获取 dendrogram plot based on its ClusterNode return by to_tree 中聚类点坐标的方法。
使用 scipy 从以下数据构建树状图:
X = data
Y = pdist(X)
Z = linkage(Y)
dend = dendrogram(Z)
rootnode, nodesList = to_tree(Z, rd=True)
我想做的是构建一个函数 get_coords(somClusterNode)
,它将 return 元组 (x, y)
指定图中节点的位置。
感谢this answer,我设法弄清楚如何从树状图return值中获取位置,例如:
i, d = list(zip(dend['icoord'], dend['dcoord']))[-1]
x = 0.5 * sum(i[1:3])
y = d[1]
plt.plot(x, y, 'ro')
但我可以找出节点列表排序和 icoord/dcoord
排序之间的关系,以便将一个映射到另一个。
你知道我可以在哪里寻找吗?
感谢您的帮助!
每个树状图仅映射到一棵 ClusterNodes 树,但任何一棵 ClusterNodes 树都可以映射到无限多的树状图。因此,从节点 ID 到 (x,y) 位置的映射可能应该只是树状图数据结构中的另一个字段,而不是 ClusterNode 的函数。因此,我没有定义函数 get_coords
,而是将字典附加到 dend
,将节点 ID 映射到 (x,y) 坐标。您可以使用
访问职位
x,y = dend['node_id_to_coord'][node_id] # node_id is an integer as returned by ClusterNode.id
代码:
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, to_tree
from scipy.spatial.distance import pdist
# create some random data
X = np.random.rand(10, 3)
# get dendrogram
Z = linkage(pdist(X), method="ward")
dend = dendrogram(Z)
# ----------------------------------------
# get leave coordinates, which are at y == 0
def flatten(l):
return [item for sublist in l for item in sublist]
X = flatten(dend['icoord'])
Y = flatten(dend['dcoord'])
leave_coords = [(x,y) for x,y in zip(X,Y) if y==0]
# in the dendogram data structure,
# leave ids are listed in ascending order according to their x-coordinate
order = np.argsort([x for x,y in leave_coords])
id_to_coord = dict(zip(dend['leaves'], [leave_coords[idx] for idx in order])) # <- main data structure
# ----------------------------------------
# get coordinates of other nodes
# this should work but doesn't:
# # traverse tree from leaves upwards and populate mapping ID -> (x,y);
# # use linkage matrix to traverse the tree optimally
# # (each row in the linkage matrix corresponds to a row in dend['icoord'] and dend['dcoord'])
# root_node, node_list = to_tree(Z, rd=True)
# for ii, (X, Y) in enumerate(zip(dend['icoord'], dend['dcoord'])):
# x = (X[1] + X[2]) / 2
# y = Y[1] # or Y[2]
# node_id = ii + len(dend['leaves'])
# id_to_coord[node_id] = (x, y)
# so we need to do it the hard way:
# map endpoint of each link to coordinates of parent node
children_to_parent_coords = dict()
for i, d in zip(dend['icoord'], dend['dcoord']):
x = (i[1] + i[2]) / 2
y = d[1] # or d[2]
parent_coord = (x, y)
left_coord = (i[0], d[0])
right_coord = (i[-1], d[-1])
children_to_parent_coords[(left_coord, right_coord)] = parent_coord
# traverse tree from leaves upwards and populate mapping ID -> (x,y)
root_node, node_list = to_tree(Z, rd=True)
ids_left = range(len(dend['leaves']), len(node_list))
while len(ids_left) > 0:
for ii, node_id in enumerate(ids_left):
node = node_list[node_id]
if (node.left.id in id_to_coord) and (node.right.id in id_to_coord):
left_coord = id_to_coord[node.left.id]
right_coord = id_to_coord[node.right.id]
id_to_coord[node_id] = children_to_parent_coords[(left_coord, right_coord)]
ids_left = [node_id for node_id in range(len(node_list)) if not node_id in id_to_coord]
# plot result on top of dendrogram
ax = plt.gca()
for node_id, (x, y) in id_to_coord.iteritems():
if not node_list[node_id].is_leaf():
ax.plot(x, y, 'ro')
ax.annotate(str(node_id), (x, y), xytext=(0, -8),
textcoords='offset points',
va='top', ha='center')
dend['node_id_to_coord'] = id_to_coord
还有另一种方法:
树状图的 ID 似乎是由从右到左的反向遍历树生成的。这允许我们构建从 node.id
到 icoord
和 dcoord
的索引的转换,如下所示:
def rl_traversal(node):
# skipping leaves
if not node.is_leaf():
yield node.id
yield from rl_traversal(node.right)
yield from rl_traversal(node.left)
id_map = dict(zip( rl_traversal(root), reversed(range(root.get_count()-1))) ))
# id_map[node_id] = dendogram_id
节点的坐标可以通过dendo['icoord'][id_map[node_id]]
得到
我正在寻找一种获取 dendrogram plot based on its ClusterNode return by to_tree 中聚类点坐标的方法。
使用 scipy 从以下数据构建树状图:
X = data
Y = pdist(X)
Z = linkage(Y)
dend = dendrogram(Z)
rootnode, nodesList = to_tree(Z, rd=True)
我想做的是构建一个函数 get_coords(somClusterNode)
,它将 return 元组 (x, y)
指定图中节点的位置。
感谢this answer,我设法弄清楚如何从树状图return值中获取位置,例如:
i, d = list(zip(dend['icoord'], dend['dcoord']))[-1]
x = 0.5 * sum(i[1:3])
y = d[1]
plt.plot(x, y, 'ro')
但我可以找出节点列表排序和 icoord/dcoord
排序之间的关系,以便将一个映射到另一个。
你知道我可以在哪里寻找吗?
感谢您的帮助!
每个树状图仅映射到一棵 ClusterNodes 树,但任何一棵 ClusterNodes 树都可以映射到无限多的树状图。因此,从节点 ID 到 (x,y) 位置的映射可能应该只是树状图数据结构中的另一个字段,而不是 ClusterNode 的函数。因此,我没有定义函数 get_coords
,而是将字典附加到 dend
,将节点 ID 映射到 (x,y) 坐标。您可以使用
x,y = dend['node_id_to_coord'][node_id] # node_id is an integer as returned by ClusterNode.id
代码:
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, to_tree
from scipy.spatial.distance import pdist
# create some random data
X = np.random.rand(10, 3)
# get dendrogram
Z = linkage(pdist(X), method="ward")
dend = dendrogram(Z)
# ----------------------------------------
# get leave coordinates, which are at y == 0
def flatten(l):
return [item for sublist in l for item in sublist]
X = flatten(dend['icoord'])
Y = flatten(dend['dcoord'])
leave_coords = [(x,y) for x,y in zip(X,Y) if y==0]
# in the dendogram data structure,
# leave ids are listed in ascending order according to their x-coordinate
order = np.argsort([x for x,y in leave_coords])
id_to_coord = dict(zip(dend['leaves'], [leave_coords[idx] for idx in order])) # <- main data structure
# ----------------------------------------
# get coordinates of other nodes
# this should work but doesn't:
# # traverse tree from leaves upwards and populate mapping ID -> (x,y);
# # use linkage matrix to traverse the tree optimally
# # (each row in the linkage matrix corresponds to a row in dend['icoord'] and dend['dcoord'])
# root_node, node_list = to_tree(Z, rd=True)
# for ii, (X, Y) in enumerate(zip(dend['icoord'], dend['dcoord'])):
# x = (X[1] + X[2]) / 2
# y = Y[1] # or Y[2]
# node_id = ii + len(dend['leaves'])
# id_to_coord[node_id] = (x, y)
# so we need to do it the hard way:
# map endpoint of each link to coordinates of parent node
children_to_parent_coords = dict()
for i, d in zip(dend['icoord'], dend['dcoord']):
x = (i[1] + i[2]) / 2
y = d[1] # or d[2]
parent_coord = (x, y)
left_coord = (i[0], d[0])
right_coord = (i[-1], d[-1])
children_to_parent_coords[(left_coord, right_coord)] = parent_coord
# traverse tree from leaves upwards and populate mapping ID -> (x,y)
root_node, node_list = to_tree(Z, rd=True)
ids_left = range(len(dend['leaves']), len(node_list))
while len(ids_left) > 0:
for ii, node_id in enumerate(ids_left):
node = node_list[node_id]
if (node.left.id in id_to_coord) and (node.right.id in id_to_coord):
left_coord = id_to_coord[node.left.id]
right_coord = id_to_coord[node.right.id]
id_to_coord[node_id] = children_to_parent_coords[(left_coord, right_coord)]
ids_left = [node_id for node_id in range(len(node_list)) if not node_id in id_to_coord]
# plot result on top of dendrogram
ax = plt.gca()
for node_id, (x, y) in id_to_coord.iteritems():
if not node_list[node_id].is_leaf():
ax.plot(x, y, 'ro')
ax.annotate(str(node_id), (x, y), xytext=(0, -8),
textcoords='offset points',
va='top', ha='center')
dend['node_id_to_coord'] = id_to_coord
还有另一种方法:
树状图的 ID 似乎是由从右到左的反向遍历树生成的。这允许我们构建从 node.id
到 icoord
和 dcoord
的索引的转换,如下所示:
def rl_traversal(node):
# skipping leaves
if not node.is_leaf():
yield node.id
yield from rl_traversal(node.right)
yield from rl_traversal(node.left)
id_map = dict(zip( rl_traversal(root), reversed(range(root.get_count()-1))) ))
# id_map[node_id] = dendogram_id
节点的坐标可以通过dendo['icoord'][id_map[node_id]]