How to recover the internal nodes of a Newick tree in a dictionary?
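I have the following Newick tree:
((((A,B)1,C)2,((((D,E)3,F)4,G)5,(((((H,I)6,J)7,K)8 ,L)9,M)10)11)12,N)13;
where the letters are leaves and the numbers are internal nodes.
I would like to obtain the following dictionary:
{1:('A','B'),2:('C',1),3:('D','E'), 4:('F',3) ....}
which maps each internal node to its two children.
I found this code on Stack Overflow: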
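import re

def parse(newick):
    # Each token is either (name, optional ":length", delimiter)
    # or a single other non-space character such as "(".
    tokens = re.findall(r"([^:;,()\s]*)(?:\s*:\s*([\d.]+)\s*)?([,);])|(\S)", newick)

    def recurse():
        # Parse one node and return it together with the delimiter that follows it.
        children = []
        name, length, delim, ch = tokens.pop(0)
        if ch == "(":
            # Keep reading children until one is followed by ")".
            while ch in "(,":
                node, ch = recurse()
                children.append(node)
            # The token after ")" carries this internal node's own name.
            name, length, delim, ch = tokens.pop(0)
        return {"name": name, "children": children}, delim

    return recurse()[0]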
But I don't know how to adapt it to this problem.
Thanks,
Change the line
return {"name": name, "children": children}, delim
to just
return {name: children}, delim
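To see what that one-line change does, here is a small sketch that runs the modified `parse` on a shortened tree. The input `"((A,B)1,C)2;"` is an illustrative example, not taken from the question, and the regex assumes the usual Newick `name:length` syntax.

```python
import re

def parse(newick):
    # Each token is either (name, optional ":length", delimiter)
    # or a single other non-space character such as "(".
    tokens = re.findall(r"([^:;,()\s]*)(?:\s*:\s*([\d.]+)\s*)?([,);])|(\S)", newick)

    def recurse():
        children = []
        name, length, delim, ch = tokens.pop(0)
        if ch == "(":
            while ch in "(,":
                node, ch = recurse()
                children.append(node)
            name, length, delim, ch = tokens.pop(0)
        # Modified return: the node name is the single key, its children the value.
        return {name: children}, delim

    return recurse()[0]

small = parse("((A,B)1,C)2;")
print(small)
# {'2': [{'1': [{'A': []}, {'B': []}]}, {'C': []}]}
```

Every node is now a one-key dictionary, so `popitem()` yields a `(name, children)` pair directly. Names are still strings at this stage, and leaves carry an empty child list.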
Then define a new function:
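def alternative_newick(treedata, zero_root=False):
    result = {}

    def build_node(parent, name, children):
        # An unnamed parent (the root when its label was not parsed) is stored as 0.
        if parent == "":
            parent = 0
        # Numeric labels become ints so internal nodes are keyed by number.
        if name.isnumeric():
            name = int(name)
        for child in children:
            if child:
                build_node(name, *child.popitem())
        # Register this node as a child of its parent.
        if result.get(parent):
            result[parent].append(name)
        else:
            result[parent] = [name]

    tree = parse(treedata)
    build_node(None, *tree.popitem())
    result.pop(None)  # drop the placeholder entry created for the root's parent
    if not zero_root:
        result.pop(0, None)  # drop the unnamed-root entry, if there is one
    for k, v in result.items():
        result[k] = tuple(v)
    return result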
and use it like this:
treedata = "(((A,B)1,C)2,((((D,E)3,F)4,G)5,(((((H,I)6,J)7,K)8,L)9,M)10)11)12"
print(alternative_newick(treedata))
The result will be:
{1: ('A', 'B'), 2: (1, 'C'), 3: ('D', 'E'), 4: (3, 'F'), 5: (4, 'G'), 11: (5, 10), 6: ('H', 'I'), 7: (6, 'J'), 8: (7, 'K'), 9: (8, 'L'), 10: (9, 'M')}
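The printed result has no entry for the root, 12. This seems to come from the input string having no trailing `;`, so the root label never reaches `parse` as a name. As an alternative, here is a minimal sketch that skips the nested dictionaries and builds the mapping in one pass with a stack. `internal_nodes` is a hypothetical helper, not part of the code above. It assumes that every internal node is labelled, that labels are unquoted, and that the tree has no branch lengths.

```python
def internal_nodes(newick):
    """Map each labelled internal node to a tuple of its children (sketch)."""
    result = {}
    stack = [[]]    # one child list per open "(", plus a top-level list
    label = ""      # characters of the label currently being read
    closed = None   # children of the group just closed by ")", awaiting its label

    def as_key(text):
        # Numeric labels become ints, everything else stays a string.
        return int(text) if text.isdigit() else text

    def flush():
        # Finish the current label: record it and attach it to its parent's list.
        nonlocal label, closed
        if label or closed is not None:
            key = as_key(label)
            if closed is not None:
                result[key] = tuple(closed)
            stack[-1].append(key)
        label, closed = "", None

    for ch in newick:
        if ch == "(":
            stack.append([])
        elif ch in ",);":
            flush()
            if ch == ")":
                closed = stack.pop()
        elif not ch.isspace():
            label += ch
    flush()  # handles input that does not end with ";"
    return result


print(internal_nodes("((A,B)1,C)2;"))
# {1: ('A', 'B'), 2: (1, 'C')}

tree = "((((A,B)1,C)2,((((D,E)3,F)4,G)5,(((((H,I)6,J)7,K)8 ,L)9,M)10)11)12,N)13;"
mapping = internal_nodes(tree)
print(mapping[12], mapping[13])
```

Because the final `flush()` runs after the loop, the root is kept whether or not the string ends with `;`. Children appear in the order they occur in the tree, so node 2 maps to `(1, 'C')` rather than `('C', 1)`.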