How can I create a nested dictionary object from a tree-like file-directory text file?
I have a tree structure delimited by tabs and newlines, like this:
a
\t1
\t2
\t3
\t\tb
\t\tc
\t4
\t5
And I am looking to turn this into:
{
    'name': 'a',
    'children': [
        {'name': '1'},
        {'name': '2'},
        {
            'name': '3',
            'children': [
                {'name': 'b'},
                {'name': 'c'}
            ]
        },
        {'name': '4'},
        {'name': '5'}
    ]
}
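This is for the data input of a d3.js collapsible tree. I assume I have to use recursion somehow, but I can't work out how.
I have tried turning the input into a list like this:
[('a',0), ('1',1), ('2',1), ('3',1), ('b',2), ('c',2), ('4',1), ('5',1)]
using this code: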
def parser():
    #run from root `retail-tree`: `python3 src/main.py`
    l, all_line_details = list(), list()
    with open('assets/retail') as f:
        for line in f:
            line = line.rstrip('\n ')
            # split on tabs, so the number of pieces minus one is the depth
            splitline = line.split('\t')
            tup = (splitline[-1], len(splitline)-1)
            l.append(splitline)
            all_line_details.append(tup)
            print(tup)
    return all_line_details
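Here, the first element of each tuple is the string itself and the second is the number of tabs on that line. I am not sure what the recursive step should be. Any help is appreciated!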
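You can use a function that calls re.findall with a regular expression matching one line as the node name, followed by zero or more lines that begin with a tab, grouped as the children. It then strips the first tab from each line of the children substring and recursively builds the same structure for it: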
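import re

def parser(s):
    output = []
    # group 1: the node's own line; group 2: the tab-indented lines below it
    for name, children in re.findall(r'(.*)\n((?:\t.*\n)*)', s):
        node = {'name': name}
        if children:
            # drop one leading tab per line, then recurse on the child block
            node.update({'children': parser(''.join(line[1:] for line in children.splitlines(True)))})
        output.append(node)
    return output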
So given:
s = '''a
\t1
\t2
\t3
\t\tb
\t\tc
\t4
\t5
'''
parser(s)[0]
returns:
{'name': 'a',
 'children': [{'name': '1'},
              {'name': '2'},
              {'name': '3', 'children': [{'name': 'b'}, {'name': 'c'}]},
              {'name': '4'},
              {'name': '5'}]}
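To make the recursion easier to follow, here is a small sketch that prints what the same regular expression captures at the first two levels. The shortened `sample` string and the variable names `PATTERN`, `top`, `dedented` and `second` are only for illustration and are not part of the answer.

```python
import re

# Same pattern as in the answer: one name line, then zero or more tab-indented lines.
PATTERN = r'(.*)\n((?:\t.*\n)*)'

# Shortened illustrative input; note that every line ends with a newline.
sample = 'a\n\t1\n\t2\n\t\tb\n'

# Top level: one match; its second group is the whole indented block.
top = re.findall(PATTERN, sample)
print(top)     # [('a', '\t1\n\t2\n\t\tb\n')]

# Strip one leading tab from each child line, as the recursive call does.
dedented = ''.join(line[1:] for line in top[0][1].splitlines(True))

# Second level: '1' has no indented block, '2' owns the line '\tb\n'.
second = re.findall(PATTERN, dedented)
print(second)  # [('1', ''), ('2', '\tb\n')]
```

One thing this makes visible: the pattern requires a `\n` after every line, so a final line without a trailing newline would not be matched. Appending a newline to the input before parsing is one way to guard against that.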
Working from the list structure produced by your own parser function:
def make_tree(lines, tab_count=0):
    tree = []
    index = 0
    while index < len(lines):
        if lines[index][1] == tab_count:
            node = {"name": lines[index][0]}
            # parse everything after this line as potential children
            children, lines_read = make_tree(lines[index + 1:], tab_count + 1)
            if children:
                node["children"] = children
                # skip the lines already consumed by the recursive call
                index += lines_read
            tree.append(node)
        else:
            # a shallower line ends the current level
            break
        index += 1
    return tree, index
Test cases:
lines = [("a", 0), ("1", 1), ("2", 1), ("3", 1), ("b", 2), ("c", 2), ("4", 1), ("5", 1)]
test_1 = make_tree([("a", 0)])
assert test_1[0] == [{"name": "a"}], test_1
test_2 = make_tree([("a", 0), ("b", 1)])
assert test_2[0] == [{"name": "a", "children": [{"name": "b"}]}], test_2
test_3 = make_tree(lines)
expected_3 = [
    {
        "name": "a",
        "children": [
            {"name": "1"},
            {"name": "2"},
            {"name": "3", "children": [{"name": "b"}, {"name": "c"}]},
            {"name": "4"},
            {"name": "5"},
        ],
    }
]
assert test_3[0] == expected_3, test_3
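Note that the output is wrapped in a list, both in case your source file has multiple root nodes (i.e. several lines with no leading tabs) and to keep the recursion tidy.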
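For comparison, the same result can also be built without recursion by keeping a stack of the most recent node at each depth. This is a sketch of a different technique, not something from the code above: `build_tree` is a hypothetical name, and it assumes the first non-blank line is unindented and that no line is indented more than one level deeper than the line before it.

```python
import json

def build_tree(text):
    """Return a list of root nodes parsed from tab-indented text."""
    roots = []
    stack = []  # stack[d] holds the most recent node seen at depth d
    for raw in text.splitlines():
        if not raw.strip():
            continue  # ignore blank lines
        name = raw.lstrip('\t')
        depth = len(raw) - len(name)  # number of leading tabs
        node = {'name': name}
        # Forget nodes at this depth or deeper; they can take no more children.
        del stack[depth:]
        if depth == 0:
            roots.append(node)
        else:
            # Assumption: input is well formed, so stack[-1] is the parent.
            stack[-1].setdefault('children', []).append(node)
        stack.append(node)
    return roots

text = 'a\n\t1\n\t2\n\t3\n\t\tb\n\t\tc\n\t4\n\t5\n'
tree = build_tree(text)
# d3.js expects a single root object, so take the first root and dump it as JSON.
print(json.dumps(tree[0], indent=2))
```

As with make_tree, the return value is a list so that files with several unindented lines yield several roots.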