Python: How to get a name showing the hierarchy from a given file?
I am a Python beginner and I am trying to use it for some data analysis. I have a text file that looks like this:
top1
  top1_a
  top1_b
  top1_c
    top1_c_a
      top1_c_a_a
      top1_c_a_b
  top1_d
top2
  top2_a
  top2_b
    top2_b_a
...
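In short, I want to get a name that shows the hierarchy for each entry. For example, top1_c_a should be named 'top1/top1_c/top1_c_a'. In the end, I want a list containing these names. How can I do this?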
You can use recursion to group the file's lines by indentation depth:
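import re

# Split each line into (leading whitespace, name).
with open('hierarchy_data.txt') as f:
    d = [(j := (k[0] if (k := re.findall(r'^\s+', i)) else ''), i[len(j):].strip('\n')) for i in f]

def full_paths(d, p=[]):
    # d: (indent, name) pairs below the current node; p: names of its ancestors.
    if not d:
        yield '/'.join(p)  # nothing below this node, so p is a complete leaf path
    else:
        k, r = None, []  # k: current node at this level, r: its descendants
        for a, b in d:
            if not a:  # no indentation left: a new node at this level
                if k is not None:
                    yield from full_paths(r, p + [k])
                k, r = b, []
            else:
                r.append((a[2:], b))  # strip one level (two characters) of indentation
        if k is not None:
            yield from full_paths(r, p + [k])

print(list(full_paths(d)))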
Output:
['top1/top1_a', 'top1/top1_b', 'top1/top1_c/top1_c_a/top1_c_a_a', 'top1/top1_c/top1_c_a/top1_c_a_b', 'top1/top1_d', 'top2/top2_a', 'top2/top2_b/top2_b_a']
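Note that the output above lists only the leaf entries, while the question also asks for the name of an intermediate entry such as top1_c_a. If a path is wanted for every line, one possible alternative (a minimal sketch, not the method used in the answer) is to keep a stack of ancestor names while reading the file. The function name `all_paths`, the `indent` parameter, and the inline `sample` text are illustrative assumptions, and the sketch assumes each level is indented by exactly two spaces:

```python
def all_paths(lines, indent=2):
    """Yield a '/'-joined path for every non-blank line, parents included."""
    stack = []  # names of the current line's ancestors, outermost first
    for line in lines:
        name = line.strip()
        if not name:
            continue  # skip blank lines
        # Depth = number of leading spaces divided by the assumed indent width.
        depth = (len(line) - len(line.lstrip(' '))) // indent
        # Drop entries at this depth or deeper, then push the current name.
        del stack[depth:]
        stack.append(name)
        yield '/'.join(stack)


# Hypothetical sample mirroring part of the file from the question.
sample = """\
top1
  top1_a
  top1_c
    top1_c_a
      top1_c_a_a
top2
  top2_a
"""

paths = list(all_paths(sample.splitlines()))
print(paths)
```

To run it on the real file, pass the open file object instead of the sample, for example `list(all_paths(open('hierarchy_data.txt')))`. Tabs or a different indent width would need the depth calculation adjusted.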