How to print lines in splitlines by indentation?
I have a string:
text = '''TextTextTextTextTextTextTextTextText1
            TextTextTextTextTextTextTextTextText1
        TextTextTextTextTextTextTextTextText2
            TextTextTextTextTextTextTextTextText2
            TextTextTextTextTextTextTextTextText2
            TextTextTextTextTextTextTextTextText2
            TextTextTextTextTextTextTextTextText2
        TextTextTextTextTextTextTextTextText3
            TextTextTextTextTextTextTextTextText3
            TextTextTextTextTextTextTextTextText3
        TextTextTextTextTextTextTextTextText4
            TextTextTextTextTextTextTextTextText4
            TextTextTextTextTextTextTextTextText4'''
I want to split this string by indentation and add the parts to a list. This is my current code:
nr_lines = 0
indent_dict = {}
for line in text.splitlines(True):
    print(line)
    print("------------------------------")
    nr_lines += 1
    whitespaces_count = len(line) - len(line.lstrip())
    indent_dict[nr_lines] = whitespaces_count
print(indent_dict)
list_of_values = []
# Remove the first key, whose value (indent) is 0
indent_dict_without = dict(indent_dict)
key = 1
del indent_dict_without[key]
# Add the values from the dict to a list
for key, value in indent_dict_without.items():
    list_of_values.append(value)
print(list_of_values)
# Find the minimum value
x = min(list_of_values)
list_of_small = []
for nr in list_of_values:
    if nr == x:
        list_of_small.append(nr)
print(list_of_small)
# Find which lines have the smallest indent
n = 0
key_1 = []
for key, value in indent_dict.items():
    if value == list_of_small[n]:
        key_1.append(key)
print(key_1)
The output is:
{1: 0, 2: 12, 3: 8, 4: 12, 5: 12, 6: 12, 7: 12, 8: 8, 9: 12, 10: 12, 11: 8, 12: 12, 13: 12} # dict with line and value (indent)
[12, 8, 12, 12, 12, 12, 8, 12, 12, 8, 12, 12] # list with indents
[8, 8, 8] # the smallest indents
[3, 8, 11] # lines for smallest indents
Now I don't know how to split the text into these 4 parts and add them as elements of a list:
list = ['TextTextTextTextTextTextTextTextText1
TextTextTextTextTextTextTextTextText1',
'TextTextTextTextTextTextTextTextText2
TextTextTextTextTextTextTextTextText2
TextTextTextTextTextTextTextTextText2
TextTextTextTextTextTextTextTextText2
TextTextTextTextTextTextTextTextText2',
'TextTextTextTextTextTextTextTextText3
TextTextTextTextTextTextTextTextText3
TextTextTextTextTextTextTextTextText3',
'TextTextTextTextTextTextTextTextText4
TextTextTextTextTextTextTextTextText4
TextTextTextTextTextTextTextTextText4']
Should I create a new variable and add lines to it one by one until a new indentation appears?
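If I understand correctly, you want to split the text into paragraphs at the lines with the smallest indentation.
Here is how I would approach it. I would create a defaultdict whose keys are the number of spaces that make up an indentation and whose values are lists with the indices of all lines that have this indentation: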
from collections import defaultdict
text = '''TextTextTextTextTextTextTextTextText1
        TextTextTextTextTextTextTextTextText1
    TextTextTextTextTextTextTextTextText2
        TextTextTextTextTextTextTextTextText2
        TextTextTextTextTextTextTextTextText2
        TextTextTextTextTextTextTextTextText2
        TextTextTextTextTextTextTextTextText2
    TextTextTextTextTextTextTextTextText3
        TextTextTextTextTextTextTextTextText3
        TextTextTextTextTextTextTextTextText3
    TextTextTextTextTextTextTextTextText4
        TextTextTextTextTextTextTextTextText4
        TextTextTextTextTextTextTextTextText4'''
def count_indentation(line):
    return len(line) - len(line.lstrip())
lines = text.splitlines(keepends=False)
indent_dict = defaultdict(list)
for idx, line in enumerate(lines):
    if count_indentation(line) > 0:
        indent_dict[count_indentation(line)].append(idx)
Now indent_dict looks like this:
defaultdict(list, {8: [1, 3, 4, 5, 6, 8, 9, 11, 12], 4: [2, 7, 10]})
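One caveat about count_indentation: lstrip() also removes tabs, and each tab counts as a single character. If the input might mix tabs and spaces, a hedged variant could expand tabs first. This is only a sketch; the name indent_width and the default tab size of 4 are assumptions, not part of the answer.

```python
def indent_width(line, tabsize=4):
    """Return the width of the leading whitespace, with tabs expanded to spaces."""
    expanded = line.expandtabs(tabsize)
    return len(expanded) - len(expanded.lstrip())


print(indent_width("    four spaces"))  # 4
print(indent_width("\tone tab"))        # 4 (tabsize=4)
print(indent_width("\tone tab", 8))     # 8
```

For input that only uses spaces, as in the question, this gives the same numbers as count_indentation.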
Next, we take the smallest key and look up the indices of the corresponding lines:
smallest_indent = min(indent_dict)
line_idexes_smallest_indents = indent_dict[smallest_indent]
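line_idexes_smallest_indents evaluates to [2, 7, 10]. The indices are zero-based, which is why each of mine is one less than in your result. Now we need to partition the original text according to these indices.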
def partition(lines, indices):
    return [''.join(lines[i:j]) for i, j in zip([0]+indices, indices+[None])]
partition(lines, line_idexes_smallest_indents)
Result:
['TextTextTextTextTextTextTextTextText1        TextTextTextTextTextTextTextTextText1',
'    TextTextTextTextTextTextTextTextText2        TextTextTextTextTextTextTextTextText2        TextTextTextTextTextTextTextTextText2        TextTextTextTextTextTextTextTextText2        TextTextTextTextTextTextTextTextText2',
'    TextTextTextTextTextTextTextTextText3        TextTextTextTextTextTextTextTextText3        TextTextTextTextTextTextTextTextText3',
'    TextTextTextTextTextTextTextTextText4        TextTextTextTextTextTextTextTextText4        TextTextTextTextTextTextTextTextText4']
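In partition, zip([0]+indices, indices+[None]) pairs each block start with the start of the next block, and the final None makes the last slice run to the end of the list. The result keeps each line's indentation and joins the lines without a separator. If the goal is the list from the question, with no indentation, one possible variant strips every line and joins the lines of a block with a newline. The name partition_stripped and the newline separator are assumptions made for this sketch.

```python
def partition_stripped(lines, indices, sep="\n"):
    """Split lines at the given start indices, strip each line, join each block with sep."""
    bounds = zip([0] + indices, indices + [None])  # (start, stop) pairs; None = to the end
    return [sep.join(line.strip() for line in lines[i:j]) for i, j in bounds]


# Small sample with the same shape as the text above (8- and 4-space indents).
sample = [
    "A1",
    "        A2",
    "    B1",
    "        B2",
    "    C1",
]
print(list(zip([0] + [2, 4], [2, 4] + [None])))  # [(0, 2), (2, 4), (4, None)]
print(partition_stripped(sample, [2, 4]))        # ['A1\nA2', 'B1\nB2', 'C1']
```

Passing sep=" " instead would put each block on a single line.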
This is the quickest thing I could come up with. I'm sure there are more elegant solutions.
text = '''TextTextTextTextTextTextTextTextText1
        TextTextTextTextTextTextTextTextText1
    TextTextTextTextTextTextTextTextText2
        TextTextTextTextTextTextTextTextText2
        TextTextTextTextTextTextTextTextText2
        TextTextTextTextTextTextTextTextText2
        TextTextTextTextTextTextTextTextText2
    TextTextTextTextTextTextTextTextText3
        TextTextTextTextTextTextTextTextText3
        TextTextTextTextTextTextTextTextText3
    TextTextTextTextTextTextTextTextText4
        TextTextTextTextTextTextTextTextText4
        TextTextTextTextTextTextTextTextText4'''
lines = text.split('\n')
# Count the leading spaces of each line (line.count(' ') would also count spaces inside the text)
indent_lst = [len(line) - len(line.lstrip(' ')) for line in lines]
# Find where a new block starts; the first block always starts at line 0
indices = [0]
for idx in range(1, len(indent_lst)):  # Start at the second element in the list
    # Here I assume that the indentation is consistent. A change from more spaces to fewer spaces
    # means that a new block has started
    if indent_lst[idx-1] > indent_lst[idx]:  # Look back at the previous element and compare with the current one
        indices.append(idx)
final_lst = []
# Use slicing to append from block to block
for pos, start in enumerate(indices):
    if pos == len(indices) - 1:  # Take care of the last block
        final_lst.append(''.join(lines[start:]))
    else:
        final_lst.append(''.join(lines[start:indices[pos+1]]))  # Add block to final list
print(final_lst)
The result is as follows:
['TextTextTextTextTextTextTextTextText1        TextTextTextTextTextTextTextTextText1', '    TextTextTextTextTextTextTextTextText2        TextTextTextTextTextTextTextTextText2        TextTextTextTextTextTextTextTextText2        TextTextTextTextTextTextTextTextText2        TextTextTextTextTextTextTextTextText2', '    TextTextTextTextTextTextTextTextText3        TextTextTextTextTextTextTextTextText3        TextTextTextTextTextTextTextTextText3', '    TextTextTextTextTextTextTextTextText4        TextTextTextTextTextTextTextTextText4        TextTextTextTextTextTextTextTextText4']
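The same idea, that a drop in indentation starts a new block, can also be written as a single pass that adds lines to the current block until the indentation decreases, which is roughly what the question proposes. The following is a minimal sketch under that assumption; the function name split_by_dedent, skipping blank lines, and joining with a newline are my own choices.

```python
def split_by_dedent(text):
    """Group lines into blocks; a line indented less than the previous one starts a new block."""
    blocks = []
    prev_indent = None
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines are ignored
        indent = len(line) - len(line.lstrip())
        if prev_indent is None or indent < prev_indent:
            blocks.append([])  # the indentation dropped, so start a new block
        blocks[-1].append(line.strip())
        prev_indent = indent
    return ["\n".join(block) for block in blocks]


sample = """A1
        A2
    B1
        B2
        B3
    C1
        C2"""
print(split_by_dedent(sample))  # ['A1\nA2', 'B1\nB2\nB3', 'C1\nC2']
```

Like the loop above, this relies on every block having at least one continuation line: two consecutive lines at the smallest indentation would end up in the same block.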
I hope this helps. If you have any questions, feel free to ask!