Starting a nested loop from the current element position to the end of the list
I have a text file with the following structure:
```
name1:
sentence. [sentence. ...]   # can be one or more
name2:
sentence. [sentence. ...]
```
Edit: sample input:
```
Djohn:
Hello. I am Djohn
I am Djohn.
Bot:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim
veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
est laborum.
Ninja:
Hey guys!! wozzup
```
Edit 2: sample input:
```
This is example sentence that can come before first speaker.
Djohn:
Hello. I am Djohn
I am Djohn.
Bot:
Yes, I understand, don't say it twice lol
Ninja:
Hey guys!! wozzup
```
Each item (a name or a sentence) is a Unicode string. I put this data into a list and want to build a dictionary:
```
{
    'name1': [[sentence.], ...],
    'name2': [[sentence.], ...]
}
```
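One way to get the file into such a list is sketched below. It assumes the file is UTF-8 and that blank lines carry no meaning and can be dropped; the function name `clean_lines` and the file name `dialogue.txt` are hypothetical. Keeping the cleanup separate from `open()` means it works on any iterable of lines.

```python
def clean_lines(lines):
    """Strip surrounding whitespace (including the newline) and drop blank lines."""
    return [line.strip() for line in lines if line.strip()]

# Hypothetical usage with a UTF-8 file named "dialogue.txt":
# with open("dialogue.txt", encoding="utf-8") as f:
#     paragraphs = clean_lines(f)
```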
Edit 3:
The dictionary I am building is intended to be written to a file, and it consists of Unicode strings.
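The question does not say which file format is wanted. Assuming JSON is acceptable, a minimal sketch is to serialize with `ensure_ascii=False`, so non-ASCII names and sentences are written as readable characters rather than `\uXXXX` escapes, and to open the file with an explicit UTF-8 encoding. The helper name `to_json_text` and the output file name are hypothetical.

```python
import json

def to_json_text(data):
    """Serialize a name -> sentences dict, keeping non-ASCII characters as-is."""
    return json.dumps(data, ensure_ascii=False, indent=2)

# Hypothetical usage:
# with open("output.json", "w", encoding="utf-8") as f:
#     f.write(to_json_text(outputDocumentData))
```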
What I am trying to do:
```python
for i, paragraph in enumerate(paragraphs):  # paragraphs is the list of Unicode strings
    if isParagraphEndsWithColon(paragraph):
        name = paragraph
        text = []
        # does not work: range() needs integers, but these are strings
        for p in range(paragraphs[i], paragraphs[-1]):
            if isParagraphEndsWithColon(p):
                break
            text.append(p)
        # this is the output dictionary I am trying to build
        outputDocumentData[name].extend(text)
```
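That is, starting from each 'name:' line I find, I need a nested loop that runs up to the next 'name:' line, extending the list of sentences stored under the same key (name).
The problem is that range() does not work for me here, because it requires integers.
I am looking for a "pythonic" way to create a nested loop from the current element to the end of the list. (Slicing the list on every iteration feels inefficient.)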
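Because range() works on integer positions, one way to write the nested loop described above is to loop over indices rather than over the strings themselves. A minimal sketch follows; `is_speaker_line` is a hypothetical stand-in for the question's `isParagraphEndsWithColon`, and stripping the trailing colon from names is an assumption based on the desired dictionary shown earlier.

```python
from collections import defaultdict

def is_speaker_line(paragraph):
    # Hypothetical stand-in for the question's isParagraphEndsWithColon()
    return paragraph.endswith(":")

def collect_by_index(paragraphs):
    """For each 'name:' line, scan forward by index until the next 'name:' line."""
    output = defaultdict(list)
    for i, paragraph in enumerate(paragraphs):
        if is_speaker_line(paragraph):
            name = paragraph.rstrip(":")
            # range() takes integer positions: from the line after i to the end
            for j in range(i + 1, len(paragraphs)):
                if is_speaker_line(paragraphs[j]):
                    break
                # a repeated speaker extends the same key
                output[name].append(paragraphs[j])
    return dict(output)
```

Since each inner loop stops at the next name line, every sentence is visited by exactly one inner loop, so the whole pass stays linear and no slices are created. Lines before the first speaker (as in Edit 2) are ignored by this version.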
You can use itertools.groupby:
```python
from itertools import groupby

lines = ["Djohn:",
         "Hello. I am Djohn",
         "I am Djohn.",
         "Bot:",
         "Yes, I understand, don't say it twice lol",
         "Ninja:",
         "Hey guys!! wozzup"]

name = ''
result = {}
# group consecutive lines by whether they are a "name:" line or not
for k, v in groupby(lines, key=lambda x: x.endswith(':')):
    if k:
        # strip the trailing colon from the name
        name = ''.join(v).rstrip(':')
    else:
        result.setdefault(name, []).extend(v)
print(result)
```

Output:

```
{'Djohn': ['Hello. I am Djohn', 'I am Djohn.'], 'Bot': ["Yes, I understand, don't say it twice lol"], 'Ninja': ['Hey guys!! wozzup']}
```
The idea is to split the input into runs of name lines and runs of non-name lines, so you can use `lambda x: x.endswith(':')` as the key.
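The same grouping can also be written as a single loop that only remembers the current speaker, with no nested loop at all. This sketch assumes, hypothetically, that text appearing before the first speaker (as in Edit 2) should be kept under a separate key, `None` by default; the `preamble_key` parameter is illustrative and not part of the question. (In the groupby version, such lines end up under the initial `name = ''` key.)

```python
def collect_single_pass(paragraphs, preamble_key=None):
    """One pass: remember the current speaker and file each sentence under it."""
    result = {}
    name = preamble_key  # lines before the first 'name:' line go under this key
    for paragraph in paragraphs:
        if paragraph.endswith(":"):
            name = paragraph.rstrip(":")
        else:
            # setdefault creates the list on first use, then extends it
            result.setdefault(name, []).append(paragraph)
    return result
```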