Adding new values to empty nested lists
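This is related to How to append to the end of an empty list?, but I don't have enough reputation to comment there yet, so I'm posting a new question here.
I need to append terms to an empty list of lists. I start with: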
Talks[eachFilename][TermVectors]=
[['paragraph','1','text'],
['paragraph','2','text'],
['paragraph','3','text']]
I want to end up with:
Talks[eachFilename][SomeTermsRemoved]=
[['paragraph','text'],
['paragraph','2'],
['paragraph']]
Talks[eachFilename][SomeTermsRemoved] starts out empty. I can't assign what I want like this:
Talks[eachFilename][SomeTermsRemoved][0][0]='paragraph'
Talks[eachFilename][SomeTermsRemoved][0][1]='text'
Talks[eachFilename][SomeTermsRemoved][1][0]='paragraph'
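etc. (IndexError: list index out of range). If I force-fill it with strings and then try to change them, I get an error saying strings are immutable.
So how do I set Talks[eachFilename][SomeTermsRemoved][0] to ['paragraph','text'], Talks[eachFilename][SomeTermsRemoved][1] to ['paragraph','2'], and so on?
.append works, but it only produces one long column instead of a set of lists.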
More specifically, I have a number of lists initialized in a dictionary:
Talks = {}
Talks[eachFilename]= {}
Talks[eachFilename]['StartingText']=[]
Talks[eachFilename]['TermVectors']=[]
Talks[eachFilename]['TermVectorsNoStops']=[]
eachFilename is populated from a list of text files, for example:
Talks[eachFilename]=['filename1','filename2']
StartingText holds several long lines of text (individual paragraphs):
Talks[filename1][StartingText]=['This is paragraph one','paragraph two']
TermVectors is populated by the NLTK package with lists of terms, still grouped by the original paragraphs:
Talks[filename1][TermVectors]=
[['This','is','paragraph','one'],
['paragraph','two']]
I want to manipulate TermVectors further, but keep the original list-of-paragraphs structure. This code creates a list with one term per line:
for eachFilename in Talks:
    for eachTerm in range( 0, len( Talks[eachFilename]['TermVectors'] ) ):
        for term in Talks[eachFilename]['TermVectors'][ eachTerm ]:
            if unicode(term) not in stop_words:
                Talks[eachFilename]['TermVectorsNoStops'].append( term )
Result (I lose the paragraph structure):
Talks[filename1][TermVectorsNoStops]=
[['This'],
['is'],
['paragraph'],
['one'],
['paragraph'],
['two']]
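The error you report (strings are immutable?) makes no sense unless your list is not actually empty but already filled with strings. In any case, if you start with an empty list, the simplest way to populate it is to append: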
>>> talks = {}
>>> talks['each_file_name'] = {}
>>> talks['each_file_name']['terms_removed'] = []
>>> talks['each_file_name']['terms_removed'].append(['paragraph','text'])
>>> talks['each_file_name']['terms_removed'].append(['paragraph','2'])
>>> talks['each_file_name']['terms_removed'].append(['paragraph'])
>>> talks
{'each_file_name': {'terms_removed': [['paragraph', 'text'], ['paragraph', '2'], ['paragraph']]}}
>>> from pprint import pprint
>>> pprint(talks)
{'each_file_name': {'terms_removed': [['paragraph', 'text'],
['paragraph', '2'],
['paragraph']]}}
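To illustrate the first point above: the "strings are immutable" error appears when the outer list already holds strings and you then try to assign into one of them. A minimal sketch, assuming the list was force-filled with empty placeholder strings (the placeholders are hypothetical):

```python
# Hypothetical: an outer list force-filled with placeholder strings
terms_removed = ['', '', '']

error_message = None
try:
    # terms_removed[0] is a str, so this tries to assign into a string
    terms_removed[0][0] = 'paragraph'
except TypeError as err:
    error_message = str(err)

# Replacing the whole element with a list works, because lists are mutable
terms_removed[0] = ['paragraph', 'text']

print(error_message)   # 'str' object does not support item assignment
print(terms_removed)   # [['paragraph', 'text'], '', '']
```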
If you have an empty list and try to assign to it by index, it raises an error:
>>> empty_list = []
>>> empty_list[0] = 10
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list assignment index out of range
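If you really do want to assign by index rather than append, the outer list has to exist at its final length first. A sketch, assuming the number of paragraphs is known up front (three here, as in the question). It uses a list comprehension because [[]] * n repeats one shared inner list n times:

```python
n_paragraphs = 3

# One independent empty inner list per paragraph
terms_removed = [[] for _ in range(n_paragraphs)]

# Index assignment now works, because every slot already exists
terms_removed[0] = ['paragraph', 'text']
terms_removed[1] = ['paragraph', '2']
terms_removed[2].append('paragraph')

# Pitfall: [[]] * n holds the *same* inner list n times
shared = [[]] * n_paragraphs
shared[0].append('paragraph')

print(terms_removed)  # [['paragraph', 'text'], ['paragraph', '2'], ['paragraph']]
print(shared)         # [['paragraph'], ['paragraph'], ['paragraph']]
```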
By the way, this code:
for eachFilename in Talks:
    for eachTerm in range( 0, len( Talks[eachFilename]['TermVectors'] ) ):
        for term in Talks[eachFilename]['TermVectors'][ eachTerm ]:
            if unicode(term) not in stop_words:
                Talks[eachFilename]['TermVectorsNoStops'].append( term )
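is far from idiomatic Python style. Don't use camelCase; use snake_case. Don't capitalize variable names. Also, in your middle for loop you use for eachTerm in range(0, len(Talks[eachFilename]['TermVectors'])), but eachTerm is an int, so it makes more sense to use the conventional i or k, or even idx.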
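The point about loop variables can be made concrete. A small sketch comparing the range(len(...)) indexing from the question with iterating directly and with enumerate(); the data comes from the question, and len() is just an arbitrary per-paragraph operation chosen for illustration:

```python
term_vectors = [['This', 'is', 'paragraph', 'one'], ['paragraph', 'two']]

# Index-based loop as in the question, but with a conventional index name
lengths_by_index = []
for i in range(len(term_vectors)):
    lengths_by_index.append(len(term_vectors[i]))

# Usually the index is not needed: iterate over the paragraphs directly
lengths_direct = [len(paragraph) for paragraph in term_vectors]

# If you need the index as well, enumerate() yields (index, item) pairs
labelled = [(idx, len(paragraph)) for idx, paragraph in enumerate(term_vectors)]

print(lengths_direct)  # [4, 2]
print(labelled)        # [(0, 4), (1, 2)]
```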
In any case, there is no reason for that code to turn this:
Talks[filename1][TermVectors] =
[['This','is','paragraph','one'],
['paragraph','two']]
into this:
Talks[filename1][TermVectors] =
[['This'],
['is'],
['paragraph'],
['one'],
['paragraph'],
['two']]
Here is a reproducible example (I've done it for you this time, but you should do this yourself before posting a question):
>>> pprint(talks)
{'file1': {'no_stops': [],
'term_vectors': [['This', 'is', 'paragraph', 'one'],
['paragraph', 'two']]},
'file2': {'no_stops': [],
'term_vectors': [['This', 'is', 'paragraph', 'three'],
['paragraph', 'four']]}}
>>> for file in talks:
... for i in range(len(talks[file]['term_vectors'])):
... for term in talks[file]['term_vectors'][i]:
... if term not in stop_words:
... talks[file]['no_stops'].append(term)
...
>>> pprint(file)
'file2'
>>> pprint(talks)
{'file1': {'no_stops': ['This', 'paragraph', 'one', 'paragraph'],
'term_vectors': [['This', 'is', 'paragraph', 'one'],
['paragraph', 'two']]},
'file2': {'no_stops': ['This', 'paragraph', 'paragraph', 'four'],
'term_vectors': [['This', 'is', 'paragraph', 'three'],
['paragraph', 'four']]}}
>>>
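The session above never shows stop_words; judging from its output, the set must contain at least 'is', 'two' and 'three'. A self-contained version of the same loop, assuming exactly those three stop words, shows where the flat list comes from: append(term) adds each string straight to no_stops.

```python
from pprint import pprint

# Assumed stop-word set, inferred from the output of the session above
stop_words = {'is', 'two', 'three'}

talks = {
    'file1': {'no_stops': [],
              'term_vectors': [['This', 'is', 'paragraph', 'one'],
                               ['paragraph', 'two']]},
    'file2': {'no_stops': [],
              'term_vectors': [['This', 'is', 'paragraph', 'three'],
                               ['paragraph', 'four']]},
}

for file in talks:
    for i in range(len(talks[file]['term_vectors'])):
        for term in talks[file]['term_vectors'][i]:
            if term not in stop_words:
                # Appends a single string, so the paragraph grouping is lost
                talks[file]['no_stops'].append(term)

pprint(talks)
```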
A more Pythonic approach looks like this:
>>> pprint(talks)
{'file1': {'no_stops': [],
'term_vectors': [['This', 'is', 'paragraph', 'one'],
['paragraph', 'two']]},
'file2': {'no_stops': [],
'term_vectors': [['This', 'is', 'paragraph', 'three'],
['paragraph', 'four']]}}
>>> for file in talks.values():
... file['no_stops'] = [[term for term in sub if term not in stop_words] for sub in file['term_vectors']]
...
>>> pprint(talks)
{'file1': {'no_stops': [['This', 'paragraph', 'one'], ['paragraph']],
'term_vectors': [['This', 'is', 'paragraph', 'one'],
['paragraph', 'two']]},
'file2': {'no_stops': [['This', 'paragraph'], ['paragraph', 'four']],
'term_vectors': [['This', 'is', 'paragraph', 'three'],
['paragraph', 'four']]}}
>>>
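The same nested comprehension can be wrapped in a small function so the stop-word set is passed in explicitly. A sketch: remove_stop_words is a hypothetical name, and {'is', 'two', 'three'} is an assumed stop-word set (stop_words is not shown in the session) that reproduces the output above. A set keeps each membership test cheap.

```python
def remove_stop_words(talks, stop_words):
    """Hypothetical helper: set each file's 'no_stops' to one filtered
    list per paragraph, mutating talks in place like the loop above."""
    for file in talks.values():
        file['no_stops'] = [
            [term for term in paragraph if term not in stop_words]
            for paragraph in file['term_vectors']
        ]


talks = {
    'file1': {'no_stops': [],
              'term_vectors': [['This', 'is', 'paragraph', 'one'],
                               ['paragraph', 'two']]},
    'file2': {'no_stops': [],
              'term_vectors': [['This', 'is', 'paragraph', 'three'],
                               ['paragraph', 'four']]},
}

remove_stop_words(talks, {'is', 'two', 'three'})
print(talks['file1']['no_stops'])  # [['This', 'paragraph', 'one'], ['paragraph']]
```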
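Some further experimenting, along with the comments, moved me toward a solution. Instead of appending each individual term, which produces one long list, I accumulate the terms of each paragraph into a list and then append that list, like this:
for eachFilename in Talks:
    for eachTerm in range( 0, len( Talks[eachFilename]['TermVectors'] ) ):
        # Start a fresh list for each paragraph
        term_list = [ ]
        for term in Talks[eachFilename]['TermVectors'][ eachTerm ]:
            if unicode(term) not in stop_words:
                term_list.append(term)
        # Append the whole paragraph list, not the last individual term
        Talks[eachFilename]['TermVectorsNoStops'].append( term_list )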
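For comparison, the same accumulate-then-append idea in Python 3, with snake_case names and direct iteration (Python 3 has no unicode(); its str is already Unicode). A sketch under assumptions: stop_words is never defined in the thread, so {'is', 'two', 'three'} is chosen for illustration, and the key names are hypothetical snake_case versions of the originals.

```python
# Assumed stop-word set, for illustration only
stop_words = {'is', 'two', 'three'}

talks = {
    'filename1': {'term_vectors': [['This', 'is', 'paragraph', 'one'],
                                   ['paragraph', 'two']],
                  'term_vectors_no_stops': []},
}

for file in talks.values():
    for paragraph in file['term_vectors']:
        # Fresh list for each paragraph, filled with the surviving terms
        term_list = []
        for term in paragraph:
            if term not in stop_words:
                term_list.append(term)
        # Appending the whole list keeps one inner list per paragraph
        file['term_vectors_no_stops'].append(term_list)

print(talks['filename1']['term_vectors_no_stops'])
# [['This', 'paragraph', 'one'], ['paragraph']]
```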
Thanks, everyone!