TypeError: list indices must be integers, not str (boolean conversion actually)

Question

import nltk
import random
from nltk.corpus import movie_reviews

documents=[(list(movie_reviews.words(fileid)),category)
           for category in movie_reviews.categories()
           for fileid in movie_reviews.fileids(category)]

random.shuffle(documents)
#print(documents[1])

all_words=[]

for w in movie_reviews.words():
    all_words.append(w.lower())

all_words=nltk.FreqDist(all_words)

word_features = list(all_words.keys())[:3000]

def find_features(document):
    words = set(document)
    features=[]
    for w in word_features:
        features[w]= (w in words)

    return features

print((find_features(movie_reviews.words('neg/cv000_29416.txt'))))

featuresets = [(find_features(rev), category) for (rev,category) in documents]

After running it, I get the error:

features[w]= (w in words)
TypeError: list indices must be integers, not str

Please help me fix this...

Answer 1

The only change needed is to initialize features as a dict ({}) instead of a list ([]). Then you can fill in its contents.

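The TypeError occurs because word_features is a list of strings, and you are using those strings as indices into the list features, but a list cannot be indexed by strings.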
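A minimal reproduction, without NLTK and with two hypothetical words standing in for word_features, shows that the failure comes from the list itself and not from the corpus. Under Python 3 the message reads "list indices must be integers or slices, not str"; the wording in the question is the Python 2 form of the same error.

```python
# Minimal reproduction of the error, with no NLTK dependency.
# word_features and words are hypothetical stand-ins for the question's data.
word_features = ["good", "bad"]
words = {"good", "movie"}

features = []          # a list: only integer (or slice) indices are allowed
error_message = None
try:
    for w in word_features:
        features[w] = (w in words)  # w is a str, so this raises TypeError
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```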

features = {}                   # a dict accepts string keys
for w in word_features:
    features[w] = (w in words)  # True if w occurs in the document, else False

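Here, the elements of word_features become the keys of the dictionary, and features stores boolean values: True if the same element appears in words (which holds only unique items because of the call to set()), and False otherwise.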
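To see the corrected function in isolation, here is a sketch on toy data; the three feature words, the sample review, and the two labelled documents are hypothetical stand-ins for the movie_reviews data, so it runs without downloading any corpus. It also shows an equivalent dict comprehension, which some may find more idiomatic.

```python
# Sketch of the corrected feature extractor on toy data (no corpus needed).
# word_features and the documents below are hypothetical stand-ins.
word_features = ["good", "bad", "plot"]

def find_features(document):
    words = set(document)            # unique words in this review
    features = {}                    # dict: keys may be strings
    for w in word_features:
        features[w] = (w in words)   # True if the word occurs, else False
    return features

def find_features_compact(document):
    # Same result, written as a dict comprehension.
    words = set(document)
    return {w: (w in words) for w in word_features}

doc = ["a", "good", "plot", "good"]
print(find_features(doc))  # {'good': True, 'bad': False, 'plot': True}

# Building labelled feature sets, as in the question's last line:
documents = [(["good", "plot"], "pos"), (["bad"], "neg")]
featuresets = [(find_features(rev), category) for (rev, category) in documents]
print(featuresets[1])  # ({'good': False, 'bad': True, 'plot': False}, 'neg')
```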

Answer 2

You tried to index the list features with strings, which is not possible in Python: list indices can only be integers. What you need is a dictionary.

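Try using defaultdict. When it is given a default_factory, such as defaultdict(bool), looking up a key that is not in the dictionary does not raise KeyError; instead, a new entry is created with the default value. (Plain assignment such as features[w] = ... never raises KeyError, even on an ordinary dict.)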

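from collections import defaultdict

# bool() returns False, so a missing key reads as False instead of raising KeyError
features = defaultdict(bool)
for w in word_features:
    features[w] = (w in words)  # store a boolean, not a one-element list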
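The difference between defaultdict() with and without a factory is easy to check directly. This is a small sketch using only the standard library; the key names are arbitrary examples.

```python
from collections import defaultdict

# Without a default_factory, defaultdict behaves like a plain dict on lookup.
plain = defaultdict()
try:
    plain["missing"]
    raised = False
except KeyError:
    raised = True
print(raised)       # True

# With bool as the factory, a missing key is inserted with bool() == False.
flags = defaultdict(bool)
print(flags["missing"])  # False
print(dict(flags))       # {'missing': False}

# Assignment never raises KeyError, even on an ordinary dict.
regular = {}
regular["new"] = True
print(regular)      # {'new': True}
```

For the question's loop, which only assigns, an ordinary {} is enough; defaultdict(bool) only helps if you later read keys that were never set.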