python nltk loop printing header instead of the value

Question

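I tokenized the sentences from a CSV file, but when I try to remove the stop words inside the for loop, it stops printing the words and prints the column headers for every sentence instead. Any idea what is wrong with the last line?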

for review in tokenized_docs:
    new_review = []
    for token in review:
        new_token = x.sub(u'', token)
        if not new_token == u'':
            new_review.append(new_token)
    tokenized_docs_no_punctuation.append(new_review)
    words=pd.DataFrame(tokenized_docs_no_punctuation)
    #print(words)
    print([word for word in words if word not in stops])

The output shows the column header numbers for each sentence. It should show the words instead.

Answer 1

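Because words in your code is a DataFrame, iterating over it yields the column labels, so word takes the values 0, 1, 2, ... in the for loop rather than the tokens themselves.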
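A minimal sketch of this behavior, assuming pandas is installed; the sample token lists here are hypothetical and stand in for tokenized_docs_no_punctuation:

```python
import pandas as pd

# Hypothetical sample data standing in for tokenized_docs_no_punctuation
tokenized = [["good", "movie"], ["bad", "plot", "twist"]]
words = pd.DataFrame(tokenized)

# Iterating over a DataFrame yields its column labels, not the cell values
column_labels = [word for word in words]
print(column_labels)  # [0, 1, 2]
```

Since none of these integer labels appear in a list of stop words, the comprehension in the question keeps all of them, which would explain why only numbers are printed.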

You can work with the list directly instead of building a DataFrame. For example,

# before
# words=pd.DataFrame(tokenized_docs_no_punctuation)

# after
words = tokenized_docs_no_punctuation[0]

This works for me.
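For a fuller picture, here is one way the whole loop might look when it filters plain lists. This is a sketch under stated assumptions, not the asker's actual setup: the question does not show how x or stops are defined, so x is assumed to be a regex that strips punctuation, stops is a small hand-written set standing in for an NLTK stop word list, and the input tokens are made up. It also filters new_review, the review currently being processed, because tokenized_docs_no_punctuation[0] always refers to the first review.

```python
import re
import string

# Assumption: x strips punctuation; the question does not show its definition
x = re.compile("[%s]" % re.escape(string.punctuation))
# Assumption: a tiny hand-written set standing in for an NLTK stop word list
stops = {"the", "is", "a", "this"}
# Hypothetical tokenized input
tokenized_docs = [
    ["this", "movie", "is", "great", "!"],
    ["the", "plot", ",", "a", "mess"],
]

filtered_docs = []
for review in tokenized_docs:
    new_review = []
    for token in review:
        new_token = x.sub("", token)
        if new_token != "":
            new_review.append(new_token)
    # Filter the current review, which is a plain list of strings
    filtered = [word for word in new_review if word not in stops]
    filtered_docs.append(filtered)
    print(filtered)
```

With this sample input the loop prints ['movie', 'great'] and then ['plot', 'mess'].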

python

nlp

tokenize

stop-words

pandas