How to pass the output of multiple files to an array

Question

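I am trying to run an LDA model on my files. First I do some preprocessing, such as tokenization and stop-word removal. I do this for multiple files, but when I pass the final output to the LDA model it gives me an error. I read on Google that LDA takes multiple files as input. Now I want to store each file's output in an array and then pass that array as input, but that also gives me an error: IndexError: list assignment index out of range. I don't know what the problem is. Any help would be appreciated!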

    # URDU STOP WORDS REMOVAL
    doc_clean = []
    stopwords_corpus = UrduCorpusReader('./data', ['stopwords-ur.txt'])    
    stopwords = stopwords_corpus.words()
    count = 1
    # print(stopwords)
    for infile in (wordlists.fileids()):
        words = wordlists.words(infile)
        finalized_words = remove_urdu_stopwords(stopwords, words)
        doc_clean[count] = finalized_words
        print(doc_clean)
        count =count+1
        print("\n==== WITHOUT STOPWORDS ===========\n")
        print(finalized_words)
        id2word = corpora.Dictionary(doc_clean)
        mm = [id2word.doc2bow(text) for text in texts]
        lda = models.ldamodel.LdaModel(corpus=mm, id2word=id2word, num_topics=3, update_every=1, chunksize=10000, passes=1)

Answer 1

There is no need for the count variable here. Python lists provide an append method for adding elements to the end of the list.
Change this

    doc_clean[count] = finalized_words

to this

    doc_clean.append(finalized_words)
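The difference between index assignment and append can be seen in a small, self-contained sketch. The token lists here are made-up stand-ins for the per-file output of the loop in the question, not real data from it.

```python
# Hypothetical stand-ins for the per-file token lists produced by the loop.
per_file_tokens = [["alpha", "beta"], ["gamma"], ["delta", "epsilon"]]

doc_clean = []
error_message = None
try:
    # Same pattern as in the question: assigning to an index that does not exist yet.
    doc_clean[1] = per_file_tokens[0]
except IndexError as exc:
    error_message = str(exc)

# append grows the list by one element per iteration, so no counter is needed.
for tokens in per_file_tokens:
    doc_clean.append(tokens)

print(error_message)
print(doc_clean)
```

Index assignment can only overwrite a position that already exists, whereas append always adds a new position at the end.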

Answer 2

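You define doc_clean as an empty list, but on the first iteration you reference doc_clean[count] with count=1, which points to the second element of an empty list. That element does not exist, so the assignment raises IndexError.

Replace

    doc_clean[count] = finalized_words

with

    doc_clean.append(finalized_words)

Then you don't need count at all.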
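Putting the fix into the whole pipeline, one possible arrangement is sketched below. It rests on several assumptions: collect_documents and train_lda are hypothetical helper names, a plain list comprehension stands in for the asker's remove_urdu_stopwords, and an in-memory dict stands in for wordlists.fileids() and wordlists.words(). It also assumes the dictionary and corpus should be built once from doc_clean after the loop, rather than from the undefined name texts inside it.

```python
def collect_documents(file_ids, read_words, stopwords):
    """Return one list of tokens per file, with stop words removed."""
    stop_set = set(stopwords)
    doc_clean = []
    for file_id in file_ids:
        tokens = [w for w in read_words(file_id) if w not in stop_set]
        doc_clean.append(tokens)  # one entry per file, no counter needed
    return doc_clean


def train_lda(doc_clean, num_topics=3):
    """Build the dictionary and corpus once, after all files are collected."""
    # Imported here so the collection step above runs even without gensim installed.
    from gensim import corpora, models

    id2word = corpora.Dictionary(doc_clean)
    corpus = [id2word.doc2bow(tokens) for tokens in doc_clean]
    return models.LdaModel(corpus=corpus, id2word=id2word,
                           num_topics=num_topics, passes=1)


# Hypothetical in-memory "files" standing in for the corpus reader in the question.
fake_files = {"a.txt": ["the", "cat", "sat"], "b.txt": ["the", "dog", "ran"]}
doc_clean = collect_documents(fake_files, fake_files.get, stopwords=["the"])
print(doc_clean)
```

With gensim installed, calling train_lda(doc_clean) would then train the model on all files at once.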

python

arrays

python-3.x

lda