Python File iterator running multiple times

Question

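I am using gensim to build a word2vec model from the sample files in a directory. I followed an online tutorial that reads the files in a directory and processes them line by line. My sample file has 9 lines, but this code gives me the same line 9 times. Can someone explain what is happening?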

 import os

 class MySentences(object):
     def __init__(self, dirname):
         self.dirname = dirname

     def __iter__(self):
         for fname in os.listdir(self.dirname):
             for line in open(os.path.join(self.dirname, fname)):
                 print(os.path.join(self.dirname, fname))
                 yield line.split()

 sentences = MySentences('/fakepath/Folder')

Details: suppose the file contains 3 lines, such as:

hi how are you.
I am fine.
I am good.

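line.split() should give me ['hi','how','are','you'] only once. Instead it happens 3 times, so I get the list above three times rather than once. If the total number of sentences is 5, it returns the line 5 times.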

Answer 1

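First, you should be clear about what you are trying to do. The class MySentences takes a directory as its argument, and sentences is an instance of it. Its __iter__ method is a generator, so iterating over sentences yields every line of every file in the directory, each split into a list of words.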

For example:

for line in sentences:
    print(line)

gives you a series of lists whose elements are words (I have removed the print statement that printed the path). That is:

['hi', 'how', 'are', 'you.']

['I', 'am', 'fine.']

['I', 'am', 'good.']
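
One plausible reading of the question is that the path print sits inside the inner loop, so it runs once per line and looks like the same output repeated. The sketch below is only an illustration of that reading, not the tutorial's code: it moves the print outside the line loop so it runs once per file, and it builds a hypothetical temporary directory with one file named sample.txt so the example can run on its own.

```python
import os
import shutil
import tempfile


class MySentences(object):
    """Iterable that yields each line of each file in a directory as a list of words."""

    def __init__(self, dirname):
        self.dirname = dirname

    def __iter__(self):
        for fname in sorted(os.listdir(self.dirname)):
            path = os.path.join(self.dirname, fname)
            # Printed once per file, because it sits outside the line loop.
            print(path)
            with open(path) as handle:
                for line in handle:
                    yield line.split()


# Hypothetical sample data: a temporary directory holding one three-line file.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "sample.txt"), "w") as handle:
    handle.write("hi how are you.\nI am fine.\nI am good.\n")

sentences = MySentences(demo_dir)
first_pass = list(sentences)
second_pass = list(sentences)  # __iter__ runs again, so a second pass works
shutil.rmtree(demo_dir)

print(first_pass)
```

Each line is yielded exactly once per pass. Because __iter__ starts a fresh generator every time it is called, looping over sentences a second time produces the same lists again, and the path is printed once more for that pass.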
