Read from a file on demand in Python

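I want to read a text file word by word, on demand, the way ifstream works in C++. That is, I want to open the file, read the next word from it whenever I need one, and then close it. How can I do this?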

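You can write a generator function that:

  • Reads the contents of the file line by line.
  • Finds all the words and collects them in an iterator.
  • Yields the words from that iterator one by one.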

Consider this file, foo.txt:

This is an example of speech synthesis in English.
This is an example of speech synthesis in Bangla.

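The code below returns the words one by one. However, it still reads the entire file at once rather than word by word. Reading truly word by word would mean tracking the cursor position line by line and then word by word, which can be more expensive than reading the whole file at once or reading it chunk by chunk.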

# On Python < 3.9, import Generator from the 'typing' module instead.
from collections.abc import Generator


def word_reader(file_path: str) -> Generator[str, None, None]:
    """Read a file from the file path and return a
    generator that returns the contents of the file
    as words.

    Parameters
    ----------
    file_path : str
        Path of the file.

    Yields
    -------
    Generator[str, None, None]
        Yield words one by one.

    """
    with open(file_path, "r") as f:
        # Read the entire file into a list of lines. Note that readlines()
        # returns a list, not a generator, so the whole file is in memory.
        r = f.readlines()

        # Aggregate all the words from all the sentences in another generator.
        words = (word for sentence in r for word in sentence.split(" ") if word)

        # This basically means: 'for word in words; yield word'.
        yield from words


if __name__ == "__main__":
    wr = word_reader("./foo.txt")
    for word in wr:
        # Doing some processing on the final words on a line.
        if word.endswith(".\n"):
            word = word.replace(".\n", "")
        print(word)

This prints:

This
is
an
example
of
speech
synthesis
in
English
...
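
If holding the whole file in memory is a concern, a lazier variant is possible. The following is only a sketch and differs from the code above: lazy_word_reader is a hypothetical name, and it assumes that words are separated by any whitespace. It iterates over the file object directly, so only one line is held at a time, and next() pulls a single word when it is needed, which is closer to how ifstream >> word behaves in C++.

```python
import os
import tempfile
# On Python < 3.9, import Iterator from the 'typing' module instead.
from collections.abc import Iterator


def lazy_word_reader(file_path: str) -> Iterator[str]:
    """Yield words one at a time, reading the file line by line."""
    with open(file_path, "r") as f:
        # Iterating over the file object reads one line at a time.
        for line in f:
            # split() with no argument splits on any run of whitespace
            # and drops empty strings, so no trailing "\n" survives.
            yield from line.split()


# Demo: write a small sample file, then pull words only when needed.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "foo.txt")
    with open(sample_path, "w") as sample:
        sample.write("This is an example of speech synthesis in English.\n")

    words = lazy_word_reader(sample_path)
    first = next(words)   # the file is opened here, on the first request
    second = next(words)
    print(first, second)  # prints: This is

    # Closing the generator early also closes the underlying file.
    words.close()
```

Punctuation stays attached to the words here (the last word is "English."), so any stripping still has to be done by the caller, as in the loop above.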

You can also read the file chunk by chunk and then call this function to yield the words one by one.
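
One way the chunked approach could look is sketched below. The name words_from_chunks and the chunk_size parameter are assumptions made for illustration, and words are again taken to be separated by whitespace. The main subtlety is that a chunk boundary can fall in the middle of a word, so the last partial word is held back and joined with the next chunk.

```python
import io
# On Python < 3.9, import Iterator from the 'typing' module instead.
from collections.abc import Iterator
from typing import TextIO


def words_from_chunks(stream: TextIO, chunk_size: int = 4096) -> Iterator[str]:
    """Yield whitespace-separated words, reading chunk_size characters at a time."""
    leftover = ""  # partial word carried over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream
        buffer = leftover + chunk
        parts = buffer.split()
        # If the buffer does not end in whitespace, its last word may
        # continue in the next chunk, so hold it back for now.
        if parts and not buffer[-1].isspace():
            leftover = parts.pop()
        else:
            leftover = ""
        yield from parts
    # Whatever is still held back at the end is a complete word.
    if leftover:
        yield leftover


# Demo with an in-memory stream and a deliberately tiny chunk size,
# so that chunk boundaries fall inside words.
demo = io.StringIO("This is an example\nof speech synthesis")
print(list(words_from_chunks(demo, chunk_size=5)))
# prints: ['This', 'is', 'an', 'example', 'of', 'speech', 'synthesis']
```

With a real file, the same function can be given the object returned by open(file_path, "r") inside a with block.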