Read from a file on demand in Python
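I want to read a text file word by word, on demand, like ifstream in C++. That is, I want to open the file, read the next word from it whenever I need one, and then close it. How can I do this?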
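You can write a generator function that:
- Reads the file contents line by line.
- Collects all the words from those lines in an iterator.
- Yields the words from the iterator one by one.
Consider this file, foo.txt: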
This is an example of speech synthesis in English.
This is an example of speech synthesis in Bangla.
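The code below returns the words one by one. However, it still reads the whole file at once rather than word by word. That is because reading truly word by word means tracking the cursor position line by line and then word by word, which can be more expensive than reading the whole file at once or chunk by chunk.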
# In Python < 3.9, import Generator from the 'typing' module.
from collections.abc import Generator


def word_reader(file_path: str) -> Generator[str, None, None]:
    """Read a file from the file path and return a
    generator that yields the contents of the file
    as words.

    Parameters
    ----------
    file_path : str
        Path of the file.

    Yields
    ------
    str
        The words, one by one.
    """
    with open(file_path, "r") as f:
        # Read the entire file into a list of lines (this is not lazy).
        r = f.readlines()
        # Aggregate all the words from all the lines in a generator expression.
        words = (word for sentence in r for word in sentence.split(" ") if word)
        # This basically means: 'for word in words: yield word'.
        yield from words


if __name__ == "__main__":
    wr = word_reader("./foo.txt")
    for word in wr:
        # Strip the trailing period and newline from the last word on a line.
        if word.endswith(".\n"):
            word = word.replace(".\n", "")
        print(word)
This prints:
This
is
an
example
of
speech
synthesis
in
English
...
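If holding the whole file in memory is a concern, one possible variant is to iterate over the file object itself, which reads one line at a time, and to pull words with next() only when they are needed. The sketch below is an illustration, not part of the code above: the names lazy_word_reader and demo are hypothetical, and demo writes the sample foo.txt to a temporary directory so that it runs on its own. Note that it splits on any whitespace and leaves punctuation attached to the words.

```python
import os
import tempfile
from collections.abc import Iterator


def lazy_word_reader(file_path: str) -> Iterator[str]:
    """Yield the words of a file one at a time, reading it line by line."""
    with open(file_path, "r") as f:
        # Iterating over the file object fetches one line at a time
        # instead of loading every line up front.
        for line in f:
            # split() with no argument splits on any run of whitespace,
            # so the trailing newline and empty strings are dropped.
            yield from line.split()


def demo() -> list[str]:
    """Write the sample file to a temporary directory and pull two words."""
    text = (
        "This is an example of speech synthesis in English.\n"
        "This is an example of speech synthesis in Bangla.\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "foo.txt")
        with open(path, "w") as f:
            f.write(text)
        words = lazy_word_reader(path)
        first = next(words)   # read the next word only when it is needed
        second = next(words)
        words.close()         # stop early; this also closes the file
        return [first, second]


if __name__ == "__main__":
    print(demo())
```

Calling next() on the generator plays roughly the role of `stream >> word` in C++, and close() releases the file if you stop before the end.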
You can also read the file chunk by chunk and then yield the words one by one in the same way.
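A minimal sketch of the chunk-by-chunk idea, under a few assumptions: the function name words_from_chunks and the chunk_size parameter are hypothetical, words are taken to be separated by whitespace, and the function accepts any open text stream (for example the object returned by open()). The one subtlety is that a chunk boundary can fall in the middle of a word, so the last partial word is held back until the next chunk arrives.

```python
import io
from collections.abc import Iterator
from typing import TextIO


def words_from_chunks(stream: TextIO, chunk_size: int = 4096) -> Iterator[str]:
    """Yield whitespace-separated words, reading chunk_size characters at a time."""
    tail = ""  # partial word carried over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of the stream
        buffer = tail + chunk
        words = buffer.split()
        # If the buffer does not end in whitespace, its last word may
        # continue in the next chunk, so hold it back for now.
        if words and not buffer[-1].isspace():
            tail = words.pop()
        else:
            tail = ""
        yield from words
    # Whatever is left after the final chunk is a complete word.
    if tail:
        yield tail


if __name__ == "__main__":
    # io.StringIO stands in for a file opened with open(path, "r").
    sample = io.StringIO("This is an example of speech synthesis in English.\n")
    reader = words_from_chunks(sample, chunk_size=8)
    print(next(reader))  # This
    print(next(reader))  # is
```

With a real file, wrap the call in `with open("./foo.txt") as f:` and pass f as the stream. At most one chunk plus one partial word is held in memory at a time.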