Is there a faster way to extract lines from a file?

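I have a set of files that I need to search, extracting certain lines from them. Right now I'm using a for loop, but it is turning out to be very expensive in terms of time. Is there a faster way than the one below?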

import re

for file in files:
    localfile = open(file, 'r')
    for line in localfile:
        if re.search("Common English Words", line):
            words = line.split("|")[0]
            # Append words to file words.txt
            open("words.txt", "a+").write(words + "\n")

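First, you are creating a new file descriptor every time you write to words.txt. I ran some tests and found that Python's garbage collection does close open file descriptors once they become unreachable (at least in my test cases). Even so, creating a file descriptor every time you want to append to the file is expensive. For future reference, opening files in a with ... as block is considered good practice.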
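To check the cost of reopening the file on your own machine, a rough timing sketch like the one below may help. The helper names (append_reopening, append_open_once, time_call) and the item count are made up for illustration and are not part of the original question; the actual numbers will depend on your operating system and disk.

```python
import os
import tempfile
import time


def append_reopening(path, items):
    """Open the output file anew for every single write (the slow pattern)."""
    for item in items:
        with open(path, "a") as f:
            f.write(item + "\n")


def append_open_once(path, items):
    """Open the output file once and reuse the same handle for all writes."""
    with open(path, "a") as f:
        for item in items:
            f.write(item + "\n")


def time_call(func, items):
    """Run func in a fresh temp directory; return (elapsed seconds, file contents)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "words.txt")
        start = time.perf_counter()
        func(path, items)
        elapsed = time.perf_counter() - start
        with open(path) as f:
            return elapsed, f.read()


if __name__ == "__main__":
    # Hypothetical workload: 10,000 short strings.
    items = ["word%d" % i for i in range(10000)]
    slow, _ = time_call(append_reopening, items)
    fast, _ = time_call(append_open_once, items)
    print("reopen per write: %.4fs, open once: %.4fs" % (slow, fast))
```

Both helpers write identical output; only the number of open() calls differs, so any gap in the timings can be attributed to reopening the file.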

TL;DR: One improvement you can make is to open the file you are writing to only once. Here is what that looks like:

import re

# Open the output file once, outside the loops, and reuse the handle.
with open("words.txt", "a+") as words_file:
    for file in files:
        localfile = open(file, 'r')
        for line in localfile:
            if re.search("Common English Words", line):
                words = line.split("|")[0]
                # Append words to file words.txt
                words_file.write(words + "\n")

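As I said, using the with ... as statement when opening files is considered best practice. We can apply that practice fully, to the input files as well, like this: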

import re

with open("words.txt", "a+") as words_file:
    for file in files:
        with open(file, 'r') as localfile:
            for line in localfile:
                if re.search("Common English Words", line):
                    words = line.split("|")[0]
                    # Append words to file words.txt
                    words_file.write(words + "\n")
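
As a possible further step, the same loop can be wrapped in a function so it is easy to test and reuse. The sketch below is not from the original answer. The function name extract_matching_fields and its parameters are hypothetical, and it swaps re.search for a plain substring test, which matches the same lines here because "Common English Words" contains no regex metacharacters and is likely cheaper. It also opens the output with "a" instead of "a+", on the assumption that the file is only written to and never read back.

```python
def extract_matching_fields(paths, out_path, needle="Common English Words", sep="|"):
    """Append the first sep-delimited field of every line containing needle.

    Returns the number of lines written to out_path.
    """
    count = 0
    # Open the output once; each input file is closed as soon as it is read.
    with open(out_path, "a") as out_file:
        for path in paths:
            with open(path, "r") as in_file:
                for line in in_file:
                    # Plain substring test instead of re.search (assumption:
                    # the search text is a literal, not a pattern).
                    if needle in line:
                        out_file.write(line.split(sep, 1)[0] + "\n")
                        count += 1
    return count
```

With the variables from the question, the call would be extract_matching_fields(files, "words.txt"). If the search text ever needs real pattern matching, compiling it once with re.compile before the loop and calling the compiled pattern's search method keeps the rest of the structure unchanged.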