Getting a line in a large file's content

Question

I would like to know how to implement Aaron Digulla's answer to this question: Fastest Text search method in a large text file

import re

with open('test.txt', 'rt') as myfile:
    contents = myfile.read()
    match = re.search("abc", contents)

What is the next step, so that I can find the previous EOL and the next EOL and extract the line?

Answer 1

You can take the start index of the match object and pass it to str.rfind and str.find, using their start and end parameters, to locate the previous and the next EOL:

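import re

with open('test.txt', 'rt') as myfile:
    contents = myfile.read()

match = re.search("abc", contents)
if match:
    start = match.start()
    # rfind returns -1 if no newline precedes the match, so +1 gives index 0
    previous_EOL = contents.rfind('\n', 0, start)
    next_EOL = contents.find('\n', start)
    if next_EOL == -1:  # match is on the last line, no trailing newline
        next_EOL = len(contents)
    line = contents[previous_EOL + 1:next_EOL]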

For example:

import re

contents = '''
This is a sample text
Here is 'abc' in this line.
There are some other lines.'''

match = re.search("abc", contents)
start = match.start()
previous_EOL = contents.rfind('\n', 0, start)
next_EOL = contents.find('\n', start)
line = contents[previous_EOL+1: next_EOL]

print(line)

This prints:

Here is 'abc' in this line.
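
The same steps can be wrapped in a small helper. This is only a sketch: the function name line_containing is hypothetical, and it assumes the whole file content is already in memory as a str. It returns None when nothing matches, and it treats a -1 from str.find (match on the last line, no trailing newline) as "end of the text".

```python
import re

def line_containing(contents, pattern):
    """Return the full line holding the first match of pattern, or None."""
    match = re.search(pattern, contents)
    if match is None:
        return None
    # rfind returns -1 when no newline precedes the match; adding 1 gives index 0.
    line_start = contents.rfind('\n', 0, match.start()) + 1
    line_end = contents.find('\n', match.start())
    if line_end == -1:
        # The match sits on the last line and the text has no trailing newline.
        line_end = len(contents)
    return contents[line_start:line_end]

print(line_containing("first line\nhas abc here", "abc"))       # has abc here
print(line_containing("abc at the start\nsecond line", "abc"))  # abc at the start
print(line_containing("nothing to see", "abc"))                 # None
```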

Answer 2

Replace

match = re.search("abc", contents)

with

match = re.search("^.*abc.*$", contents, re.M)

This matches the whole line that contains "abc". With the re.M flag, ^ matches at the start of each line and $ at its end.

Here is an example:

import re

s = """
Twinkle, twinkle
Little star!
How I wonder 
What you are!
"""

term = "star"
match = re.search(f"^.*{term}.*$", s, re.M)
print(match.group(0))

It outputs:

Little star!
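
One caveat when the search term is interpolated into the pattern: if the term can contain regex metacharacters such as "." or "?", they will be interpreted as regex syntax. A hedged sketch of one way to handle this with re.escape follows; the helper name find_line and the sample text are made up for illustration.

```python
import re

def find_line(text, term):
    """Return the first line containing the literal string term, or None."""
    # re.escape neutralizes metacharacters such as '.', '?' or '(' in term.
    pattern = f"^.*{re.escape(term)}.*$"
    match = re.search(pattern, text, re.M)
    return match.group(0) if match else None

s = "How much?\nOr 3x50 cents\nIt costs 3.50 dollars"
print(find_line(s, "3.50"))   # It costs 3.50 dollars
print(find_line(s, "much?"))  # How much?
print(find_line(s, "3.5.0"))  # None
```

Without re.escape, the term "3.50" would match "3x50" on the second line first, because an unescaped "." matches any character.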

Tags: python, performance, file, text-files, large-files