Getting a line in a large file's content
I would like to know how to implement Aaron Digulla's answer to this question:
Fastest Text search method in a large text file
import re

with open('test.txt', 'rt') as myfile:
    contents = myfile.read()
match = re.search("abc", contents)
What is the next step, so that I can find the previous EOL and the next EOL and extract the line?
You can take the start index of the match object and pass it to str.find and str.rfind, using their start and end parameters, to locate the previous and next EOL:
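import re

with open('test.txt', 'rt') as myfile:
    contents = myfile.read()
match = re.search("abc", contents)
if match:
    start = match.start()
    # rfind returns -1 if no newline precedes the match, so +1 yields 0
    previous_EOL = contents.rfind('\n', 0, start)
    # find returns -1 on the last line; fall back to the end of the text
    next_EOL = contents.find('\n', start)
    if next_EOL == -1:
        next_EOL = len(contents)
    line = contents[previous_EOL + 1:next_EOL]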
For example:
import re

contents = '''
This is a sample text
Here is 'abc' in this line.
There are some other lines.'''
match = re.search("abc", contents)
start = match.start()
previous_EOL = contents.rfind('\n', 0, start)
next_EOL = contents.find('\n', start)
line = contents[previous_EOL+1: next_EOL]
print(line)
This prints:
Here is 'abc' in this line.
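The find/rfind steps above can be wrapped in a small helper. This is only a minimal sketch: the function name extract_line is hypothetical and not part of the original answer. It also covers a match on the first or last line, where rfind or find returns -1 because there is no newline on that side.

```python
import re

def extract_line(contents, pattern):
    """Return the full line containing the first match of pattern, or None."""
    match = re.search(pattern, contents)
    if match is None:
        return None
    start = match.start()
    # rfind returns -1 when no newline precedes the match; -1 + 1 == 0.
    line_start = contents.rfind('\n', 0, start) + 1
    # find returns -1 when the match is on the last line.
    line_end = contents.find('\n', start)
    if line_end == -1:
        line_end = len(contents)
    return contents[line_start:line_end]

text = "first abc\nmiddle\nlast abc"
print(extract_line(text, "abc"))      # first abc
print(extract_line(text, "last"))     # last abc
print(extract_line(text, "missing"))  # None
```

Without the -1 check on the last line, the slice would end at index -1 and silently drop the final character.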
Replace
match = re.search("abc", contents)
with
match = re.search("^.*abc.*$", contents, re.M)
and it will match the whole line containing "abc". With the re.M flag, ^ matches the start of a line and $ matches its end.
Here is an example:
import re
s = """
Twinkle, twinkle
Little star!
How I wonder
What you are!
"""
term = "star"
match = re.search(f"^.*{term}.*$", s, re.M)
print(match.group(0))
It gives:
Little star!
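If the search term may contain regex metacharacters, or if every matching line is wanted rather than just the first, the same pattern could be combined with re.escape and re.finditer. This is a sketch under those assumptions; the helper name matching_lines is hypothetical.

```python
import re

def matching_lines(text, term):
    """Return every line of text that contains term as a literal substring."""
    # re.escape makes characters such as . or * in term match literally.
    pattern = re.compile(f"^.*{re.escape(term)}.*$", re.M)
    return [m.group(0) for m in pattern.finditer(text)]

s = """
Twinkle, twinkle
Little star!
How I wonder
What you are!
"""
print(matching_lines(s, "star"))  # ['Little star!']
print(matching_lines(s, "!"))     # ['Little star!', 'What you are!']
```

Since . does not match a newline by default, each match stays within a single line.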