How to read a specific portion of a txt file in Python?

Question

I need to extract a portion of text from a txt file.
The file looks like this:

STARTINGWORK DD/MM/YYYY HH:MM:SS
... text lines ...
... more text lines ...
STARTINGWORK DD/MM/YYYY HH:MM:SS
... text lines I want ...
... more text lines that I want ...

The file starts with STARTINGWORK and ends with text lines.
I need to extract the final portion of text after the last STARTINGWORK, without the STARTINGWORK string itself.

I tried using 3 for loops (one for the start, one to read the lines in between, and one for the end):

    import os

    file = "records.txt"
    if file.endswith(".txt"):
        if os.path.exists(file):
            with open(file) as f:
                lines = [line.rstrip("\n") for line in f]
            for line in lines:
                # extract the portion
                pass
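Rather than three loops, one backward scan over the list is enough to find the last marker. A minimal sketch, assuming the lines are already in a list as in the code above; portion_after_last_marker is a hypothetical helper name, and returning an empty list when no marker exists is an assumption:

```python
def portion_after_last_marker(lines, marker="STARTINGWORK"):
    """Return the lines after the last line that starts with marker (hypothetical helper)."""
    # Walk from the end so the first marker found is the last one in the file.
    for i in range(len(lines) - 1, -1, -1):
        if lines[i].startswith(marker):
            return lines[i + 1:]  # skip the marker line itself
    return []  # assumption: no marker means nothing to extract


lines = [
    "STARTINGWORK 01/01/2020 10:00:00",
    "... text lines ...",
    "STARTINGWORK 02/01/2020 11:00:00",
    "... text lines I want ...",
    "... more text lines that I want ...",
]
print(portion_after_last_marker(lines))
```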

Answer 1

Try this:

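import os

file = "records.txt"
extracted_text = ""
if file.endswith(".txt"):
    if os.path.exists(file):
        with open(file) as f:
            # Split on the marker; the last chunk is whatever follows the last STARTINGWORK
            chunks = f.read().split("STARTINGWORK")
        extracted_text = chunks[-1]  # Here it is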
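Note that split keeps everything after the last marker, including the date/time that follows STARTINGWORK on the same line. A minimal sketch using str.rpartition instead, which splits only at the last occurrence; it assumes the marker never appears inside the other text, and returning the whole text when no marker exists is an assumption:

```python
text = (
    "STARTINGWORK 01/01/2020 10:00:00\n"
    "... text lines ...\n"
    "STARTINGWORK 02/01/2020 11:00:00\n"
    "... text lines I want ...\n"
)

# rpartition splits only once, at the LAST occurrence of the separator.
before, sep, after = text.rpartition("STARTINGWORK")

# after still starts with the rest of the marker line (" 02/01/2020 11:00:00\n"),
# so keep only what follows the first newline.
wanted = after.partition("\n")[2] if sep else text
print(wanted)
```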

Answer 2

You can use the file_read_backwards module to read the file from the end backwards. If the file is large, this can save you time:

from file_read_backwards import FileReadBackwards

with FileReadBackwards("records.txt") as file:
    portion = []
    # Lines come back last-to-first, so the first marker met is the last one in the file
    for line in file:
        if not line.startswith("STARTINGWORK"):
            portion.append(line)
        else:
            break
portion.reverse()  # restore top-to-bottom order

portion then contains the required lines.
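If installing file_read_backwards is not an option, the standard library mmap module can search a file from the end without first reading it into a Python string. A rough sketch; text_after_last_marker is a hypothetical helper, and UTF-8 encoding and \n line endings are assumptions (with \r\n files the returned text keeps its \r characters):

```python
import mmap
import os


def text_after_last_marker(path, marker=b"STARTINGWORK", encoding="utf-8"):
    """Return the text after the last line starting with marker, excluding
    the marker line itself. Hypothetical helper, standard library only."""
    with open(path, "rb") as f:
        # mmap refuses to map an empty file, so handle that case first.
        if os.fstat(f.fileno()).st_size == 0:
            return ""
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Search backwards for a marker that sits at the start of a line.
            idx = mm.rfind(b"\n" + marker)
            if idx != -1:
                line_start = idx + 1
            elif mm[:len(marker)] == marker:
                line_start = 0  # the only marker is on the first line
            else:
                return ""  # no marker in the file
            # Skip the rest of the marker line (the date/time part).
            newline = mm.find(b"\n", line_start)
            if newline == -1:
                return ""  # the marker line is the last line
            return mm[newline + 1:].decode(encoding)
```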

Answer 3

I would go the regex route to solve this:

>>> import re
>>> input_data = open('path/file').read()
>>> result = re.search(r'.*STARTINGWORK\s*(.*)$', input_data, re.DOTALL)
>>> print(result.group(1))
DD/MM/YYYY HH:MM:SS
... text lines I want ...
... more text lines that I want ...
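The pattern above also captures the date/time that sits on the STARTINGWORK line. A sketch of a variant that drops the whole marker line, assuming the marker always begins its line; the sample text is made up for illustration:

```python
import re

text = (
    "STARTINGWORK 01/01/2020 10:00:00\n"
    "... text lines ...\n"
    "STARTINGWORK 02/01/2020 11:00:00\n"
    "... text lines I want ...\n"
    "... more text lines that I want ...\n"
)

# The greedy leading .* (with DOTALL) backtracks to the LAST position where
# ^STARTINGWORK matches; MULTILINE makes ^ match at the start of any line.
# [^\n]*\n? then consumes the rest of that marker line (the date/time).
pattern = re.compile(r".*^STARTINGWORK[^\n]*\n?(.*)", re.DOTALL | re.MULTILINE)

match = pattern.search(text)
wanted = match.group(1) if match else ""
print(wanted)
```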

Answer 4

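The get_final_lines generator tries to avoid allocating more storage than needed when reading a potentially large file: it only keeps the lines seen since the most recent STARTINGWORK.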

def get_final_lines(fin):
    buf = []
    for line in fin:
        if line.startswith('STARTINGWORK'):
            buf = []
        else:
            buf.append(line)

    yield from buf


if __name__ == '__main__':
    with open('some_file.txt') as fin:
        for line in get_final_lines(fin):
            print(line.rstrip())
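get_final_lines still holds every line after the last marker in buf. If that final portion can itself be large, one possible alternative is two passes: first record the byte offset just after the last marker line, then seek there and stream the rest. A sketch; lines_after_last_marker is a hypothetical name and UTF-8 is assumed:

```python
def lines_after_last_marker(path, marker=b"STARTINGWORK", encoding="utf-8"):
    """Yield the lines after the last marker line without buffering them.
    Hypothetical helper: pass 1 records a byte offset, pass 2 streams from it."""
    offset = None
    with open(path, "rb") as f:
        pos = 0
        for line in f:  # binary lines keep their b"\n", so len() is exact
            pos += len(line)
            if line.startswith(marker):
                offset = pos  # byte offset of the line after this marker line
    if offset is None:
        return  # no marker: yield nothing
    with open(path, "rb") as f:
        f.seek(offset)
        for line in f:
            yield line.decode(encoding)
```

It reads the file twice, trading extra I/O for keeping only one line in memory at a time, and can be used the same way as get_final_lines, e.g. for line in lines_after_last_marker("some_file.txt"): print(line.rstrip()).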

Answer 5

You can use a variable to hold all the lines you have read since the last STARTINGWORK.
When you have finished processing the file, you have what you need.

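Of course, you don't need to read all the lines into a list first. You can iterate directly over the open file, which returns one line at a time. That is: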

file = "records.txt"
result = []
with open(file) as f:
    for line in f:
        if line.startswith("STARTINGWORK"):
            result = []       # Discard what has accumulated so far
        result.append(line)  # Add the line just read
print("".join(result))

result now holds everything from the last STARTINGWORK on, including the STARTINGWORK line itself; if you want to drop that initial STARTINGWORK line, keep result[1:].

- Then, in code:

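# list that accumulates the lines since the latest marker
result = []

# function: empty the list when a marker line is seen, then store the line
def appendlines(line, result, word):
  if line.startswith(word):
    del result[:]  # empty the list in place so the caller's list is cleared too
  result.append(line)
  return line, result

file = "records.txt"
with open(file, "r") as lines:
  for line in lines:
    appendlines(line, result, "STARTINGWORK")
new_result = [line.rstrip("\n") for line in result[1:]]  # drop the marker line and newlines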
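appendlines uses del result[:] rather than result = [] because rebinding the parameter inside the function would not affect the list the caller passes in. A small illustration with hypothetical function names; list.clear() empties a list in place just like del result[:]:

```python
def reset_by_rebinding(result):
    result = []          # rebinds the local name only; the caller's list is untouched
    result.append("new")


def reset_in_place(result):
    result.clear()       # same effect as del result[:]: empties the caller's list
    result.append("new")


a = ["old"]
reset_by_rebinding(a)
print(a)  # ['old']

b = ["old"]
reset_in_place(b)
print(b)  # ['new']
```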


Tags: python, for-loop, file, readlines, python-3.x