How to print the line X lines before a line that satisfies an if statement

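I'm new to Python and only have piecemeal, cookie-cutter knowledge from what I've found across numerous web pages.

That said, I'm trying to search a file (~10k lines) against a set of 'filter' criteria that I've written, and then I want to print each line that meets the criteria AND the line that is X lines before it.

I've created the script below to open the file, iterate through it line by line, and print the lines that meet the filter criteria to an output file, but I don't know how to incorporate the "X lines before" part into the current script.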

import os

output_file = 'Output.txt'
filename = 'BigFile.txt'                 

numLines = 0
numWords = 0
numChrs = 0
numMes = 0

f1 = open(output_file, 'w')
print 'Output File has been Opened'

with open(filename, 'r') as file:
   for line in file:
      wordsList = line.split()
      numLines += 1
      numWords += len(wordsList)
      numChrs += len(line)

      if "X" in line and "Y" not in line and "Z" in line:
          numMes += 1
          print >>f1, line
          print 'Object found and Catalogued in Output.txt'                          

print "Lines: %i\nWords: %i\nCharacters: %i" % (numLines, numWords, numChrs)
print >>f1, "Lines: %i\nWords: %i\nCharacters: %i" % (numLines, numWords, numChrs)

print "There are a total of %i things in this file" % (numMes)
print >>f1, "There are a total of %i things in this file" % (numMes)

f1.close()

print 'Output Files have been Closed'

My first guess was to use line.enumeration, but I don't think I can simply write lines - 5 to print the line that is 5 lines before the match:

lines = f1.enumeration()
if "blah blah" in line and "so so" not in line:
    print >>f1, lines
    print >>f1, [lines - 5]

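The best part is still to come, though, because I then have to take the Output.txt file and compare it with another file to output the matching criteria in both files... but one step at a time, right?

-Also feel free to add any 'proper' technical advice... I'm sure this script could be written in a better way, so please educate me on anything I'm doing wrong.

Thanks in advance for your help!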


Update: thanks to the help below, I've successfully implemented the fix:

import os

output_file = 'Output.txt'
filename = 'BigFile.txt'                 

numLines = 0
numWords = 0
numChrs = 0

numMulMes = 0

last5 = []

f1 = open(output_file, 'w')
print 'Output Files have been Opened'

with open(filename, 'r') as file:
    for line in file:
        wordsList = line.split()
        numLines += 1
        numWords += len(wordsList)
        numChrs += len(line)
        last5[:] = last5[-5:]+[line] 
        if "X" in line and "Y" not in line and "Z" not in line:
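            numMulMes += 1
            # the missing piece of the puzzle: write only the line 5 back
            # (last5[0]) and the match (last5[-1]); unlike del last5[1:5],
            # this leaves the buffer intact for later matches
            print >>f1, [last5[0], last5[-1]]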
            print 'Object found and Catalogued in Output.txt'

print "Lines: %i\nWords: %i\nCharacters: %i" % (numLines, numWords, numChrs)
print >>f1, "Lines: %i\nWords: %i\nCharacters: %i" % (numLines, numWords, numChrs)

print "There are a total of %i messages in this file" % (numMulMes)
print >>f1, "There are a total of %i messages in this file" % (numMulMes)

f1.close()

print 'Output Files have been Closed'

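I had been trying to modify the output file with a separate script, and for the longest time I was fighting str vs. list operations and the errors that came with them. I just decided to go back to the original script, threw it in on a whim, and voila.

Thanks for pushing me in the right direction; from there it was easy to figure out!

You solved most of it yourself (counting words, lines, line numbers, etc.) - you can simply remember the last n lines as you go through the file.

Example: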

t = """zero line
one line
two line
three line
four line 
five line 
six line
seven line 
eight line
""" 

last5 = [] # memory cell
for l in t.split("\n"):  # similar to your for line in file: 
    last5[:] = last5[-4:]+[l] # keep last 4 and add current line, inplace list mod 

    if "six" in l:
        print last5

You can also look at deque and specify a maximum length (you need to import it):

from collections import deque

last5 = deque(maxlen=5)
for l in t.split("\n"): 
    last5.append(l) # will automatically only keep 5 (maxlen)

    if "six" in l:
        print last5

Output:

 # list version
 ['two line', 'three line', 'four line ', 'five line ', 'six line'] 

 # deque version
 deque(['two line', 'three line', 'four line ', 'five line ', 'six line'], maxlen=5) 
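
Applied to the question's "matching line plus the line X lines before it" requirement, the deque idea might look roughly like this. It is only a Python 3 sketch, not the asker's script: the function name matches_with_context, the predicate argument and the sample data are made up for illustration.

```python
from collections import deque


def matches_with_context(lines, predicate, x):
    """Yield (line x lines earlier, matching line) for every match.

    The earlier line is None when fewer than x lines precede the match.
    `lines` can be any iterable of strings, including an open file.
    """
    window = deque(maxlen=x + 1)  # x previous lines plus the current one
    for line in lines:
        window.append(line)
        if predicate(line):
            # window[0] is exactly x lines back only once the window is full
            earlier = window[0] if len(window) == x + 1 else None
            yield earlier, line


sample = ["zero", "one", "two", "three", "four", "five", "six X Z", "seven"]
found = list(matches_with_context(
    sample, lambda s: "X" in s and "Y" not in s and "Z" in s, 5))
print(found)  # [('one', 'six X Z')]
```

Because the window is never modified when a match is found, two matches that are close together still each get the correct earlier line.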

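Instead of writing to the file line by line, I collect the results in a dictionary. Once the whole file has been processed, the summary dictionary is dumped to a file as JSON. This uses Artner's test file.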

import os
import json

output_file = 'Output.txt'
filename = 'BigFile.txt'                 

#initiate output container
outDict = {}
for fields in ['numLines', 'numWords', 'numChrs', 'numMes']:
    outDict[fields] = 0

outDict['lineNum'] = []
outDict['lineList'] = []  # must exist before lines are appended to it below

with open(filename, 'r') as file:
    for line in file:
      wordsList = line.split()  # split on any whitespace ("\s" is a literal string here, not a regex)
      outDict['numLines'] += 1
      outDict['numWords'] += len(wordsList)
      outDict['numChrs'] += len(line)

      #find items in the line
      if "t" in line:
          outDict['numMes'] += 1
          #save line number
          outDict['lineNum'].append(outDict['numLines']) 
          #save line content
          outDict['lineList'].append(line)

#record output          
with open(output_file, 'w') as f1:
    f1.write(json.dumps(outDict))    

##print lines of desire
#x number of lines before
x=5    
with open(filename, 'r') as file:
    for i, line in enumerate(file, 1):  # count from 1 to match the 1-based numLines counter
        #iterate over line numbers for which condition is met
        for j in range(0,len(outDict['lineNum'])):
            #if line number is between found line num and line num minus x, print
            if (outDict['lineNum'][j]-x) <= i <= outDict['lineNum'][j]:
                print(line)
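
The nested loop above rescans every recorded line number for each line of the file, and it prints a line more than once when two windows overlap. One possible alternative is to compute the set of wanted line numbers first. This is a Python 3 sketch that assumes the recorded match numbers are 1-based, as in outDict['lineNum']; the helper names lines_to_print and select_lines are hypothetical.

```python
def lines_to_print(match_line_numbers, x):
    """Return sorted 1-based line numbers covering each match and the x lines before it."""
    wanted = set()
    for n in match_line_numbers:
        # max(1, ...) avoids asking for line numbers below 1
        wanted.update(range(max(1, n - x), n + 1))
    return sorted(wanted)


def select_lines(lines, match_line_numbers, x):
    """Return the wanted lines in file order, each at most once."""
    wanted = set(lines_to_print(match_line_numbers, x))
    return [line for number, line in enumerate(lines, 1) if number in wanted]


text = ["zero line", "one line", "two line", "three line", "four line"]
print(lines_to_print([2, 5], 1))      # [1, 2, 4, 5]
print(select_lines(text, [2, 5], 1))  # ['zero line', 'one line', 'three line', 'four line']
```

The set lookup makes the second pass a single scan over the file regardless of how many matches were recorded.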

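Here is the same solution as suggested by @PatricArtner, but with a ring buffer. It may (or may not, I haven't checked) be faster for large files. The idea is simple: we create a list of the required size (the number of lines you need to keep) and a counter cnt that holds the current write position. For each line, we increment cnt by 1 and take it modulo the buffer size, so cnt cycles through the list. For example, if the list size is 5, cnt = (cnt+1)%5 gives 0 1 2 3 4 0 1 2 and so on. At every step cnt points to the oldest entry in the list, which is then overwritten by the new data. Below is an example implementation.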

t = """"zero line
six line - surprize 
one line
two line
three line
four line 
five line 
six line
seven line 
eight line
""" 


last5 = [None,None,None,None,None]
cnt = 0
for l in t.split("\n"):
  last5[cnt]=l
  if 'six' in l:
    print last5[(cnt+1)%5]
    print last5[(cnt+2)%5]
    print last5[(cnt+3)%5]
    print last5[(cnt+4)%5]
    print last5[(cnt+0)%5]
    print
  cnt = (cnt+1)%5

The output is:

None
None
None
"zero line
six line - surprize 

two line
three line
four line 
five line 
six line
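
For a buffer size other than 5, the same modulo trick could be wrapped in a small class. This is a Python 3 sketch for illustration only; the RingBuffer class and its method names are made up here and are not part of the answer above.

```python
class RingBuffer:
    """Fixed-size buffer that overwrites its oldest entry."""

    def __init__(self, size):
        self.data = [None] * size
        self.pos = 0  # slot that will be written next, i.e. the oldest entry

    def push(self, item):
        self.data[self.pos] = item
        self.pos = (self.pos + 1) % len(self.data)  # wrap around at the end

    def oldest_to_newest(self):
        # after a push, self.pos points at the oldest entry, so rotate from there
        return self.data[self.pos:] + self.data[:self.pos]


buf = RingBuffer(5)
for word in ["zero", "one", "two", "three", "four", "five", "six"]:
    buf.push(word)
print(buf.oldest_to_newest())  # ['two', 'three', 'four', 'five', 'six']
```

Slots that have not been written yet come back as None, just like in the first block of output above.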

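Note: if you are reading from a file, the file is big, the strings you need to keep are big (for example, gene sequences), and your condition doesn't fire often, be smart and don't keep the strings in memory. Instead, build a list of the file positions where the most recent lines start, and re-read them when needed. Below is an example of how to make it really fast...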

from numpy import random as rnd

print "Creating the file ...."
DNA=["G","C","T","A"]
with open("bigdatafile","w") as fd:
    for i in xrange(5000):
        fd.write("".join([ DNA[rnd.randint(4)] for x in xrange(2000)])+"\n")
print "DONE"
print
print "SEARCHING GGGGGGGGGGG"
last5, cnt = [0,0,0,0,0], 1
with open("bigdatafile","r") as fd:
    for i,l in enumerate(fd.readlines()):
        last5[cnt] = last5[(cnt+4)%5]+len(l)
        if "GGGGGGGGGGG" in l:
            print "FIND!"
            fd.seek(last5[(cnt+1)%5])
            print fd.read(last5[cnt]-last5[(cnt+1)%5])
        cnt = (cnt+1)%5
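
The example above still calls fd.readlines(), which loads every line into memory before the loop starts. A variant that keeps only byte offsets could look like the following Python 3 sketch. It is written under a few assumptions: the file is opened in binary mode so that tell() and seek() work on plain byte offsets, and the function name find_with_context, its parameters and the small demo file are invented for illustration.

```python
import os
import tempfile
from collections import deque


def find_with_context(path, needle, before):
    """Return, for each line containing `needle`, the bytes of that line
    plus up to `before` preceding lines, re-read from disk on demand."""
    results = []
    starts = deque(maxlen=before + 1)  # byte offsets where recent lines start
    with open(path, "rb") as fd:       # binary mode keeps tell()/seek() simple
        while True:
            start = fd.tell()
            line = fd.readline()
            if not line:               # empty bytes means end of file
                break
            starts.append(start)
            if needle in line:
                resume = fd.tell()     # end of the matching line
                fd.seek(starts[0])     # jump back to the oldest remembered line
                results.append(fd.read(resume - starts[0]))
                fd.seek(resume)        # continue scanning where we stopped
    return results


# Small throwaway file so the sketch can be run as-is.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"AAAA\nCCCC\nGGGG\nTTTT\nACGT\n")
demo_path = tmp.name
demo = find_with_context(demo_path, b"TTTT", 2)
os.remove(demo_path)
print(demo)  # [b'CCCC\nGGGG\nTTTT\n']
```

Only one line and a handful of integers are held in memory at any time; the cost is an extra seek and read for every match, which is cheap when matches are rare.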

Since I mentioned it, here is how to do the same thing on a *nix machine using grep's context line control feature.

First, assume you have the following text file, test.txt:

zero line
one line
two line
three line
four line 
five line 
six line
seven line 
eight line

If you want the N lines before a match, you can use the -B option. For example, for the 5 lines before "six":

$ grep -B 5 six test.txt 
one line
two line
three line
four line 
five line 
six line

There is also the -A option, which gives you N lines after the match, and the -C option, which gives you N lines both before AND after it.
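
If grep is not available, the before/after context behaviour can be approximated in Python. The following is a rough Python 3 sketch, not a reimplementation of grep: the function grep_context and its before/after parameters are hypothetical names, it does plain substring matching rather than regular expressions, and it does not print the "--" separator that grep puts between non-adjacent groups.

```python
from collections import deque


def grep_context(lines, needle, before=0, after=0):
    """Return matching lines with `before` lines of leading and `after`
    lines of trailing context, each line at most once, in file order."""
    out = []
    history = deque(maxlen=before)  # recent lines not yet emitted
    remaining_after = 0             # trailing-context lines still owed
    for line in lines:
        if needle in line:
            out.extend(history)     # flush the pending leading context
            history.clear()
            out.append(line)
            remaining_after = after
        elif remaining_after > 0:
            out.append(line)
            remaining_after -= 1
        else:
            history.append(line)
    return out


test_lines = ["zero line", "one line", "two line", "three line", "four line",
              "five line", "six line", "seven line", "eight line"]
print(grep_context(test_lines, "six", before=5))
# ['one line', 'two line', 'three line', 'four line', 'five line', 'six line']
```

Passing the same number for before and after gives roughly what -C does, for example grep_context(test_lines, "six", before=1, after=1).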