How to print the line X lines prior to the line that satisfies an if statement
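I am new to Python and only have piecemeal, cookie-cutter knowledge from what I've found on numerous web pages.
That said, I am trying to search a file (~10k lines) for a set of 'filter'-type criteria that I wrote, and I then want it to print the line that meets the criteria along with the line X lines before it.
I have created the following script to open said file, iterate line by line, and print each line that meets the filter criteria to an output file, but I don't know how to incorporate the "X lines before" part into the current script.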
import os
output_file = 'Output.txt'
filename = 'BigFile.txt'
numLines = 0
numWords = 0
numChrs = 0
numMes = 0
f1 = open(output_file, 'w')
print 'Output File has been Opened'
with open(filename, 'r') as file:
    for line in file:
        wordsList = line.split()
        numLines += 1
        numWords += len(wordsList)
        numChrs += len(line)
        if "X" in line and "Y" not in line and "Z" in line:
            numMes += 1
            print >>f1, line
            print 'Object found and Catalogued in Output.txt'
print "Lines: %i\nWords: %i\nCharacters: %i" % (numLines, numWords, numChrs)
print >>f1, "Lines: %i\nWords: %i\nCharacters: %i" % (numLines, numWords, numChrs)
print "There are a total of %i thing in this file" % (numMes)
print >>f1, "There are a total of %i things in this file" % (numMes)
f1.close()
print 'Output Files have been Closed'
My first guess was to use line.enumeration, but I don't think I can just state lines - 5 to print the line 5 lines before lines:
lines = f1.enumeration()
if "blah blah" in line and "so so" not in line:
    print >>f1, lines
    print >>f1, [lines - 5]
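The best part is yet to come, though, since I then have to take the Output.txt file and compare it to another file to output the matching criteria in both files... but one step at a time, right?
-Also feel free to throw in the 'proper' technical brief... I am sure this script could be written in a better way, so please educate me on anything I am doing wrong.
Thanks in advance for any help!
UPDATE:
Successfully implemented the fix thanks to the help below: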
import os
output_file = 'Output.txt'
filename = 'BigFile.txt'
numLines = 0
numWords = 0
numChrs = 0
numMulMes = 0
last5 = []
f1 = open(output_file, 'w')
print 'Output Files have been Opened'
with open(filename, 'r') as file:
    for line in file:
        wordsList = line.split()
        numLines += 1
        numWords += len(wordsList)
        numChrs += len(line)
        # keep the previous 5 lines plus the current one (up to 6 entries)
        last5[:] = last5[-5:]+[line]
        if "X" in line and "Y" not in line and "Z" not in line:
            # drop the middle entries so only the oldest line and the match remain
            del last5[1:5] ###the missing piece of the puzzle!
            numMulMes += 1
            print >>f1, last5
            print 'Object found and Catalogued in Output.txt'
print "Lines: %i\nWords: %i\nCharacters: %i" % (numLines, numWords, numChrs)
print >>f1, "Lines: %i\nWords: %i\nCharacters: %i" % (numLines, numWords, numChrs)
print "There are a total of %i messages in this file" % (numMulMes)
print >>f1, "There are a total of %i messages in this file" % (numMulMes)
f1.close()
print 'Output Files have been Closed'
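I had been trying to modify the output file through another, separate script, and for the longest time I was battling str vs. list operations and error problems. I just decided to go back to the original script and threw that in there on a whim, and voilà.
Thanks for pushing me in the right direction, it was easy to figure out from there!
You solved most of the problem yourself (counting words, lines, line numbers, etc.). You can simply remember the last n lines as you go through the file.
Example: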
t = """zero line
one line
two line
three line
four line
five line
six line
seven line
eight line
"""
last5 = [] # memory cell
for l in t.split("\n"): # similar to your for line in file:
    last5[:] = last5[-4:]+[l] # keep last 4 and add current line, inplace list mod
    if "six" in l:
        print last5
You can also look at deque and specify a maximum length (you need to import it):
from collections import deque
last5 = deque(maxlen=5)
for l in t.split("\n"):
    last5.append(l) # will automatically only keep 5 (maxlen)
    if "six" in l:
        print last5
Output:
# list version
['two line', 'three line', 'four line', 'five line', 'six line']
# deque version
deque(['two line', 'three line', 'four line', 'five line', 'six line'], maxlen=5)
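The deque idea above can be wrapped in a small reusable generator. This is only a sketch, written for Python 3 (the examples in this thread use Python 2 print statements); the function name lines_with_context and its parameters matches and before are made up for illustration. It appends the current line to the buffer only after testing it, so the buffer holds just the preceding lines.

```python
from collections import deque
from io import StringIO


def lines_with_context(lines, matches, before=5):
    """Yield (context, line) for each line where matches(line) is true.

    context is a list of up to `before` lines that came just before the match.
    """
    recent = deque(maxlen=before)  # only the previous lines, oldest first
    for line in lines:
        line = line.rstrip("\n")
        if matches(line):
            yield list(recent), line
        # Append after the test so a match is never part of its own context.
        recent.append(line)


# StringIO stands in for an open file; any iterable of lines works.
sample = StringIO(
    "zero line\none line\ntwo line\nthree line\n"
    "four line\nfive line\nsix line\nseven line\n"
)
results = list(lines_with_context(sample, lambda s: "six" in s, before=2))
for context, line in results:
    print(context, "->", line)
```

Because it is a generator, the caller decides what to do with each hit (print it, write it to Output.txt, count it), and only `before` lines are held in memory at any time.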
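Instead of writing to a file, I output the content to a dictionary. After the whole file has been processed, the summary data dictionary is dumped to a file as json. This uses Artner's test file.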
import os
import json
output_file = 'Output.txt'
filename = 'BigFile.txt'
#initiate output container
outDict = {}
for fields in ['numLines', 'numWords', 'numChrs', 'numMes']:
    outDict[fields] = 0
outDict['lineNum'] = []
outDict['lineList'] = []  # must exist before lines are appended to it
with open(filename, 'r') as file:
    for line in file:
        wordsList = line.split()  # split on any run of whitespace
        outDict['numLines'] += 1
        outDict['numWords'] += len(wordsList)
        outDict['numChrs'] += len(line)
        #find items in the line
        if "t" in line:
            outDict['numMes'] += 1
            #save line number (1-based)
            outDict['lineNum'].append(outDict['numLines'])
            #save line content
            outDict['lineList'].append(line)
#record output
with open(output_file, 'w') as f1:
    f1.write(json.dumps(outDict))
##print lines of desire
#x number of lines before
x=5
with open(filename, 'r') as file:
    # count from 1 so that i lines up with the saved 1-based line numbers
    for i, line in enumerate(file, 1):
        #iterate over line numbers for which condition is met
        for j in range(0,len(outDict['lineNum'])):
            #if line number is between found line num and line num minus x, print
            if (outDict['lineNum'][j]-x) <= i <= outDict['lineNum'][j]:
                print(line)
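One possible refinement of the second pass, sketched here in Python 3: precompute the set of line numbers to print, so each line is checked once and a line that falls within range of several matches is not printed repeatedly. The helper names wanted_line_numbers and select_lines are hypothetical, and the sketch assumes the saved line numbers are 1-based, as they are when taken from numLines after it is incremented.

```python
def wanted_line_numbers(match_line_numbers, x):
    """Return the 1-based line numbers to print: each match plus the x lines before it."""
    wanted = set()
    for n in match_line_numbers:
        # max(1, ...) keeps the range from running off the top of the file.
        wanted.update(range(max(1, n - x), n + 1))
    return wanted


def select_lines(lines, match_line_numbers, x):
    """Return the lines whose 1-based position is in the wanted set, in file order."""
    wanted = wanted_line_numbers(match_line_numbers, x)
    return [line for i, line in enumerate(lines, 1) if i in wanted]


demo = ["zero", "one", "two", "three", "four", "five", "six"]
selected = select_lines(demo, match_line_numbers=[4, 7], x=2)
print(selected)
```

Set membership is a constant-time check on average, so the cost of the second pass no longer grows with the number of matches.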
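This is the same solution that @PatricArtner suggested, but with a ring buffer. It may (or may not, I haven't checked) be faster for big files.
The idea is simple: we create a list of the required size (the number of lines you need to keep) and a counter cnt that holds the current write position. For each line, we increase cnt by 1 and take it modulo the size of the buffer, so cnt cycles through the list. For example, if the list size is 5, cnt = (cnt+1)%5 gives 0 1 2 3 4 0 1 2 and so on. At every step cnt points to the oldest data in our list, which is then replaced by the new data. Below is an example implementation.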
t = """"zero line
six line - surprize
one line
two line
three line
four line
five line
six line
seven line
eight line
"""
last5 = [None,None,None,None,None]
cnt = 0
for l in t.split("\n"):
    last5[cnt]=l  # overwrite the oldest entry
    if 'six' in l:
        # walk the ring from the oldest entry to the newest (the current line)
        print last5[(cnt+1)%5]
        print last5[(cnt+2)%5]
        print last5[(cnt+3)%5]
        print last5[(cnt+4)%5]
        print last5[(cnt+0)%5]
        print
    cnt = (cnt+1)%5
The output is simple:
None
None
None
"zero line
six line - surprize
two line
three line
four line
five line
six line
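The five hard-coded print statements can be generalized to any buffer size. The following is a rough Python 3 sketch of that generalization, not the answer's own code; last_n_in_order and ring_context are illustrative names. It reads the ring starting one slot after the write position, which is where the oldest entry lives.

```python
def last_n_in_order(buffer, cnt):
    """Return the ring buffer's contents oldest-first, where cnt is the slot just written."""
    size = len(buffer)
    return [buffer[(cnt + k) % size] for k in range(1, size + 1)]


def ring_context(lines, needle, size=5):
    """Collect, for each line containing needle, the last `size` lines including the match."""
    buffer = [None] * size  # None marks slots that have not been filled yet
    cnt = 0
    found = []
    for line in lines:
        buffer[cnt] = line  # overwrite the oldest entry
        if needle in line:
            found.append(last_n_in_order(buffer, cnt))
        cnt = (cnt + 1) % size
    return found


demo = [
    "zero line", "six line - surprize", "one line", "two line",
    "three line", "four line", "five line", "six line", "seven line",
]
found = ring_context(demo, "six", size=3)
for group in found:
    print(group)
```

As in the output above, a match that occurs before the buffer has filled up shows None for the missing earlier lines.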
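Note: if you read from a file, and the file is big and the strings you need to keep are big (for example, gene sequences) and your condition doesn't trigger often, be smart and don't keep the strings in memory. Keep a list of the positions in the file where the last few strings begin, and re-read them when needed. Below is an example of how to make it really fast...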
from numpy import random as rnd
print "Creating the file ...."
DNA=["G","C","T","A"]
with open("bigdatafile","w") as fd:
    for i in xrange(5000):
        fd.write("".join([ DNA[rnd.randint(4)] for x in xrange(2000)])+"\n")
print "DONE"
print
print "SEARCHING GGGGGGGGGGG"
# last5 is a ring of cumulative offsets: the end position of each recent line
last5, cnt = [0,0,0,0,0], 1
with open("bigdatafile","r") as fd:
    for i,l in enumerate(fd.readlines()):
        last5[cnt] = last5[(cnt+4)%5]+len(l)
        if "GGGGGGGGGGG" in l:
            print "FIND!"
            # jump back to the oldest remembered offset and re-read up to the end of the match
            fd.seek(last5[(cnt+1)%5])
            print fd.read(last5[cnt]-last5[(cnt+1)%5])
        cnt = (cnt+1)%5
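For comparison, here is a minimal Python 3 sketch of the same offset idea. It opens the file in binary mode so that tell() and seek() deal in plain byte counts, and it reads line by line rather than loading the whole file with readlines(). The function name context_by_offsets, the before parameter, and the tiny temporary file are assumptions made for the demonstration.

```python
import os
import tempfile
from collections import deque


def context_by_offsets(path, needle, before=2):
    """For each line containing needle, return the bytes of the `before` previous lines plus the match.

    Only start offsets are kept in memory; the text is re-read from disk on a match.
    """
    starts = deque(maxlen=before + 1)  # start offsets of the most recent lines
    chunks = []
    with open(path, "rb") as fd:  # binary mode: offsets are byte positions
        while True:
            start = fd.tell()
            line = fd.readline()
            if not line:
                break
            starts.append(start)
            if needle in line:
                end = fd.tell()
                fd.seek(starts[0])  # jump back to the oldest remembered line
                chunks.append(fd.read(end - starts[0]))
                fd.seek(end)  # resume scanning where we left off
    return chunks


# Build a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"AAAA\nCCCC\nGGGG\nTTTT\nGGTT\n")
    path = tmp.name

chunks = context_by_offsets(path, b"GG", before=1)
os.remove(path)
print(chunks)
```

Memory use here depends on the number of remembered offsets, not on the length of the lines, which is the point of the note above.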
Since I mentioned it, here is how to do the same thing on a *nix machine using grep's context line control feature.
First, assume you have the following text file test.txt:
zero line
one line
two line
three line
four line
five line
six line
seven line
eight line
If you want to get the N lines before a match, you can use the -B option. For example, for the 5 lines before "six":
$ grep -B 5 six test.txt
one line
two line
three line
four line
five line
six line
There is also the -A option, which you can use to get the N lines after a match, and the -C option, which you can use to get the N lines before AND after.
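For readers who want the same before/after behaviour inside a Python script, the following Python 3 sketch approximates it. It is an illustration only, not how grep is implemented: the name grep_context and its parameters are invented here, it does plain substring matching rather than regular expressions, and it does not print the "--" separator that grep puts between non-adjacent groups.

```python
from collections import deque


def grep_context(lines, needle, before=0, after=0):
    """Return matching lines together with `before` preceding and `after` following lines."""
    recent = deque(maxlen=before)  # preceding lines that have not been emitted yet
    out = []
    remaining_after = 0  # trailing context lines still owed after the last match
    for line in lines:
        if needle in line:
            out.extend(recent)  # flush the pending "before" context
            recent.clear()
            out.append(line)
            remaining_after = after
        elif remaining_after > 0:
            out.append(line)
            remaining_after -= 1
        else:
            recent.append(line)
    return out


demo = [
    "zero line", "one line", "two line", "three line", "four line",
    "five line", "six line", "seven line", "eight line",
]
before_six = grep_context(demo, "six", before=5)
around_six = grep_context(demo, "six", before=1, after=1)
print(before_six)
print(around_six)
```

Passing the same number for before and after gives roughly the effect of the -C option.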