Python: the fastest way to find a string in multiple text files (some files are big)
I am trying to search for a string in multiple files. My code works, but it takes several minutes on large text files.
import logging
import os
import sys

wrd = b'my_word'
path = r'C:\path\to\files'  # raw string, so \t and \f are not read as escapes
#### opens the path where all of .txt files are ####
for f in os.listdir(path):
    if f.strip().endswith('.txt'):
        with open(os.path.join(path, f), 'rb') as ofile:
            #### loops through every line in the file comparing the strings ####
            for line in ofile:
                if wrd in line:
                    try:
                        sendMail(...)  # arguments elided in the original post
                        logging.warning('There is an error {} in this file : {}'.format(line, f))
                        sys.exit(0)
                    except IOError as e:
                        logging.error('Operation failed: {}'.format(e.strerror))
                        sys.exit(0)
I found this thread: Python finds a string in multiple files recursively and returns the file path
But it does not answer my question.
Do you have any idea how to make it faster?
I am using Python 3.4 on Windows Server 2003.
Thanks ;)
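My files are generated by an Oracle application. If there is an error, I log it and stop generating my files.
So I now search for my string by reading each file from the end, because the string I am looking for is an Oracle error and it appears at the end of the file.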
import logging
import os
import sys

wrd = b'ORA-'
path = r'C:\path\to\files'  # raw string, so \t and \f are not read as escapes
#### opens the path where all of .txt files are ####
for f in os.listdir(path):
    if f.strip().endswith('.txt'):
        # binary mode: wrd is bytes, and byte offsets are reliable for seek()
        with open(os.path.join(path, f), 'rb') as ofile:
            try:
                ofile.seek(0, 2)                     # seek to end of file
                fsize = ofile.tell()                 # file size in bytes
                ofile.seek(max(fsize - 1024, 0), 0)  # position at the last 1024 bytes
                lines = ofile.readlines()            # read to end
                lines = lines[-10:]                  # keep the last 10 lines
                for line in lines:
                    if wrd in line:
                        sendMail(...)  # arguments elided in the original post
                        logging.error('There is an error {} in this file : {}'.format(line, f))
                        sys.exit(0)
            except IOError as e:
                logging.error('Operation failed: {}'.format(e.strerror))
                sys.exit(0)
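For anyone who wants to reuse the read-from-the-end idea, here is a minimal sketch of the same technique factored into two small functions. The names `tail_contains` and `find_first_error` and the parameters `tail_bytes` and `max_lines` are hypothetical, not from my script above; the defaults of 1024 bytes and 10 lines just mirror the values I used. It only reports a match and leaves the mailing and logging to the caller.

```python
import os


def tail_contains(file_path, needle, tail_bytes=1024, max_lines=10):
    """Return the first matching line among the last `max_lines` lines
    found in the final `tail_bytes` bytes of the file, or None.

    Hypothetical helper: `needle` must be bytes because the file is
    opened in binary mode.
    """
    with open(file_path, 'rb') as fh:
        fh.seek(0, os.SEEK_END)                          # jump to end of file
        size = fh.tell()                                 # size in bytes
        fh.seek(max(size - tail_bytes, 0), os.SEEK_SET)  # back up by tail_bytes
        lines = fh.read().splitlines()                   # only the tail is read
    for line in lines[-max_lines:]:
        if needle in line:
            return line
    return None


def find_first_error(directory, needle=b'ORA-'):
    """Return (file_name, matching_line) for the first .txt file whose
    tail contains `needle`, or None if no file matches."""
    for name in sorted(os.listdir(directory)):
        if name.strip().endswith('.txt'):
            hit = tail_contains(os.path.join(directory, name), needle)
            if hit is not None:
                return name, hit
    return None
```

Calling `find_first_error(r'C:\path\to\files')` should return the file name and the offending line, or `None`. Note that this only helps when the string really is near the end: a match that starts before the last `tail_bytes` bytes will be missed, so the window may need to be larger if the trailing lines are long.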