Python - Trying to extract lines containing a keyword from multiple files in a directory
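I'm trying to build a script that looks through all the files in a specific folder and extracts any line of text containing a keyword or phrase.
I'm very new to Python and don't really understand how to piece together the various suggestions I've seen from other people.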
import re
from glob import glob

search = []
linenum = 0
pattern = re.compile("Dawg", re.IGNORECASE)  # Compile a case-insensitive regex
# Raw string, so the backslashes in the Windows path are not treated as escapes
path = r'C:\Users\Username\Downloads\Testdataextraction\Throw it in\Audit_2022.log'
filenames = glob('*.log')
print(f"\n{filenames}")
with open(path, 'rt') as myfile:
    for line in myfile:
        linenum += 1
        if pattern.search(line) is not None:  # If a match is found
            search.append((linenum, line.rstrip('\n')))
for x in search:  # Iterate over the list of tuples
    print("\nLine " + str(x[0]) + ": " + x[1])
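This does exactly what I want, except that it only looks at one file at a time.
My problem appears when I try to remove 'Audit_2022.log' from the end of the path = line.
Python says "PermissionError: [Errno 13] Permission denied: 'C:\Users\Username\Downloads\Testdataextraction\Throw it in'". I assume this is because it is looking at a directory rather than a file, but how can I get it to read multiple files?
Many thanks!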
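You get that exception because open needs the path of a file; given only a directory path, it has nothing it can open. A minimal example would be:
path = r'C:\Users\Username\Downloads\Testdataextraction\Throw it in\Audit_2022.log'
with open(path, 'rt') as f:
    pass
If the file exists, this should run fine, but if you change it to:
path = r'C:\Users\Username\Downloads\Testdataextraction\Throw it in'
with open(path, 'rt') as f:
    pass
then it raises an exception (PermissionError on Windows), because the path now points to a directory.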
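The difference can be reproduced without touching the real Downloads folder. The following is only a sketch: the helper name describe_open and the temporary folder are made up for illustration. It tries to open a file and then its containing directory, and reports what happened in each case. The exact exception type depends on the platform (PermissionError on Windows, IsADirectoryError on Linux and macOS), but both are subclasses of OSError.

```python
import os
import tempfile


def describe_open(path):
    """Try to open path for reading and report the outcome (hypothetical helper)."""
    try:
        with open(path, 'rt'):
            return 'opened'
    except OSError as exc:
        # PermissionError on Windows, IsADirectoryError on POSIX systems
        return type(exc).__name__


with tempfile.TemporaryDirectory() as folder:
    file_path = os.path.join(folder, 'Audit_2022.log')
    with open(file_path, 'w') as f:
        f.write('example line\n')
    file_result = describe_open(file_path)  # a real file
    dir_result = describe_open(folder)      # a directory

print(file_result)
print(dir_result)
```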
I suspect what you want is to iterate over all the log files in path and try each one, for example:
import os
from glob import glob

path = r'C:\Users\Username\Downloads\Testdataextraction\Throw it in'
filenames = glob(os.path.join(path, '*.log'))
print(f"\n{filenames}")
for filename in filenames:
    with open(filename, 'rt') as myfile:
        ...
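You can get the files in the directory (with os.listdir(), or with glob as in the code here) and then nest the open call inside a loop over them:
import os
import re
from glob import glob

folder = r'C:\Users\Username\Downloads\Testdataextraction\Throw it in'
pattern = re.compile("Dawg", re.IGNORECASE)
search = []
for file in glob(os.path.join(folder, '*.log')):
    with open(file, 'rt') as myfile:
        # enumerate restarts the line count at 1 for every file
        for linenum, line in enumerate(myfile, 1):
            if pattern.search(line):  # search, not match: the keyword can be anywhere in the line
                search.append((linenum, line.rstrip('\n')))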
See os.path.join() for a more robust way to build paths than concatenating strings by hand.
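As a small illustration of what os.path.join() does with the folder from the question (a sketch only; the variable name pattern_path is made up here), it inserts the platform's separator between the parts, so the folder string does not need a trailing backslash:

```python
import os

folder = r'C:\Users\Username\Downloads\Testdataextraction\Throw it in'

# os.path.join puts the platform's path separator between the two parts
pattern_path = os.path.join(folder, '*.log')
print(pattern_path)
```

The result is the string that glob() then expands into the list of matching .log files.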
Assuming you also need to show the file name, you could do this:
import re
from glob import glob
import os

p = re.compile('Dawg', re.IGNORECASE)
path = r'C:\Users\Username\Downloads\Testdataextraction\Throw it in'
for file in glob(os.path.join(path, '*.log')):
    with open(file) as logfile:
        # Strip whitespace from each line and number the lines from 1
        for i, line in enumerate(map(str.strip, logfile), 1):
            if p.search(line) is not None:
                print(f'File={file}, Line={i}, Data={line}')
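If the matches need to be reused rather than printed straight away, the same idea can be wrapped in a function. This is a minimal sketch, not anything from the answers above: the function name find_matches, its file_glob parameter, and the sample files are assumptions for illustration. It treats the keyword as literal text (via re.escape) and runs against a temporary folder so it can be tried without the real logs.

```python
import os
import re
import tempfile
from glob import glob


def find_matches(folder, keyword, file_glob='*.log'):
    """Return (file name, line number, line text) for each line containing keyword."""
    # re.escape makes the keyword match literally, ignoring case
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    matches = []
    for filename in sorted(glob(os.path.join(folder, file_glob))):
        with open(filename, 'rt') as logfile:
            # Line numbers restart at 1 for every file
            for linenum, line in enumerate(logfile, 1):
                if pattern.search(line):
                    matches.append((os.path.basename(filename), linenum, line.rstrip('\n')))
    return matches


# Demo on a throwaway folder with made-up log contents
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, 'a.log'), 'w') as f:
        f.write('nothing here\nwhat up dawg\n')
    with open(os.path.join(folder, 'b.log'), 'w') as f:
        f.write('DAWG on line one\n')
    with open(os.path.join(folder, 'notes.txt'), 'w') as f:
        f.write('dawg in a non-log file\n')
    results = find_matches(folder, 'Dawg')

for name, linenum, text in results:
    print(f'File={name}, Line={linenum}, Data={text}')
```

To run it on the real data, call find_matches with the folder from the question (as a raw string) instead of the temporary one.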