Python - Trying to extract lines containing a key word from multiple files in a directory

I'm trying to build a script that finds all the files in a particular folder and extracts any line of text containing a keyword or phrase.

I'm quite new to Python and don't really understand how to piece together the various suggestions I've seen from other people.

import re
from glob import glob

search = []
linenum = 0
pattern = re.compile("Dawg", re.IGNORECASE)  # Compile a case-insensitive regex
path = r'C:\Users\Username\Downloads\Testdataextraction\Throw it in\Audit_2022.log'  # raw string, so \U is not read as an escape
filenames = glob('*.log')
print(f"\n{filenames}")
with open (path, 'rt') as myfile:    
    for line in myfile:
        linenum += 1
        if pattern.search(line) != None:      # If a match is found 
            search.append((linenum, line.rstrip('\n')))
for x in search:                            # Iterate over the list of tuples
    print("\nLine " + str(x[0]) + ": " + x[1])

This does exactly what I want, except that it only looks at one file at a time. My problem appears when I try to remove 'Audit_2022.log' from the end of the path = line.

Python says "PermissionError: [Errno 13] Permission denied: 'C:\Users\Username\Downloads\Testdataextraction\Throw it in'". I think this is because it is looking at a directory rather than a file, but how can I get it to read multiple files?

Thanks a lot!

You get that exception because open expects the path to a file; if you give it the path to a directory, it cannot open it. A minimal example would be:

path = r'C:\Users\Username\Downloads\Testdataextraction\Throw it in\Audit_2022.log'
with open (path, 'rt') as f:
  pass

If the file exists, this should run fine, but if you change it to:

path = r'C:\Users\Username\Downloads\Testdataextraction\Throw it in'
with open (path, 'rt') as f:
  pass

then it raises an exception.
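To see the failure in isolation, here is a small sketch that uses a temporary directory in place of the real folder. The helper name try_open is made up for illustration. On Windows the error is PermissionError, as in the question; on Linux and macOS it is typically IsADirectoryError. Both are subclasses of OSError, so that is what the sketch catches.

```python
import os
import tempfile


def try_open(path):
    """Hypothetical helper: return None on success, or the exception raised by open()."""
    try:
        with open(path, 'rt'):
            return None
    except OSError as exc:  # PermissionError and IsADirectoryError both derive from OSError
        return exc


# A throwaway folder stands in for the real one (assumption for the demo)
with tempfile.TemporaryDirectory() as folder:
    logfile = os.path.join(folder, 'Audit_2022.log')
    with open(logfile, 'wt') as f:
        f.write('Dawg\n')

    file_error = try_open(logfile)  # path to a file: opens fine
    dir_error = try_open(folder)    # path to a directory: open() fails

print(file_error)                # None
print(type(dir_error).__name__)  # PermissionError on Windows, typically IsADirectoryError elsewhere
```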

I suspect what you want to do is loop over all the log files in path and try each one, like:

import os
from glob import glob

path = r'C:\Users\Username\Downloads\Testdataextraction\Throw it in'
filenames = glob(os.path.join(path, '*.log'))  # every .log file inside path
print(f"\n{filenames}")
for filename in filenames:
  with open(filename, 'rt') as myfile:
    ...
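Putting the pieces together, a complete version of the question's script with the loop in place might look like the sketch below. The function name find_matches, the returned (filename, line number, text) tuples, and the use of re.escape are my own choices for illustration, not something from the question. The demo runs against a temporary folder with made-up contents so that it works on any machine.

```python
import os
import re
import tempfile
from glob import glob


def find_matches(folder, keyword):
    """Illustrative helper: return (filename, linenum, text) for each matching line."""
    # re.escape treats the keyword as literal text (an assumption; drop it to allow regex syntax)
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    results = []
    for filename in sorted(glob(os.path.join(folder, '*.log'))):
        with open(filename, 'rt') as myfile:
            # enumerate restarts the line count at 1 for every file
            for linenum, line in enumerate(myfile, 1):
                if pattern.search(line):
                    results.append((os.path.basename(filename), linenum, line.rstrip('\n')))
    return results


# Demo on a throwaway folder with made-up contents, so the sketch runs anywhere
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, 'a.log'), 'wt') as f:
        f.write('nothing here\nbig DAWG energy\n')
    with open(os.path.join(folder, 'b.log'), 'wt') as f:
        f.write('dawg\n')
    with open(os.path.join(folder, 'notes.txt'), 'wt') as f:
        f.write('dawg, but not a .log file\n')
    matches = find_matches(folder, 'Dawg')

for name, linenum, text in matches:
    print(f"{name} line {linenum}: {text}")
```

For the real folder, the call would be find_matches(r'C:\Users\Username\Downloads\Testdataextraction\Throw it in', 'Dawg').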

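You can get all the files in a directory (with os.listdir(), or with glob as below) and then nest the open call inside a loop over each file in the directory: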

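import os
import re
from glob import glob

folder = r'C:\Users\Username\Downloads\Testdataextraction\Throw it in'
pattern = re.compile("Dawg", re.IGNORECASE)  # same regex as in the question
search = []

for file in glob(os.path.join(folder, '*.log')):
    with open(file, 'rt') as myfile:
        # enumerate restarts the line count at 1 for every file
        for linenum, line in enumerate(myfile, 1):
            if pattern.search(line):  # search, not match: the keyword may be anywhere in the line
                search.append((linenum, line.rstrip('\n')))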

See os.path.join() for a better path joining alternative
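One detail worth checking when adapting the loop is the difference between pattern.match and pattern.search. The question's code uses search, which scans the whole line, whereas match only succeeds if the keyword sits at the very start of the line. A short illustration, using a made-up log line:

```python
import re

pattern = re.compile("Dawg", re.IGNORECASE)

line = "2022-01-01 user dawg logged in"  # made-up log line for illustration

at_start = pattern.match(line)   # only tries to match at position 0
anywhere = pattern.search(line)  # scans the whole line for the first match

print(at_start)          # None
print(anywhere.group())  # dawg
```

For a keyword that can appear anywhere in a log line, search is probably the one you want.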

Assuming you also need to show the file name, you could do this:

import re
from glob import glob
import os
p = re.compile('Dawg', re.IGNORECASE)
path = r'C:\Users\Username\Downloads\Testdataextraction\Throw it in'
for file in glob(os.path.join(path, '*.log')):
    with open(file) as logfile:
        for i, line in enumerate(map(str.strip, logfile), 1):
            if p.search(line) is not None:
                print(f'File={file}, Line={i}, Data={line}')