Get the file name and file path, and extract the string that follows the search string on each matching line

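It is probably easiest to explain with an example. I am writing my code in Python and using a bash command for the grep part.

I have several files in which I need to grep for a pattern, say "INFO". These files can live in two different directory structures: type1 and type2.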

  1. /home/user1/logs/MAIN_JOB/121/patching/a.log (type1)
  2. /home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log (type2)
  3. /home/user1/logs/MAIN_JOB/SUB_JOB1/142/DB:2/patching/c.log (type2)

File contents:

a.log :
[Thu Jan 20 21:05:00 UTC 2022]: database1: INFO: Subject1: This is subject 1.

b.log :
[Thu Jan 22 18:01:00 UTC 2022]: database1: INFO: Subject2: This is subject 2.

c.log :
[Thu Jan 22 18:01:00 UTC 2022]: database1: ERR: Subject3: This is subject 3.

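I need to know which files contain the string "INFO". For each such file, I need the following information:

File name: a.log / b.log

File path: /home/user1/logs/MAIN_JOB/121/patching or /home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching

String immediately after the search string: Subject1 / Subject2

So I tried grep with -r to find out which files contain "INFO":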

$ grep -r 'INFO' /home/user1/logs/MAIN_JOB
/home/user1/logs/MAIN_JOB/121/patching/a.log:[Thu Jan 20 21:05:00 UTC 2022]: database1: INFO: Subject1: This is subject 1.
/home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log:[Thu Jan 22 18:01:00 UTC 2022]: database1: INFO: Subject2: This is subject 2.
$

I store the grep output above in a Python variable and need to extract the items listed above from it.

I first split the grep output on "\n", which gives me two separate lines:

/home/user1/logs/MAIN_JOB/121/patching/a.log:[Thu Jan 20 21:05:00 UTC 2022]: database1: INFO: Subject1: This is subject 1.

/home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log:[Thu Jan 22 18:01:00 UTC 2022]: database1: INFO: Subject2: This is subject 2.

Taking each line in turn, I can split it on ":". First line: the split works, because every ":" is where I expect it.

file_with_path : /home/user1/logs/MAIN_JOB/121/patching/a.log (I can get the file name separately with os.path.basename(file_with_path))
string immediately after the search word : "Subject1"

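Second line: this is where I need help. The path contains "DB:1", which has a ":" in it, and that breaks my split. If I split, I get the following:

file_with_path : /home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB (not correct)
it should actually be /home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log

I cannot rely on a plain split here, because it does not work correctly for both cases.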
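
To make the failure concrete, and to show one possible way around it, here is a minimal sketch. It assumes that every matching file ends in `.log` and that every log line starts with `[`, as in the samples above; under that assumption a regular expression can anchor on `.log:[` instead of on the first colon. The name `parse_grep_line` is hypothetical and not part of the original code.

```python
import os
import re

# Assumption: paths end in ".log" and the log line itself begins with "[".
GREP_LINE = re.compile(r'^(?P<path>.+?\.log):(?P<text>\[.*)$')
# Captures the field that directly follows "INFO:".
SUBJECT = re.compile(r'INFO:\s*([^:]+):')


def parse_grep_line(line):
    """Return (directory, file name, subject), or None if the line does not fit."""
    m = GREP_LINE.match(line)
    if m is None:
        return None
    s = SUBJECT.search(m.group('text'))
    if s is None:
        return None
    directory, name = os.path.split(m.group('path'))
    return directory, name, s.group(1).strip()


line = ('/home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log:'
        '[Thu Jan 22 18:01:00 UTC 2022]: database1: INFO: Subject2: This is subject 2.')
print(line.split(':')[0])     # the naive split stops at ".../121/DB"
print(parse_grep_line(line))  # path, file name and subject recovered together
```

The pattern would need adjusting if the log files use other extensions or if lines do not begin with a bracketed timestamp.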

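Could you help me with this? Any command that does the job in bash or Python would be very helpful. Thanks in advance, and please let me know if you need more information from me.

Here is the code: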

        # main dir
        patch_log_home = '/home/user1/logs/MAIN_JOB'
        cmd = "grep -r 'INFO' {0}"
        patch_bug_inc = self._core.exec_os_cmd(cmd.format(patch_log_home))

        # if no occurrence is reported, return
        if len(patch_bug_inc) == 0:
            return

        if patch_bug_inc:
            patch_bug_inc = patch_bug_inc.split("\n")

        for inc in patch_bug_inc:
            print("_________________________________________________")

            inc = inc.split(":")

            # to get subject part
            patch_bug_str_index = [i for i, s in enumerate(inc) if 'INFO' in s][0]
            inc_name = inc[patch_bug_str_index+1]

            # file name
            log_file_name = os.path.basename(inc[0])

            # get file path
            log_path = os.path.split(inc[0])
            print("log_path :", log_path)
            full_path = log_path[0]
            print("FULL PATH: ", full_path)
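
If grep has to stay, one option is to stop relying on ":" as the separator. GNU grep's `-Z` (`--null`) option writes a NUL byte after the file name instead of a colon, and a NUL cannot occur inside a path. The sketch below is an assumption-laden illustration: it uses `subprocess.run` in place of `self._core.exec_os_cmd` (whose behavior is not shown here), and the function names `parse_null_output` and `grep_info` are hypothetical.

```python
import os
import re
import subprocess

# Captures the field that directly follows "INFO:".
SUBJECT = re.compile(r'INFO:\s*([^:]+):')


def parse_null_output(output):
    """Parse `grep -rZ` output, where each record is '<path>\\0<matching line>'."""
    results = []
    for record in output.split('\n'):
        path, sep, text = record.partition('\0')
        if not sep:
            continue  # blank trailing record or a line without a NUL
        m = SUBJECT.search(text)
        if m:
            directory, name = os.path.split(path)
            results.append((directory, name, m.group(1).strip()))
    return results


def grep_info(log_home):
    """Run grep recursively; assumes a grep that supports -Z/--null."""
    proc = subprocess.run(['grep', '-rZ', 'INFO', log_home],
                          capture_output=True, text=True)
    return parse_null_output(proc.stdout)
```

Calling `grep_info('/home/user1/logs/MAIN_JOB')` would then return a list of `(directory, file name, subject)` tuples. Passing the command as a list also avoids shell quoting problems with the directory name.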

Here is a way to do this without invoking grep (which, as I said in the comments, may not be portable):

import os
import sys

for root, _, files in os.walk('/home/user1/logs/MAIN_JOB'):
    for file in files:
        if file.endswith('.log'):
            path = os.path.join(root, file)
            try:
                with open(path) as infile:
                    for line in infile:
                        if 'INFO:' in line:
                            print(path)
                            break
            except Exception:
                print(f"Unable to process {path}", file=sys.stderr)
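
The loop above only prints the path of each matching file. As a possible extension, the sketch below also returns the directory, the file name and the field that follows "INFO:". Because no grep output is parsed, a ":" in a directory name such as "DB:1" causes no trouble. The function name `find_info` and the regular expression are illustrative assumptions, not part of the original answer.

```python
import os
import re
import sys

# Captures the field that directly follows "INFO:".
SUBJECT = re.compile(r'INFO:\s*([^:]+):')


def find_info(log_home):
    """Yield (directory, file name, subject) for the first INFO line of each .log file."""
    for root, _, files in os.walk(log_home):
        for file in files:
            if not file.endswith('.log'):
                continue
            path = os.path.join(root, file)
            try:
                with open(path) as infile:
                    for line in infile:
                        m = SUBJECT.search(line)
                        if m:
                            yield root, file, m.group(1).strip()
                            break  # one match per file, as in the loop above
            except (OSError, UnicodeDecodeError):
                print(f"Unable to process {path}", file=sys.stderr)
```

For example, `for directory, name, subject in find_info('/home/user1/logs/MAIN_JOB'): print(directory, name, subject)` would list one line per matching file.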