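Get the file name and file path, get the line where the search string is found, and extract only the part of that line that follows the search string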

Question

It is probably easiest to explain with an example. I am writing my code in Python, and I use a bash command for the grep part.

I have several files in which I need to grep for some pattern, say "INFO". These files can live in two different directory structures, type1 and type2:

/home/user1/logs/MAIN_JOB/121/patching/a.log (type1)
/home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log (type2)
/home/user1/logs/MAIN_JOB/SUB_JOB1/142/DB:2/patching/c.log (type2)

File contents:

a.log :
[Thu Jan 20 21:05:00 UTC 2022]: database1: INFO: Subject1: This is subject 1.

b.log :
[Thu Jan 22 18:01:00 UTC 2022]: database1: INFO: Subject2: This is subject 2.

c.log :
[Thu Jan 22 18:01:00 UTC 2022]: database1: ERR: Subject3: This is subject 3.

So I need to know which files contain the string "INFO". For each file that does, I need the following information:

File name: a.log / b.log

File path: /home/user1/logs/MAIN_JOB/121/patching or /home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching

String immediately after the search string: Subject1 / Subject2

So I tried the grep command with -r to find out which files contain "INFO":

$ grep -r 'INFO' /home/user1/logs/MAIN_JOB
/home/user1/logs/MAIN_JOB/121/patching/a.log:[Thu Jan 20 21:05:00 UTC 2022]: database1: INFO: Subject1: This is subject 1.
/home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log:[Thu Jan 22 18:01:00 UTC 2022]: database1: INFO: Subject2: This is subject 2.
$

So I store the grep output above in a Python variable, and I need to extract the items listed above from that output.

I first tried splitting the grep output on "\n", which gives me two separate lines:

/home/user1/logs/MAIN_JOB/121/patching/a.log:[Thu Jan 20 21:05:00 UTC 2022]: database1: INFO: Subject1: This is subject 1.

/home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log:[Thu Jan 22 18:01:00 UTC 2022]: database1: INFO: Subject2: This is subject 2.

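Taking each line in turn, I can split it on ":". First line: I am able to split it correctly, because the ":" characters are in the right places.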

file_with_path : /home/user1/logs/MAIN_JOB/121/patching/a.log (I can get the file name separately with os.path.basename(file_with_path))
string immediately after the search word : "Subject1"

Second line: this is where I need help. The path contains "DB:1", which has a ":" in it, and that breaks my split. If I split it, I get the following:

file_with_path : /home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB (not correct)
actually should be /home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log

I can't use a plain split here, because it does not work correctly for both cases.
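A minimal reproduction of the problem, using the two grep output lines above. The helper name first_field is only for illustration and is not part of my real code:

```python
# The two grep output lines from above, as Python strings.
lines = [
    "/home/user1/logs/MAIN_JOB/121/patching/a.log:[Thu Jan 20 21:05:00 UTC 2022]: database1: INFO: Subject1: This is subject 1.",
    "/home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log:[Thu Jan 22 18:01:00 UTC 2022]: database1: INFO: Subject2: This is subject 2.",
]


def first_field(line):
    """Return everything before the first ':' (the naive way to get the path)."""
    return line.split(":")[0]


for line in lines:
    print(first_field(line))
# prints:
# /home/user1/logs/MAIN_JOB/121/patching/a.log
# /home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB
```

The first path comes out whole, while the second is cut off at the ":" inside "DB:1".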

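Can you help me solve this? Any command that does the job in bash or Python would be very helpful. Thanks in advance. Please also let me know if you need more information from me.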

Here is the code:

        # main dir
        patch_log_home = '/home/user1/logs/MAIN_JOB'
        cmd = "grep -r 'INFO' {0}"
        patch_bug_inc = self._core.exec_os_cmd(cmd.format(patch_log_home))

        # if no occurrence is reported, stop here
        if len(patch_bug_inc) == 0:
            return

        if patch_bug_inc:
            patch_bug_inc = patch_bug_inc.split("\n")

        for inc in patch_bug_inc:
            print("_________________________________________________")

            inc = inc.split(":")

            # to get subject part
            patch_bug_str_index = [i for i, s in enumerate(inc) if 'INFO' in s][0]
            inc_name = inc[patch_bug_str_index+1]

            # file name
            log_file_name = os.path.basename(inc[0])

            # get file path
            log_path = os.path.split(inc[0])
            print("log_path :", log_path)
            full_path = log_path[0]
            print("FULL PATH: ", full_path)

Answer 1

Here is a way to do this without calling grep, which, as I said in my comment, may not be portable:

import os
import sys

for root, _, files in os.walk('/home/user1/logs/MAIN_JOB'):
    for file in files:
        if file.endswith('.log'):
            path = os.path.join(root, file)
            try:
                with open(path) as infile:
                    for line in infile:
                        if 'INFO:' in line:
                            print(path)
                            break
            except Exception:
                print(f"Unable to process {path}", file=sys.stderr)
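Building on the same os.walk loop, a sketch along the following lines could also return the three pieces the question asks for: file name, directory, and the string right after "INFO:". The function name find_marker and the regular expression are assumptions for illustration, not part of the code above. It assumes every matching line looks like "...: INFO: <subject>: ...", and unlike the loop above it reports every matching line instead of stopping at the first match in each file. Because the path comes from os.walk rather than from parsing grep output, a ":" in a directory name such as "DB:1" is not a problem.

```python
import os
import re
import sys


def find_marker(root_dir, marker="INFO"):
    """Yield (file_name, dir_path, subject) for each .log line containing '<marker>:'."""
    # Capture the text between '<marker>:' and the next ':' on the same line.
    pattern = re.compile(re.escape(marker) + r":\s*([^:\n]+)")
    for root, _, files in os.walk(root_dir):
        for name in files:
            if not name.endswith(".log"):
                continue
            path = os.path.join(root, name)
            try:
                # errors="replace" keeps undecodable bytes from aborting the scan.
                with open(path, errors="replace") as infile:
                    for line in infile:
                        match = pattern.search(line)
                        if match:
                            yield name, root, match.group(1).strip()
            except OSError:
                print(f"Unable to process {path}", file=sys.stderr)


if __name__ == "__main__":
    # Directory taken from the question; adjust as needed.
    for file_name, dir_path, subject in find_marker("/home/user1/logs/MAIN_JOB"):
        print(file_name, dir_path, subject)
```

If only the first hit per file is wanted, adding a break after the yield restores the behaviour of the original loop.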

python

bash

grep

split