Get the file name and file path, and, when the search string is found, extract only the part of the line that follows it
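Let me explain directly with an example. I am writing my code in Python and also using bash commands for the grep part.
I have several files in which I need to grep for some pattern, let's say "INFO".
All these files can exist in two different directory structures: type1 and type2.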
- /home/user1/logs/MAIN_JOB/121/patching/a.log (type1)
- /home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log (type2)
- /home/user1/logs/MAIN_JOB/SUB_JOB1/142/DB:2/patching/c.log (type2)
File contents:
a.log :
[Thu Jan 20 21:05:00 UTC 2022]: database1: INFO: Subject1: This is subject 1.
b.log :
[Thu Jan 22 18:01:00 UTC 2022]: database1: INFO: Subject2: This is subject 2.
c.log :
[Thu Jan 22 18:01:00 UTC 2022]: database1: ERR: Subject3: This is subject 3.
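So I need to know which files contain the string "INFO". For each one that does, I need the following information:
File name: a.log / b.log
File path: /home/user1/logs/MAIN_JOB/121/patching or /home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching
String immediately after the search string: Subject1 / Subject2
So I tried the grep command with -r to find out which files contain "INFO":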
$ grep -r 'INFO' /home/user1/logs/MAIN_JOB
/home/user1/logs/MAIN_JOB/121/patching/a.log:[Thu Jan 20 21:05:00 UTC 2022]: database1: INFO: Subject1: This is subject 1.
/home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log:[Thu Jan 22 18:01:00 UTC 2022]: database1: INFO: Subject2: This is subject 2.
$
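So I store the grep output above in a Python variable and need to extract the items listed above from it.
I initially tried splitting the grep output on "\n", which gives me two separate lines: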
/home/user1/logs/MAIN_JOB/121/patching/a.log:[Thu Jan 20 21:05:00 UTC 2022]: database1: INFO: Subject1: This is subject 1.
/home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log:[Thu Jan 22 18:01:00 UTC 2022]: database1: INFO: Subject2: This is subject 2.
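Taking each line, I can split it on ":".
First line: I am able to split it correctly because the ":" characters are in the expected positions.
file_with_path : /home/user1/logs/MAIN_JOB/121/patching/a.log (I can get the file name separately with os.path.basename(file_with_path))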
immediate str after search word : "Subject1"
Second line: this is where I need help. The path contains "DB:1", which itself includes a ":", and that breaks my split. If I split, I get the following:
file_with_path : /home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB (not correct)
actually should be /home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log
I can't use a plain split on ":" here because it does not work correctly for both cases.
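To make the failure concrete, here is a minimal sketch that runs both strategies on the two grep output lines above. The helper name `split_grep_line` is hypothetical. It assumes that every matched log line starts with a `[` timestamp, as in the samples, and that no path contains the two-character sequence `:[`; if either assumption does not hold for your logs, this will not work as written.

```python
import os

def split_grep_line(line):
    """Split one 'path:content' grep line at the first ':[' (assumed start of the timestamp)."""
    path, sep, rest = line.partition(":[")
    if not sep:
        raise ValueError(f"unexpected grep line: {line!r}")
    return path, "[" + rest

lines = [
    "/home/user1/logs/MAIN_JOB/121/patching/a.log:[Thu Jan 20 21:05:00 UTC 2022]: database1: INFO: Subject1: This is subject 1.",
    "/home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log:[Thu Jan 22 18:01:00 UTC 2022]: database1: INFO: Subject2: This is subject 2.",
]

# Naive approach: everything before the first ':' is taken as the path.
naive_paths = [line.split(":")[0] for line in lines]
# Alternative: cut at ':[' so a ':' inside the path (DB:1) is left alone.
parsed = [split_grep_line(line) for line in lines]

for naive, (path, content) in zip(naive_paths, parsed):
    print("naive :", naive)
    print("fixed :", os.path.dirname(path), os.path.basename(path))
```

The naive split cuts the second path at "DB", while the `:[` split keeps "DB:1/patching/b.log" intact.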
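Can you help me with this? Any command that does the job in either bash or Python would be very helpful.
Thanks in advance. Please also let me know if you need more information from me.
Here is the code: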
# main dir
patch_log_home = '/home/user1/logs/MAIN_JOB'
cmd = "grep -r 'INFO' {0}"
patch_bug_inc = self._core.exec_os_cmd(cmd.format(patch_log_home))
# if no occurrence is reported, return
if len(patch_bug_inc) == 0:
    return
if patch_bug_inc:
    patch_bug_inc = patch_bug_inc.split("\n")
    for inc in patch_bug_inc:
        print("_________________________________________________")
        # splitting on ":" also splits paths such as .../DB:1/...
        inc = inc.split(":")
        # to get subject part
        patch_bug_str_index = [i for i, s in enumerate(inc) if 'INFO' in s][0]
        inc_name = inc[patch_bug_str_index+1]
        # file name
        log_file_name = os.path.basename(inc[0])
        # get file path
        log_path = os.path.split(inc[0])
        print("log_path :", log_path)
        full_path = log_path[0]
        print("FULL PATH: ", full_path)
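One way to keep grep and still split safely is to ask it for an unambiguous separator. The sketch below assumes the installed grep supports `--null` (`-Z`, available in GNU grep), which writes a NUL byte after each file name instead of ":". It uses `subprocess.run` as a stand-in because `self._core.exec_os_cmd` is not shown; the names `grep_info`, `parse_grep_output` and `SUBJECT_RE` are hypothetical.

```python
import os
import re
import subprocess

# Captures the text between "INFO:" and the next ":" (the subject).
SUBJECT_RE = re.compile(r"INFO:\s*([^:]+):")

def grep_info(log_home):
    """Run grep recursively; --null puts a NUL byte after each file name."""
    result = subprocess.run(
        ["grep", "-r", "--null", "INFO", log_home],
        capture_output=True, text=True, check=False,  # exit status 1 only means "no match"
    )
    return result.stdout

def parse_grep_output(output):
    """Return (file name, directory, subject) for every 'path NUL line' record."""
    records = []
    for record in output.split("\n"):
        path, sep, line = record.partition("\0")
        match = SUBJECT_RE.search(line)
        if not sep or match is None:
            continue  # empty trailing record, or no subject after INFO
        records.append(
            (os.path.basename(path), os.path.dirname(path), match.group(1).strip())
        )
    return records
```

Calling `parse_grep_output(grep_info('/home/user1/logs/MAIN_JOB'))` would then give one tuple per matching line, regardless of any ":" in the directory names.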
Here is a way to do this without calling grep which, as I said in my comment, may not be portable:
import os
import sys
for root, _, files in os.walk('/home/user1/logs/MAIN_JOB'):
    for file in files:
        if file.endswith('.log'):
            path = os.path.join(root, file)
            try:
                with open(path) as infile:
                    for line in infile:
                        if 'INFO:' in line:
                            print(path)
                            break
            except Exception:
                print(f"Unable to process {path}", file=sys.stderr)
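The loop above only prints the matching path. As a possible extension, the sketch below also returns the file name, the directory and the subject that follows "INFO:". The function name `find_info_subjects` and the pattern `SUBJECT_RE` are hypothetical, and the pattern assumes the subject is the text between "INFO:" and the next ":", as in the sample log lines. Like the original loop, it stops at the first match in each file.

```python
import os
import re
import sys

# Captures the text between "INFO:" and the next ":" (the subject).
SUBJECT_RE = re.compile(r"INFO:\s*([^:]+):")

def find_info_subjects(log_home):
    """Yield (file name, directory, subject) for the first INFO line of each .log file."""
    for root, _, files in os.walk(log_home):
        for name in files:
            if not name.endswith(".log"):
                continue
            path = os.path.join(root, name)
            try:
                # errors="replace" keeps undecodable bytes from aborting the scan
                with open(path, errors="replace") as infile:
                    for line in infile:
                        match = SUBJECT_RE.search(line)
                        if match:
                            yield name, root, match.group(1).strip()
                            break
            except OSError:
                print(f"Unable to process {path}", file=sys.stderr)
```

Because the path is never joined with the line content, a ":" in a directory name such as "DB:1" needs no special handling here.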