Grep and extract specific data in multiple log files

Question

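I have multiple log files in a directory and am trying to extract only the timestamp and one part of each log line, namely the value of the fulltext query parameter. Each query parameter in the request is separated by an ampersand (&), as shown below.

Input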

30/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=798&savedSearches%40Delete=&

31/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=Dyson+V7&savedSearches%40Delete=&

Expected output

30/Mar/2022:00:27:36 -> 798

31/Mar/2022:00:27:36 -> Dyson+V7

I have this command, which recursively searches all files in the directory.

grep -rn "/libs/granite/omnisearch" ~/Downloads/ReqLogs/ > output.txt

This prints the entire log line, prefixed with the file path and line number, like this

/Users/****/Downloads/ReqLogs/logfile1_2022-03-31.log:6020:31/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=798&savedSearches%4

Please advise how I can get the expected output.

Answer 1

grep can return either the whole line or just the matched string. To extract different data from the matching lines, turn to sed or Awk.

awk -v search="/libs/granite/omnisearch" '$0 ~ search { s = $0; sub(/.*fulltext=/, "", s); sub(/&.*/, "", s); print $1, s }' ~/Downloads/ReqLogs/*

or

sed -n '\%/libs/granite/omnisearch%s/ .*fulltext=\([^&]*\)&.*/ \1/p' ~/Downloads/ReqLogs/*

The sed version is more succinct, but also more cryptic.

\%...% uses the alternate delimiter % so that we can use literal slashes in the search expression.

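The s/ .../.../p command then matches everything on the line from the first space onward, captures whatever sits between fulltext= and the next &, substitutes the captured substring for the matched text, and prints the resulting line.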

The -n flag turns off the default print action, so that we only print the lines where the search expression matched.
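To see the pieces work together, here is a minimal sketch that runs the sed command on the two sample lines from the question. The scratch directory, the file name sample.log, and the third log line are made up for the demonstration. The replacement text is changed to " -> \1" on the assumption that the arrow format from the expected output is wanted.

```shell
#!/bin/sh
# Hypothetical scratch directory and file name, used only for this demo.
demo_dir=$(mktemp -d)

cat > "$demo_dir/sample.log" <<'EOF'
30/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=798&savedSearches%40Delete=&
31/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=Dyson+V7&savedSearches%40Delete=&
31/Mar/2022:00:28:01 +0000 [59824] -> GET /content/other?fulltext=ignored&
EOF

# \%...%  selects only lines containing /libs/granite/omnisearch
# s/ ...  replaces everything from the first space onward with " -> <value>"
# -n + p  print only the lines where the substitution happened
sed_out=$(sed -n '\%/libs/granite/omnisearch%s/ .*fulltext=\([^&]*\)&.*/ -> \1/p' "$demo_dir"/*)

printf '%s\n' "$sed_out"
rm -rf "$demo_dir"
```

The made-up third line is skipped because it does not contain the omnisearch path, so only the two timestamps with their fulltext values are printed.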

The wildcard ~/Downloads/ReqLogs/* matches all files in that directory; if you really need to traverse subdirectories too, perhaps add find to the mix.

find ~/Downloads/ReqLogs -type f -exec sed -n '\%/libs/granite/omnisearch%s/ .*fulltext=\([^&]*\)&.*/ \1/p' {} +

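or similarly with the Awk command after -exec. The placeholder {} tells find where to add the names of the files it found, and + says to pass as many of them as possible at once, rather than running a separate -exec for each file. (If you want one run per file, use \; instead of +.)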
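The find and Awk combination could look roughly like the sketch below. The scratch directory, the subdirectory 2022-03, and the file names a.log and b.log are hypothetical. The print statement adds "->" on the assumption that the arrow from the expected output is wanted, and the sort at the end is only there because find does not promise any particular file order.

```shell
#!/bin/sh
# Hypothetical directory layout with one nested log file, for this demo only.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/2022-03"

printf '%s\n' '30/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=798&savedSearches%40Delete=&' > "$demo_dir/2022-03/a.log"
printf '%s\n' '31/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=Dyson+V7&savedSearches%40Delete=&' > "$demo_dir/b.log"

# find walks the tree; {} + hands awk as many file names as fit per run.
awk_out=$(find "$demo_dir" -type f -exec awk -v search="/libs/granite/omnisearch" '
  $0 ~ search {
    s = $0
    sub(/.*fulltext=/, "", s)   # drop everything up to and including fulltext=
    sub(/&.*/, "", s)           # drop the next & and everything after it
    print $1, "->", s           # $1 is the timestamp (first whitespace-separated field)
  }' {} + | sort)

printf '%s\n' "$awk_out"
rm -rf "$demo_dir"
```

Because Awk reads its input files directly, no file name or line number prefix appears in the output, unlike with grep -rn.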

grep

sed