How to extract text from an access log?
I'm new to this. I'm trying to extract some text from an access log into a new file.
My log file looks like this:
111.111.111.111 - - [02/Jul/2021:18:35:19 +0000] "GET /api/items HTTP/2.0" 304 0 "https://example.com/some/text/call-log?roomNo=5003" "Mozilla etc etc etc etc"
111.111.111.111 - - [02/Jul/2021:20:35:19 +0000] "GET /api/items HTTP/2.0" 304 0 "https://example.com/some/text/resevation-log?roomNo=4003" "Mozilla etc etc etc etc"
I want to extract it into a new file in the following format:
02/Jul/2021:18:35:19 +0000, call-log, 5003
02/Jul/2021:20:35:19 +0000, resevation-log, 4003
So far I have managed this basic awk command:
awk '{print $4,$5,",",$11}' < /file.log
This gives me the following output:
[02/Jul/2021:18:35:19 +0000] , "https://example.com/some/text/call-log?roomNo=5003"
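$ cat tst.awk
BEGIN {
    # Split on [, ], or a double quote, absorbing any surrounding whitespace
    FS="[[:space:]]*[][\"][[:space:]]*"
    OFS = ", "
}
{
    # $6 is the referer URL; split it on /, ?, and =
    n = split($6,f,"[/?=]")
    # $2 is the timestamp; f[n-2] is the page name, f[n] the room number
    print $2, f[n-2], f[n]
}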
$ awk -f tst.awk file
02/Jul/2021:18:35:19 +0000, call-log, 5003
02/Jul/2021:20:35:19 +0000, resevation-log, 4003
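Since the question asks for the result in a new file, here is a minimal sketch of the same logic wrapped in a shell function whose output can be redirected. The function name extract_fields and the file names access.log and extracted.txt are hypothetical, chosen only for illustration.

```shell
# Sketch only: same field splitting as tst.awk, inlined in a shell function.
# extract_fields is a hypothetical name; pass the log file as the first argument.
extract_fields() {
    awk '
    BEGIN {
        # Split on [, ], or a double quote, plus any surrounding whitespace
        FS = "[[:space:]]*[][\"][[:space:]]*"
        OFS = ", "
    }
    {
        n = split($6, f, "[/?=]")   # $6 is the referer URL
        print $2, f[n-2], f[n]      # timestamp, page name, room number
    }' "$1"
}

# Usage (file names are assumptions):
#   extract_fields access.log > extracted.txt
```

Using > overwrites extracted.txt on each run; >> would append to it instead.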
The script works with any POSIX awk. It splits the input from the question into fields as follows:
$ cat tst.awk
BEGIN {
    FS="[[:space:]]*[][\"][[:space:]]*"
    OFS = ","
}
{
    print
    for (i=1; i<=NF; i++) {
        print "\t" i, "<" $i ">"
    }
    print "-----"
}
$ awk -f tst.awk file
111.111.111.111 - - [02/Jul/2021:18:35:19 +0000] "GET /api/items HTTP/2.0" 304 0 "https://example.com/some/text/call-log?roomNo=5003" "Mozilla etc etc etc etc"
1,<111.111.111.111 - ->
2,<02/Jul/2021:18:35:19 +0000>
3,<>
4,<GET /api/items HTTP/2.0>
5,<304 0>
6,<https://example.com/some/text/call-log?roomNo=5003>
7,<>
8,<Mozilla etc etc etc etc>
9,<>
-----
111.111.111.111 - - [02/Jul/2021:20:35:19 +0000] "GET /api/items HTTP/2.0" 304 0 "https://example.com/some/text/resevation-log?roomNo=4003" "Mozilla etc etc etc etc"
1,<111.111.111.111 - ->
2,<02/Jul/2021:20:35:19 +0000>
3,<>
4,<GET /api/items HTTP/2.0>
5,<304 0>
6,<https://example.com/some/text/resevation-log?roomNo=4003>
7,<>
8,<Mozilla etc etc etc etc>
9,<>
-----
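If any of your quoted fields can contain [, ], or an escaped ", the field splitting above will not work as shown. None of these occur in your example, but if they can happen, include them in the example in your question.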
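As one possible way to reduce that sensitivity, the sketch below splits only on double quotes and pulls the timestamp out of the first field with match(). It assumes the referer is always the second quoted field, as in the sample lines, and the function name extract_quoted is hypothetical. It tolerates [ and ] inside the referer or user agent, but an escaped " inside a quoted field would still shift the field numbers.

```shell
# Sketch only: split on double quotes so brackets inside quoted fields are harmless.
# extract_quoted is a hypothetical name; pass the log file as the first argument.
extract_quoted() {
    awk -F'"' -v OFS=', ' '
    # $1 is everything before the first quote, including [timestamp]
    match($1, /\[[^]]*\]/) {
        ts = substr($1, RSTART + 1, RLENGTH - 2)   # drop the enclosing brackets
        n = split($4, f, "[/?=]")                  # $4 is the referer URL
        print ts, f[n-2], f[n]
    }' "$1"
}

# Usage (file names are assumptions):
#   extract_quoted access.log > extracted.txt
```

Lines without a bracketed timestamp are skipped, because match() is used as the pattern.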
This awk command can extract the text:
awk -v FS='[][/?="]' -v OFS=',' '{print $2"/"$3"/"$4,$16,$18}' file
02/Jul/2021:18:35:19 +0000,call-log,5003
02/Jul/2021:20:35:19 +0000,resevation-log,4003
Another way to do this with AWK is:
awk '{split($11, A, /\/+|"|(\?roomNo=)/); print substr($4, 2), substr($5, 1, 5) ",", A[6] ",", A[7]}' file.log >> newFile.log
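The first part uses a regular expression to split the URL field into an array, then it prints specific fields and array values, and finally it appends the output to another file named newFile.log.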
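To see why elements 6 and 7 of the array are the ones printed, the following sketch applies the same split regular expression to the quoted URL from the first sample line and lists every element. The function name show_split is hypothetical.

```shell
# Sketch only: list the array elements produced by the split regex.
# show_split is a hypothetical name; pass the quoted URL as the first argument.
show_split() {
    printf '%s\n' "$1" | awk '{
        # Separators: one or more slashes, a double quote, or the literal ?roomNo=
        n = split($0, A, /\/+|"|(\?roomNo=)/)
        for (i = 1; i <= n; i++) print i, "<" A[i] ">"
    }'
}

show_split '"https://example.com/some/text/call-log?roomNo=5003"'
# 1 <>
# 2 <https:>
# 3 <example.com>
# 4 <some>
# 5 <text>
# 6 <call-log>
# 7 <5003>
# 8 <>
```

The empty first and last elements come from the leading and trailing double quotes, which are themselves separators. The indexes 6 and 7 hold only for URLs with the same number of path segments as the sample.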