Can I sort with context in bash?
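When I want to merge log files, I often use cat logA.log logB.log | sort. This works as long as every log line starts with a timestamp-like string in a common format.
But can I somehow sort the lines and keep the lines that do not follow that rule attached to their original leading line? Think of a log file in which someone logged something containing newlines (without my knowing)!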
(berta.log)
2021-10-01 00:00:10 Hey!
2021-10-01 00:00:11 How are you doing, Adam?
(caesar.log)
2021-10-01 00:00:00 Hey Berta
2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.
at Conversation.parseStatement
at Conversation.considerReplyToStatement
at Conversation.doConversation
2021-10-01 00:00:40 I am not Adam, I am Caesar!
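If I merge them with cat berta.log caesar.log | sort, the two log files of course become unusable.
I am also not sure whether I should post this question on Stack Overflow or Super User, or even Unix or Server Fault...
Edit for clarity
The merged log should look like this: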
2021-10-01 00:00:00 Hey Berta
2021-10-01 00:00:10 Hey!
2021-10-01 00:00:11 How are you doing, Adam?
2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.
at Conversation.parseStatement
at Conversation.considerReplyToStatement
at Conversation.doConversation
2021-10-01 00:00:40 I am not Adam, I am Caesar!
A classic problem when interleaving line-based files.
Solution: fold each multi-line log entry onto a single line
- Executable script: ./onelinelog.awk
#! /usr/bin/awk -f
# Timestamp line: print the previous entry, then start a new one
/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] / {
    if (log_line != "") { print log_line }
    log_line = $0
    next
}
# Other line: append it to the current entry
{
    # Here, I use '§' to separate the original lines
    log_line = log_line "§" $0
}
# End of file: print the last entry
END {
    if (log_line != "") { print log_line }
}
Test on the caesar.log file:
$ ./onelinelog.awk caesar.log
2021-10-01 00:00:00 Hey Berta
2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.§ at Conversation.parseStatement§ at Conversation.considerReplyToStatement§ at Conversation.doConversation
2021-10-01 00:00:40 I am not Adam, I am Caesar!
- Sort:
cat <(./onelinelog.awk caesar.log) <(./onelinelog.awk berta.log) | sort
or
sort <(./onelinelog.awk caesar.log) <(./onelinelog.awk berta.log)
Output:
2021-10-01 00:00:00 Hey Berta
2021-10-01 00:00:10 Hey!
2021-10-01 00:00:11 How are you doing, Adam?
2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.§ at Conversation.parseStatement§ at Conversation.considerReplyToStatement§ at Conversation.doConversation
2021-10-01 00:00:40 I am not Adam, I am Caesar!
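One thing to watch for in the sort step: plain sort compares whole lines, so two entries with the same timestamp end up ordered by their message text. The sketch below restricts the key to the two timestamp fields and asks for a stable sort, so such entries keep the order in which the files were given. It assumes a sort that supports -s (GNU and BSD sort do; the option is not in POSIX), and the file names and sample lines are made up for illustration.

```shell
# Hypothetical sample data: already-folded entries, two of them sharing a timestamp.
tmpdir=$(mktemp -d)
printf '%s\n' '2021-10-01 00:00:20 zeta, logged in the first file' > "$tmpdir/first.log"
printf '%s\n' '2021-10-01 00:00:05 early entry' \
              '2021-10-01 00:00:20 alpha, logged in the second file' > "$tmpdir/second.log"

# -k1,2 restricts the key to the date and time fields.
# -s makes the sort stable: lines with equal keys keep their input order.
merged=$(LC_ALL=C sort -s -k1,2 "$tmpdir/first.log" "$tmpdir/second.log")
printf '%s\n' "$merged"

rm -rf "$tmpdir"
```

Here the "zeta" entry stays ahead of the "alpha" entry because it came from the file listed first, even though it would sort later as text.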
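Fun, isn't it?
You may want to restore the original lines...
Using sed (\n in the replacement is understood by GNU sed):
$ cat and/or sort ... | sed -e 's/§/\n/g'
Or another executable awk script: ./tomultilinelog.awk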
#! /usr/bin/awk -f
BEGIN {
    FS="§"
}
{
    for (i = 1; i <= NF; i += 1) { print $i }
}
So running:
$ cat <(./onelinelog.awk caesar.log) <(./onelinelog.awk berta.log) | sort | ./tomultilinelog.awk
2021-10-01 00:00:00 Hey Berta
2021-10-01 00:00:10 Hey!
2021-10-01 00:00:11 How are you doing, Adam?
2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.
at Conversation.parseStatement
at Conversation.considerReplyToStatement
at Conversation.doConversation
2021-10-01 00:00:40 I am not Adam, I am Caesar!
Of course, you can modify the code and replace the "§" character with another marker.
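As a rough sketch of what swapping the marker could look like, the version below passes the separator to awk as a variable and uses the ASCII unit separator (octal 037) on the assumption that it never appears in the logs. The helper names fold_entries and unfold_entries are made up for this example, and the sample files are recreated from the question in a temporary directory.

```shell
# Assumption: the ASCII unit separator (octal 037) never occurs in the logs.
sep=$(printf '\037')

# Hypothetical helper: append continuation lines of one file to their timestamp line.
fold_entries() {
    awk -v sep="$sep" '
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] / {
            if (entry != "") print entry
            entry = $0
            next
        }
        { entry = entry sep $0 }
        END { if (entry != "") print entry }
    ' "$1"
}

# Hypothetical helper: turn every separator back into a newline.
unfold_entries() {
    tr "$sep" '\n'
}

# Sample data taken from the question.
tmpdir=$(mktemp -d)
printf '%s\n' '2021-10-01 00:00:10 Hey!' \
              '2021-10-01 00:00:11 How are you doing, Adam?' > "$tmpdir/berta.log"
printf '%s\n' '2021-10-01 00:00:00 Hey Berta' \
              '2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.' \
              ' at Conversation.parseStatement' \
              ' at Conversation.considerReplyToStatement' \
              ' at Conversation.doConversation' \
              '2021-10-01 00:00:40 I am not Adam, I am Caesar!' > "$tmpdir/caesar.log"

# Fold each file on its own, sort the folded entries, then unfold them again.
result=$({ fold_entries "$tmpdir/berta.log"; fold_entries "$tmpdir/caesar.log"; } \
    | LC_ALL=C sort | unfold_entries)
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Each file is folded separately so that a continuation line at the top of one file cannot be glued onto the last entry of another.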
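While Arnaud Valmary was posting his solution, I came up with another awk solution.
In my attempt, I simply prefix every line that does not start with a timestamp with the last timestamp seen (and a number):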
prefixAllLines.awk
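#! /usr/bin/awk -f
# Requires GNU awk (gensub)
BEGIN {
    linePattern = "^([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}) (.*)"
}
{
    if ($0 ~ linePattern) {
        # Timestamp line: remember its timestamp and restart the counter
        number = 0
        linePrefix = gensub(linePattern, "\\1", "g", $0)
        lineRest = gensub(linePattern, "\\2", "g", $0)
        # Pass the text as an argument so a '%' in a log line is not read as a format
        printf "%s %03d %s\n", linePrefix, number, lineRest
    } else {
        # Continuation line: reuse the last timestamp and increase the counter
        number += 1
        printf "%s %03d %s\n", linePrefix, number, $0
    }
}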
So ./prefixAllLines.awk caesar.log yields:
2021-10-01 00:00:00 000 Hey Berta
2021-10-01 00:00:20 000 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.
2021-10-01 00:00:20 001 at Conversation.parseStatement
2021-10-01 00:00:20 002 at Conversation.considerReplyToStatement
2021-10-01 00:00:20 003 at Conversation.doConversation
2021-10-01 00:00:40 000 I am not Adam, I am Caesar!
And cat <(./prefixAllLines.awk caesar.log) <(./prefixAllLines.awk berta.log) | sort gives:
2021-10-01 00:00:00 000 Hey Berta
2021-10-01 00:00:10 000 Hey!
2021-10-01 00:00:11 000 How are you doing, Adam?
2021-10-01 00:00:20 000 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.
2021-10-01 00:00:20 001 at Conversation.parseStatement
2021-10-01 00:00:20 002 at Conversation.considerReplyToStatement
2021-10-01 00:00:20 003 at Conversation.doConversation
2021-10-01 00:00:40 000 I am not Adam, I am Caesar!
But I prefer Arnaud Valmary's approach. :-)
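If the prefixed form is only wanted for sorting, the added timestamp and counter can be stripped again afterwards. This is a minimal sketch, not part of the original answer: strip_prefix is a made-up helper name, and it assumes the exact layout shown above (19-character timestamp, space, 3-digit counter, space, text), which also means fewer than 1000 continuation lines per entry.

```shell
# Hypothetical helper: undo the timestamp-and-counter prefix after sorting.
# Assumed layout: 19-character timestamp, space, 3-digit counter, space, text.
strip_prefix() {
    awk '{
        stamp = substr($0, 1, 19)
        count = substr($0, 21, 3)
        text  = substr($0, 25)
        if (count == "000") print stamp " " text   # original timestamp line
        else                print text             # continuation line
    }'
}

# Three lines in the prefixed format, taken from the output above.
restored=$(printf '%s\n' \
    '2021-10-01 00:00:20 000 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.' \
    '2021-10-01 00:00:20 001 at Conversation.parseStatement' \
    '2021-10-01 00:00:40 000 I am not Adam, I am Caesar!' | strip_prefix)
printf '%s\n' "$restored"
```

Note that with this prefixing scheme, entries from different files that share the same timestamp can interleave after sorting, because their counters are compared against each other.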