Can I sort with context in bash?

When I want to merge log files, I often use cat logA.log logB.log | sort. That works fine as long as every log line starts with a timestamp-like string in a common format.

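But can I somehow sort the lines and keep any line that does NOT follow a certain rule stuck to its original leading line? Think of a log file where someone logged something containing a newline (I wouldn't know)!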

(berta.log)
2021-10-01 00:00:10 Hey!
2021-10-01 00:00:11 How are you doing, Adam?

(caesar.log)
2021-10-01 00:00:00 Hey Berta
2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.
    at Conversation.parseStatement
    at Conversation.considerReplyToStatement
    at Conversation.doConversation
2021-10-01 00:00:40 I am not Adam, I am Caesar!

If I merge them with cat berta.log caesar.log | sort, the two log files of course become unusable.
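
To make the problem concrete, here is a small reproduction. It is only a sketch: the scratch directory and the variable names are made up for illustration, and it assumes byte-wise collation (LC_ALL=C), since other locales may order leading blanks differently.

```shell
#!/usr/bin/env bash
# Recreate the two sample logs in a scratch directory (illustrative paths).
demo_dir=$(mktemp -d)

cat > "$demo_dir/berta.log" <<'EOF'
2021-10-01 00:00:10 Hey!
2021-10-01 00:00:11 How are you doing, Adam?
EOF

cat > "$demo_dir/caesar.log" <<'EOF'
2021-10-01 00:00:00 Hey Berta
2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.
    at Conversation.parseStatement
    at Conversation.considerReplyToStatement
    at Conversation.doConversation
2021-10-01 00:00:40 I am not Adam, I am Caesar!
EOF

# Byte-wise sort: a space sorts before any digit, so the indented
# stack-trace lines float to the top, detached from their "Error" line.
naive_merge=$(cat "$demo_dir/berta.log" "$demo_dir/caesar.log" | LC_ALL=C sort)
printf '%s\n' "$naive_merge"
```

With this collation the three "at Conversation..." lines come out first, in alphabetical order, and no longer follow the error they belong to.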

I'm also not sure whether I should post this question on Stack Overflow or Super User, or even Unix or Server Fault...

Edit for clarity

The merged log should look like this:

2021-10-01 00:00:00 Hey Berta
2021-10-01 00:00:10 Hey!
2021-10-01 00:00:11 How are you doing, Adam?
2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.
    at Conversation.parseStatement
    at Conversation.considerReplyToStatement
    at Conversation.doConversation
2021-10-01 00:00:40 I am not Adam, I am Caesar!

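This is the classic problem of mixing line-based files that contain multi-line records.

Solution: put each multi-line log entry on a single line

  1. Executable script: ./onelinelog.awk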
#! /usr/bin/awk -f

# Timestamp line
/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] / {
    if (log_line != "") { print log_line }
    log_line = $0
    next
}
# Other line
{
    # Use '§' to separate the original lines
    log_line = log_line "§" $0
}
# End of file
END {
    if (log_line != "") { print log_line }
}

Test on the caesar.log file:

$ ./onelinelog.awk caesar.log 
2021-10-01 00:00:00 Hey Berta
2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.§    at Conversation.parseStatement§    at Conversation.considerReplyToStatement§    at Conversation.doConversation
2021-10-01 00:00:40 I am not Adam, I am Caesar!
  2. Sort:
cat <(./onelinelog.awk caesar.log) <(./onelinelog.awk berta.log) | sort

or, equivalently:

sort <(./onelinelog.awk caesar.log) <(./onelinelog.awk berta.log)

Output:

2021-10-01 00:00:00 Hey Berta
2021-10-01 00:00:10 Hey!
2021-10-01 00:00:11 How are you doing, Adam?
2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.§    at Conversation.parseStatement§    at Conversation.considerReplyToStatement§    at Conversation.doConversation
2021-10-01 00:00:40 I am not Adam, I am Caesar!
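
One caveat with a plain sort: it compares whole lines, so two entries that share a timestamp end up ordered by their message text rather than by their original order. A possible refinement is sketched below. It assumes a sort that supports the -s (stable) option, as GNU and BSD sort do; POSIX does not require it. The function name sort_by_timestamp is made up for this example.

```shell
#!/usr/bin/env bash
# Sort on the timestamp only (fields 1 and 2) and keep the input order
# for entries whose timestamps are equal.
# Assumption: -s (stable sort) is available; POSIX does not require it.
sort_by_timestamp() {
    LC_ALL=C sort -s -k1,2
}

# Two entries with the same timestamp keep their input order.
printf '%s\n' \
    '2021-10-01 00:00:20 zebra logged first' \
    '2021-10-01 00:00:20 apple logged second' \
    '2021-10-01 00:00:00 earliest entry' | sort_by_timestamp
```

It would slot in wherever sort appears above, for example: cat <(./onelinelog.awk caesar.log) <(./onelinelog.awk berta.log) | sort_by_timestamp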

Neat, isn't it?

You may want to restore the original lines...

Using sed:

$ cat and/or sort ... | sed -e 's/§/\n/g'

Or with another executable awk script: ./tomultilinelog.awk

#! /usr/bin/awk -f
BEGIN {
    FS="§"
}
{
    for (i = 1; i <= NF; i += 1) { print $i }
}

Then run:

$ cat <(./onelinelog.awk caesar.log) <(./onelinelog.awk berta.log) | sort | ./tomultilinelog.awk 
2021-10-01 00:00:00 Hey Berta
2021-10-01 00:00:10 Hey!
2021-10-01 00:00:11 How are you doing, Adam?
2021-10-01 00:00:20 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.
    at Conversation.parseStatement
    at Conversation.considerReplyToStatement
    at Conversation.doConversation
2021-10-01 00:00:40 I am not Adam, I am Caesar!

Of course, you can modify the code and replace the "§" character with another marker.
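
As one example of a different marker, the sketch below folds the three steps (join, sort, split) into a single bash function and uses the ASCII "unit separator" byte (octal 037) instead of "§". This is only an illustration: the function name merge_logs is invented here, and it assumes that byte never occurs in the logs.

```shell
#!/usr/bin/env bash
# merge_logs FILE...: merge timestamped logs, keeping continuation lines
# attached to the entry they belong to.
# Assumption: the byte 0x1F (octal 037) never occurs in the log files.
merge_logs() {
    local file
    for file in "$@"; do
        # One awk run per file, so an entry never spans two files.
        awk '
            BEGIN { sep = "\037" }
            # A timestamp line starts a new entry: flush the previous one.
            /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] / {
                if (entry != "") { print entry }
                entry = $0
                next
            }
            # Any other line is glued onto the current entry.
            { entry = entry sep $0 }
            END { if (entry != "") { print entry } }
        ' "$file"
    done | LC_ALL=C sort | tr '\037' '\n'
}
```

Usage: merge_logs caesar.log berta.log. A single-byte marker has the side benefit that tr can split the entries again, which does not work reliably with a multi-byte character such as "§" in UTF-8.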

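While Arnaud Valmary was posting his solution, I came up with another awk solution.

In my attempt, I simply prefix every line that does not start with a timestamp with the last timestamp seen (and a counter):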

prefixAllLines.awk

#! /usr/bin/awk -f
# Requires GNU awk (gawk): gensub() is a gawk extension.

BEGIN {
    linePattern = "^([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}) (.*)"
}
{
    if ($0 ~ linePattern) {
        # Timestamp line: remember its timestamp and reset the counter
        number = 0
        linePrefix = gensub(linePattern, "\\1", "g", $0)
        lineRest = gensub(linePattern, "\\2", "g", $0)
    } else {
        # Continuation line: reuse the last timestamp and bump the counter
        number += 1
        lineRest = $0
    }
    # Pass the text as an argument so a '%' in a log line is not read as a format
    printf("%s %03d %s\n", linePrefix, number, lineRest)
}

So ./prefixAllLines.awk caesar.log yields:

2021-10-01 00:00:00 000 Hey Berta
2021-10-01 00:00:20 000 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.
2021-10-01 00:00:20 001         at Conversation.parseStatement
2021-10-01 00:00:20 002         at Conversation.considerReplyToStatement
2021-10-01 00:00:20 003         at Conversation.doConversation
2021-10-01 00:00:40 000 I am not Adam, I am Caesar!

cat <(./prefixAllLines.awk caesar.log) <(./prefixAllLines.awk berta.log) | sort

2021-10-01 00:00:00 000 Hey Berta
2021-10-01 00:00:10 000 Hey!
2021-10-01 00:00:11 000 How are you doing, Adam?
2021-10-01 00:00:20 000 Error: SomebodyCalledMeWithTheWrongNameException: I am not Adam.
2021-10-01 00:00:20 001         at Conversation.parseStatement
2021-10-01 00:00:20 002         at Conversation.considerReplyToStatement
2021-10-01 00:00:20 003         at Conversation.doConversation
2021-10-01 00:00:40 000 I am not Adam, I am Caesar!

But I prefer Arnaud Valmary's approach. :-)
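
If the prefixed output should be turned back into the original layout after sorting, something along these lines might work. It is a sketch, not part of either answer: the function name strip_prefixes is invented, and it assumes the fixed-width layout produced by prefixAllLines.awk (a 19-character timestamp, a space, a 3-digit counter, a space), which in turn means fewer than 1000 continuation lines per entry.

```shell
#!/usr/bin/env bash
# strip_prefixes: undo the prefixing after sorting.
# Assumption: every input line looks like "YYYY-MM-DD HH:MM:SS NNN rest",
# i.e. timestamp in columns 1-19, counter in columns 21-23, text from column 25.
strip_prefixes() {
    awk '{
        counter = substr($0, 21, 3)
        rest = substr($0, 25)
        if (counter == "000") {
            # First line of an entry: keep the timestamp, drop the counter.
            print substr($0, 1, 19) " " rest
        } else {
            # Continuation line: drop both the timestamp and the counter.
            print rest
        }
    }'
}
```

Usage: cat <(./prefixAllLines.awk caesar.log) <(./prefixAllLines.awk berta.log) | sort | strip_prefixes. Note that if two files contain multi-line entries with the same timestamp, their continuation lines would interleave with this approach, since both start counting at 001.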