Join line to previous line if it doesn't start with a timestamp in UNIX shell

Question

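I have a tool that writes logs with a timestamp prefix, but a log entry may contain newlines. I want to merge any line that has no timestamp into the line before it.

Example: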

[ 2020/08/12 11:40] Success with "one line [42]"
[ 2020/08/12 11:40] Success with "two
lines [13]"
[ 2020/08/12 11:40] Success with "two lines with a twist
[19] to confuse you"
[ 2020/08/12 11:41] Failure with "one line again"

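Using awk, I can do something like this to merge lines that don't start with a [ bracket:

awk -v RS="[" 'NR>1{$1=$1; print RS, $0}'

But you can see where that fails on the "twist" line above. The second line of the "twist" entry starts with a [ that is not part of a timestamp.

Is there a way to use a regex for the timestamp prefix? Or is there a better command-line tool for this?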

Answer 1

Could you please try the following, written and tested with the shown samples at https://ideone.com/PXVCh2

awk '
{
  printf("%s%s",$0~/^\[ [0-9]{4}\/[0-9]{2}\/[0-9]{2}/\
          ?(FNR!=1?ORS:""):OFS,$0)
}
END{ print "" }
' Input_file

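Following Ed sir's comment, I added the END block, which prints a final newline, because the printf never emits one after the last record. If you don't need the trailing newline, you can omit that part.

Note: I wrote this on my phone; sorry, I can't judge how it looks on a large screen, so I have split one statement across two lines here.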
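As a hedged alternative sketch of the same timestamp idea, the version below buffers the current entry and flushes it when the next timestamped line arrives. It spells the date pattern out with plain bracket expressions instead of {4}-style intervals, on the assumption that some older awk implementations do not support intervals. The function name join_by_timestamp is hypothetical, and the pattern assumes every entry starts with "[ YYYY/MM/DD " exactly as in the sample.

```shell
# join_by_timestamp is a hypothetical helper name, not part of any tool.
# It reads log text on stdin and appends every line that does not start
# with "[ YYYY/MM/DD " to the previous entry, separated by one space.
join_by_timestamp() {
  awk '
    # A new entry starts with "[ " followed by a date such as 2020/08/12.
    /^\[ [0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9] / {
      if (have) print entry   # flush the previous, now complete entry
      entry = $0
      have = 1
      next
    }
    # Anything else is a continuation of the current entry.
    { entry = (have ? entry " " : "") $0; have = 1 }
    END { if (have) print entry }
  '
}

# Demo on the sample from the question.
join_by_timestamp <<'EOF'
[ 2020/08/12 11:40] Success with "one line [42]"
[ 2020/08/12 11:40] Success with "two
lines [13]"
[ 2020/08/12 11:40] Success with "two lines with a twist
[19] to confuse you"
[ 2020/08/12 11:41] Failure with "one line again"
EOF
```

Because the line starting with [19] has no space after the bracket, it does not match the pattern and is treated as a continuation.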

Answer 2

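It seems to me that your real problem is that your quoted strings can contain newlines, so this GNU awk solution (GNU awk is needed for the multi-char RS and for RT), which looks for quoted strings, may be more robust than looking for timestamps at the start of a line: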

$ awk -v RS='"[^"]*"' '{gsub("\n"," ",RT); ORS=RT} 1' file
[ 2020/08/12 11:40] Success with "one line [42]"
[ 2020/08/12 11:40] Success with "two lines [13]"
[ 2020/08/12 11:40] Success with "two lines with a twist [19] to confuse you"
[ 2020/08/12 11:41] Failure with "one line again"

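This works better than checking for lines that start with a timestamp when a quoted string can itself contain a timestamp that lands at the start of a line, e.g. (note the timestamp inside the "four lines with a twist... block):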

$ cat file
[ 2020/08/12 11:40] Success with "one line [42]"
[ 2020/08/12 11:40] Success with "two
lines [13]"
[ 2020/08/12 11:40] Success with "four lines with a twist
[ 2020/08/12 11:40] to confuse you
repeatedly and
in ""horrible"" ways"
[ 2020/08/12 11:41] Failure with "one line again"

$ awk -v RS='"[^"]*"' '{ORS=gensub("\n"," ","g",RT)} 1' file
[ 2020/08/12 11:40] Success with "one line [42]"
[ 2020/08/12 11:40] Success with "two lines [13]"
[ 2020/08/12 11:40] Success with "four lines with a twist [ 2020/08/12 11:40] to confuse you repeatedly and in ""horrible"" ways"
[ 2020/08/12 11:41] Failure with "one line again"

Answer 3

Assuming the file log contains your sample:

$ cat log

[ 2020/08/12 11:40] Success with "one line [42]"
[ 2020/08/12 11:40] Success with "two
lines [13]"
[ 2020/08/12 11:40] Success with "two lines with a twist
[19] to confuse you"
[ 2020/08/12 11:41] Failure with "one line again"

The following code counts the double quotes (") on each line; if it finds exactly one, it joins that line with the next one:

$ gawk 'gsub("\"", "\"") == 1 {x=$0; getline; print x " " $0;} gsub("\"", "\"") == 2 {print}' log

[ 2020/08/12 11:40] Success with "one line [42]"
[ 2020/08/12 11:40] Success with "two lines [13]"
[ 2020/08/12 11:40] Success with "two lines with a twist [19] to confuse you"
[ 2020/08/12 11:41] Failure with "one line again"
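The one-liner above joins exactly one continuation line per entry. As a hedged sketch of how the quote counting could be generalized to any number of continuation lines, the version below tracks whether a quoted string is still open. It uses only POSIX awk features, so it should not need gawk. The function name join_by_quotes is hypothetical, and the approach assumes that line breaks only occur inside double-quoted strings and that embedded quotes are doubled (""), so they leave the parity unchanged.

```shell
# join_by_quotes is a hypothetical helper name, not part of any tool.
# Assumption: every line break inside an entry sits inside a double-quoted
# string, and embedded quotes are doubled so they do not change the parity.
join_by_quotes() {
  awk '
    {
      # Append to the pending entry while a quoted string is still open.
      entry = (inq ? entry " " : "") $0
      tmp = $0
      # gsub returns the number of replacements, i.e. the number of quotes.
      if (gsub(/"/, "", tmp) % 2) inq = !inq
      if (!inq) print entry
    }
    END { if (inq) print entry }   # flush an entry whose quote never closed
  '
}
```

Usage would be along the lines of join_by_quotes < log. A line with an odd number of quotes toggles the open state, and a line with an even number (including zero) leaves it unchanged.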
