Join line to previous line if it doesn't start with a timestamp in a UNIX shell
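I have a tool that outputs logs with a timestamp prefix, but log entries may contain newlines. I want to merge any line that has no timestamp into the previous line.
Example: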
[ 2020/08/12 11:40] Success with "one line [42]"
[ 2020/08/12 11:40] Success with "two
lines [13]"
[ 2020/08/12 11:40] Success with "two lines with a twist
[19] to confuse you"
[ 2020/08/12 11:41] Failure with "one line again"
With awk I can do something like this to merge lines that don't start with a [ bracket:
awk -v RS="[" 'NR>1{$1=$1; print RS, $0}'
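But you can see where this fails on the "twist" line above: that line starts with a [ that is not part of a timestamp.
Is there a way to use a regex for the timestamp prefix? Or is there a better command-line tool for this?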
Could you please try the following, written and tested with the shown samples: https://ideone.com/PXVCh2
awk '
{
printf("%s%s",$0~/^\[ [0-9]{4}\/[0-9]{2}\/[0-9]{2}/\
?(FNR!=1?ORS:""):OFS,$0)
}
END{ print "" }
' Input_file
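As per Ed sir's comment, added the END block that prints a newline at the very end of the output; in case your output already ends with one, that part can be omitted.
Note: I wrote this on my phone, so sorry, I can't tell how it looks on a bigger screen; that is why I split one line into two here.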
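Some awk implementations (older mawk releases, for instance) do not support regex interval expressions such as {4}. A minimal sketch of the same timestamp-prefix logic that spells the digits out with repeated bracket expressions, wrapped in a shell function. The function name join_ts is hypothetical, and the "[ YYYY/MM/DD" prefix format is assumed from the sample log.

```shell
# join_ts (hypothetical name): read a log on stdin and join every line that
# does not start with a "[ YYYY/MM/DD" timestamp onto the previous line.
# Repeated [0-9] bracket expressions replace {4}/{2} intervals so the regex
# also works in awks without interval-expression support.
join_ts() {
  awk '
    {
      if ($0 ~ /^\[ [0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]/)
        # Timestamped line: start a new output line (no newline before line 1).
        printf "%s%s", (NR == 1 ? "" : "\n"), $0
      else
        # Continuation line: glue it to the previous one with a single space.
        printf " %s", $0
    }
    END { if (NR > 0) print "" }  # terminate the last joined line
  '
}
```

Usage: join_ts < log. As in the answer above, the only decision per line is "does it start a new entry?", so a continuation line such as "[19] to confuse you" is joined because a digit, not a space, follows its [.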
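It seems to me that your real problem is that your quoted strings can contain newlines, so this GNU awk solution (for multi-char RS), which looks for the quoted strings, may be more robust than looking for timestamps at the start of lines: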
$ awk -v RS='"[^"]*"' '{gsub("\n"," ",RT); ORS=RT} 1' file
[ 2020/08/12 11:40] Success with "one line [42]"
[ 2020/08/12 11:40] Success with "two lines [13]"
[ 2020/08/12 11:40] Success with "two lines with a twist [19] to confuse you"
[ 2020/08/12 11:41] Failure with "one line again"
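This is better than checking for lines that start with a timestamp if your quoted strings can contain timestamps that might appear at the start of a line, e.g. (note the timestamp inside the "four lines with a twist..." block):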
$ cat file
[ 2020/08/12 11:40] Success with "one line [42]"
[ 2020/08/12 11:40] Success with "two
lines [13]"
[ 2020/08/12 11:40] Success with "four lines with a twist
[ 2020/08/12 11:40] to confuse you
repeatedly and
in ""horrible"" ways"
[ 2020/08/12 11:41] Failure with "one line again"
$ awk -v RS='"[^"]*"' '{ORS=gensub("\n"," ","g",RT)} 1' file
[ 2020/08/12 11:40] Success with "one line [42]"
[ 2020/08/12 11:40] Success with "two lines [13]"
[ 2020/08/12 11:40] Success with "four lines with a twist [ 2020/08/12 11:40] to confuse you repeatedly and in ""horrible"" ways"
[ 2020/08/12 11:41] Failure with "one line again"
Assuming log contains your example file:
$ cat log
[ 2020/08/12 11:40] Success with "one line [42]"
[ 2020/08/12 11:40] Success with "two
lines [13]"
[ 2020/08/12 11:40] Success with "two lines with a twist
[19] to confuse you"
[ 2020/08/12 11:41] Failure with "one line again"
The following code counts the double quotes (") on each line; if it finds only one, it joins that line with the next one:
$ gawk 'gsub("\"", "\"") == 1 {x=$0; getline; print x " " $0;} gsub("\"", "\"") == 2 {print}' log
[ 2020/08/12 11:40] Success with "one line [42]"
[ 2020/08/12 11:40] Success with "two lines [13]"
[ 2020/08/12 11:40] Success with "two lines with a twist [19] to confuse you"
[ 2020/08/12 11:41] Failure with "one line again"
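The command above joins at most two lines per entry, and a line whose quote count is neither 1 nor 2 (for example, a continuation line with no quotes at all) matches neither pattern and is not printed. A minimal sketch that generalizes the same quote-counting idea: keep appending lines while the running number of " characters is odd. It assumes every entry's quotes are balanced; a doubled "" counts as two quotes, so it does not change the parity. The function name join_quoted is hypothetical.

```shell
# join_quoted (hypothetical name): join continuation lines while a quoted
# string is still open, i.e. while the running count of double quotes is odd.
join_quoted() {
  awk '
    {
      n = gsub(/"/, "&")          # gsub returns how many quotes it matched
      if (inside)
        buf = buf " " $0          # continuation: append to the pending entry
      else
        buf = $0                  # start a new entry
      if (n % 2) inside = !inside # odd count opens or closes a quoted string
      if (!inside) { print buf; buf = "" }
    }
    END { if (inside) print buf } # flush an entry whose quote never closed
  '
}
```

Usage: join_quoted < log. Unlike the getline version, every input line ends up in the output, and an entry can span any number of lines, such as the "four lines with a twist" example from the previous answer, without needing GNU awk's RT.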