Split ~200 MB log4j log file by day

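I have a log file in the format below, and I would like to split it into multiple files by day (i.e. log-2017-10-2, log-2017-10-3, and so on). I have seen people do this with awk, but I am not sure how to handle stack traces, since lines such as java.io.IOException start on a new line with no date. Is there a convenient way to do this?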

    2017-10-02 04:26:02,534 INFO XXXXXXXXXXXXXXXXX
    2017-10-03 04:26:02,543 INFO XXXXXXXXXXXX
    2017-10-04 04:26:02,544 INFO XXXXXXXXX
    2017-10-04 04:26:02,546 INFO XXXXXXXXXXXXX
    2017-10-04 04:26:02,549 INFO XXXXXXXXXXX
    2017-10-04 04:53:02,787 WARN class.class.class: [FetcherXXXXXX], Error in fetch XXXXXXXXXXXXXXXXXXXXXX
    java.io.IOException: Connection to X was disconnected before the response was read
            at XXXXXXXXXXXXXXXXXXXX
            at XXXXXXXXXXXXXXXXXXXX
            at XXXXXXXXXXXXXXXXXXXXX
            at XXXXXXXXXXXXXXXX
            at XXXXXXXXXXXXXXXX
    2017-10-05 04:26:02,549 INFO XXXXXXXXXXX

The resulting files would contain:

log-2017-10-2:
2017-10-02 04:26:02,534 INFO XXXXXXXXXXXXXXXXX

log-2017-10-3:
2017-10-03 04:26:02,543 INFO XXXXXXXXXXXX

log-2017-10-4:
2017-10-04 04:26:02,544 INFO XXXXXXXXX
2017-10-04 04:26:02,546 INFO XXXXXXXXXXXXX
2017-10-04 04:26:02,549 INFO XXXXXXXXXXX
2017-10-04 04:53:02,787 WARN class.class.class: [FetcherXXXXXX], Error in fetch XXXXXXXXXXXXXXXXXXXXXX
java.io.IOException: Connection to X was disconnected before the response was read
        at XXXXXXXXXXXXXXXXXXXX
        at XXXXXXXXXXXXXXXXXXXX
        at XXXXXXXXXXXXXXXXXXXXX
        at XXXXXXXXXXXXXXXX
        at XXXXXXXXXXXXXXXX

log-2017-10-5:
2017-10-05 04:26:02,549 INFO XXXXXXXXXXX

awk to the rescue!

$ awk --posix 'BEGIN{f="log-header"} 
     $1~/^[0-9]{4}-[0-9]{2}-[0-9]{2}$/{f="log-"$1} {print > f}' log

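If there are too many dates (and therefore too many open files), you may need to close the files as you go. For a few hundred dates, it should work as is.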

The BEGIN block sets an initial log file (log-header) in case your log does not start with a line matching the regex.
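The remark about closing files as you go could be sketched roughly as follows. This is only an illustration under some assumptions: the log is in chronological order, the `split-demo` directory and `sample.log` contents are made up for the demo, and the date pattern is spelled out without `{4}`-style intervals so that awk implementations lacking interval support can run it too.

```shell
#!/bin/sh
# Hypothetical demo directory and sample input, for illustration only.
mkdir -p split-demo
rm -f split-demo/log-*          # start clean, because the awk script appends
cat > split-demo/sample.log <<'EOF'
2017-10-03 04:26:02,543 INFO first
2017-10-04 04:53:02,787 WARN Error in fetch
java.io.IOException: Connection to X was disconnected
        at some.Frame
2017-10-05 04:26:02,549 INFO last
EOF

awk -v dir="split-demo" '
# Lines seen before the first dated entry would go to log-header.
BEGIN { f = dir "/log-header" }

# A first field that looks like YYYY-MM-DD starts a new entry.
$1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ {
    next_f = dir "/log-" $1
    if (next_f != f) {          # the date changed
        close(f)                # release the previous file
        f = next_f
    }
}

# Undated lines (stack traces) follow the most recent dated line.
# Appending means a file reopened later is not truncated.
{ print >> f }
' split-demo/sample.log
```

With this shape only one output file is open at any time, so the number of distinct dates no longer matters. The trade-off is that `>>` appends, so old output files have to be removed before a rerun.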

awk solution:

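awk '/^[0-9]{4}-[0-9]{2}-[0-9]{2} /{ 
         if (fn && !a[$1]++) close(fn);
         fn="log-"$1 
     }{ print > fn }' logfile

  • /^[0-9]{4}-[0-9]{2}-[0-9]{2} / - on encountering a line that starts with a date string
  • if (fn && !a[$1]++) close(fn) - close the file descriptor that was opened for the previous date
  • fn="log-"$1 - construct the filename from the date in the first field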

Viewing the results:

$ head log-*
==> log-2017-10-02 <==
2017-10-02 04:26:02,534 INFO XXXXXXXXXXXXXXXXX

==> log-2017-10-03 <==
2017-10-03 04:26:02,543 INFO XXXXXXXXXXXX

==> log-2017-10-04 <==
2017-10-04 04:26:02,544 INFO XXXXXXXXX
2017-10-04 04:26:02,546 INFO XXXXXXXXXXXXX
2017-10-04 04:26:02,549 INFO XXXXXXXXXXX
2017-10-04 04:53:02,787 WARN class.class.class: [FetcherXXXXXX], Error in fetch XXXXXXXXXXXXXXXXXXXXXX
java.io.IOException: Connection to X was disconnected before the response was read
        at XXXXXXXXXXXXXXXXXXXX
        at XXXXXXXXXXXXXXXXXXXX
        at XXXXXXXXXXXXXXXXXXXXX
        at XXXXXXXXXXXXXXXX
        at XXXXXXXXXXXXXXXX

==> log-2017-10-05 <==
2017-10-05 04:26:02,549 INFO XXXXXXXXXXX
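
The files listed above use zero-padded dates (log-2017-10-02), whereas the question asks for names like log-2017-10-2. If the unpadded form matters, one possible variation is to split the date and add 0 to the month and day so awk treats them as numbers. This is a sketch, not part of either answer; the `names-demo` directory and the sample lines are hypothetical.

```shell
#!/bin/sh
# Hypothetical demo directory and sample input, for illustration only.
mkdir -p names-demo
cat > names-demo/sample.log <<'EOF'
2017-10-02 04:26:02,534 INFO first
2017-10-04 04:53:02,787 WARN Error in fetch
java.io.IOException: Connection to X was disconnected
        at some.Frame
EOF

awk -v dir="names-demo" '
# Fallback file for any lines that precede the first dated entry.
BEGIN { fn = dir "/log-header" }

# A line starting with YYYY-MM-DD followed by a space selects the output file.
/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
    split($1, d, "-")                      # d[1]=year, d[2]=month, d[3]=day
    # Adding 0 forces numeric conversion, which drops leading zeros.
    fn = dir "/log-" d[1] "-" (d[2] + 0) "-" (d[3] + 0)
}

# Undated lines (stack traces) go to the file of the last dated line.
{ print > fn }
' names-demo/sample.log

ls names-demo
```

Note that unpadded names no longer sort chronologically in a plain `ls`, which may be a reason to prefer the padded form.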