Parse IBM MQ v9.1 Error Logs using Splunk

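I am using a Splunk forwarder to forward my IBM MQ v9.1 error logs to a centralized cluster, so that I can see trends in the common errors that occur in my distributed messaging system.

However, I am unable to parse the fields I need because the format of the MQ error logs varies: a message's severity can be Error, Warning, Informational, Severe, or Termination, and each message carries a different set of fields, so the layout is not consistent from one message to the next.

If anyone has used regular expressions in Splunk to parse the fields of IBM MQ v9.1 error logs, please let me know.

I have tried several regex patterns, but none of them parse the logs as expected.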

I have already looked at the link below, but it is written for v8, and the v9 error log format is different: https://t-rob.net/2017/12/18/parsing-mq-error-logs-in-splunk/

In addition, the splunk user cannot access the error logs. I updated the following stanza in qm.ini:

Filesystem:
   ValidateAuth=No

I also ran chmod -R 755 on the /var/mqm/qmgrs/qmName/errors folder.

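The permissions on the error logs do not change while the logs are being updated, but when the logs rotate the permissions are revoked and the splunk user can no longer read them.

Please let me know how to solve this without adding the splunk user to the mqm group.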

I suggest enabling JSON logging and forwarding those logs to Splunk, which should be able to parse this format.

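In the IBM MQ v9.0.4 CD release, IBM added the ability to write logs in JSON format. MQ still writes the original AMQERR0x.LOG files even if you enable JSON logging. This feature is included in all MQ 9.1 LTS and CD releases.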

The IBM MQ v9.1 Knowledge Center page IBM MQ>Configuring>Changing IBM MQ and queue manager configuration information>Attributes for changing queue manager configuration information>Diagnostic message logging>Diagnostic message service stanzas>Diagnostic message services has information on this topic. You can add the following to your qm.ini to have it write log information to JSON-formatted files named AMQERR0x.json in the standard queue manager errors directory:

DiagnosticMessages:
   Service = File
   Name = JSONLogs
   Format = json
   FilePrefix = AMQERR
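To illustrate why the JSON files are easier to work with than the text logs, here is a minimal sketch that counts messages by ID and severity without any regular expressions. It assumes each line of AMQERR0x.json is one self-contained JSON object carrying the ibm_messageId and loglevel keys, as in the sample record further down. The function name summarize_mq_json and the trimmed sample records are made up for illustration.

```python
import json
from collections import Counter


def summarize_mq_json(lines):
    """Count (message ID, log level) pairs across MQ JSON log lines.

    Assumes one JSON object per line; anything that does not parse
    (for example a line that was only partly written) is skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        key = (record.get("ibm_messageId", "UNKNOWN"),
               record.get("loglevel", "UNKNOWN"))
        counts[key] += 1
    return counts


# Hypothetical, heavily trimmed records; real lines carry many more fields.
sample = [
    '{"ibm_messageId":"AMQ9209E","loglevel":"ERROR","ibm_serverName":"QM1"}',
    '{"ibm_messageId":"AMQ9209E","loglevel":"ERROR","ibm_serverName":"QM1"}',
    '{"ibm_messageId":"AMQ9999E","loglevel":"ERROR","ibm_serverName":"QM1"}',
    'this line is not JSON and is ignored',
]

for (message_id, level), count in summarize_mq_json(sample).most_common():
    print(message_id, level, count)
```

To run it against a real file, pass an open file handle instead of the sample list, for example summarize_mq_json(open(path)) where path points at one of the AMQERR0x.json files. Splunk would normally do this aggregation itself once the fields are extracted; the sketch only shows that the fields are directly addressable by name.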

As the OP noted, the JSON-formatted logs do not contain the EXPLANATION/ACTION sections that you see in the normal logs.


In IBM MQ v9.1 you can use the mqrc command to convert the JSON format into the familiar format you see in AMQERR01.LOG.

Here is a simple example:

cat <<EOL |mqrc -i json -o text -
{"ibm_messageId":"AMQ9209E","ibm_arithInsert1":0,"ibm_arithInsert2":0,"ibm_commentInsert1":"localhost (127.0.0.1)","ibm_commentInsert2":"TCP/IP","ibm_commentInsert3":"SYSTEM.DEF.SVRCONN","ibm_datetime":"2018-02-22T06:54:53.942Z","ibm_serverName":"QM1","type":"mq_log","host":"0df0ce19c711","loglevel":"ERROR","module":"amqccita.c:4214","ibm_sequence":"1519282493_947814358","ibm_remoteHost":"127.0.0.1","ibm_qmgrId":"QM1_2018-02-13_10.49.57","ibm_processId":4927,"ibm_threadId":4,"ibm_version":"9.1.0.5","ibm_processName":"amqrmppa","ibm_userName":"johndoe","ibm_installationName":"Installation1","ibm_installationDir":"/opt/mqm","message":"AMQ9209E: Connection to host 'localhost (127.0.0.1)' for channel 'SYSTEM.DEF.SVRCONN' closed."}
EOL

The output will be:

02/22/2018 06:54:53 AM - User(johndoe) Program(amqrmppa)
                    Host(0df0ce19c711) Installation(Installation1)
                    VRMF(9.1.0.5) QMgr(QM1)
                    Time(2018-02-22T11:54:53.942Z)
                    RemoteHost(127.0.0.1)
                    CommentInsert1(localhost (127.0.0.1))
                    CommentInsert2(TCP/IP)
                    CommentInsert3(SYSTEM.DEF.SVRCONN)

AMQ9209E: Connection to host 'localhost (127.0.0.1)' for channel
'SYSTEM.DEF.SVRCONN' closed.

EXPLANATION:
An error occurred receiving data from 'localhost (127.0.0.1)' over TCP/IP.  The
connection to the remote host has unexpectedly terminated.

The channel name is 'SYSTEM.DEF.SVRCONN'; in some cases it cannot be determined
and so is shown as '????'.
ACTION:
Tell the systems administrator.
----- amqccita.c : 4214 -------------------------------------------------------

You can also pass mqrc the message ID taken from the JSON. For example, for AMQ9209E you can run a command like this:

mqrc AMQ9209E

The output will be:

 536908297  0x20009209  rrcE_CONNECTION_CLOSED
 536908297  0x20009209  urcMS_CONN_CLOSED

MESSAGE:
Connection to host '<insert one>' for channel '<insert three>' closed.

EXPLANATION:
An error occurred receiving data from '<insert one>' over <insert two>.  The
connection to the remote host has unexpectedly terminated.

The channel name is '<insert three>'; in some cases it cannot be determined and
so is shown as '????'.

ACTION:
Tell the systems administrator.

You can go one step further and specify the inserts from the JSON:

Sample portion of the JSON log:

"ibm_messageId":"AMQ9209E","ibm_arithInsert1":0,"ibm_arithInsert2":0,"ibm_commentInsert1":"localhost (127.0.0.1)","ibm_commentInsert2":"TCP/IP","ibm_commentInsert3":"SYSTEM.DEF.SVRCONN"

In the command below, each ibm_arithInsert is passed with a -n flag, followed by each ibm_commentInsert with a -c flag:

mqrc AMQ9209E -n 0 -n 0 -c "localhost (127.0.0.1)" -c "TCP/IP" -c "SYSTEM.DEF.SVRCONN"

The output is as follows:

 536908297  0x20009209  rrcE_CONNECTION_CLOSED
 536908297  0x20009209  urcMS_CONN_CLOSED

MESSAGE:
Connection to host 'localhost (127.0.0.1)' for channel 'SYSTEM.DEF.SVRCONN'
closed.

EXPLANATION:
An error occurred receiving data from 'localhost (127.0.0.1)' over TCP/IP.  The
connection to the remote host has unexpectedly terminated.

The channel name is 'SYSTEM.DEF.SVRCONN'; in some cases it cannot be determined
and so is shown as '????'.

ACTION:
Tell the systems administrator.
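
The mapping from JSON inserts to mqrc flags shown above can be automated. The following is a rough sketch, not an IBM-provided tool: it reads one JSON log record and builds the argument list for the mqrc call, with arithmetic inserts first as -n and comment inserts after as -c, each ordered by its numeric suffix. The helper names numbered_values and build_mqrc_args are hypothetical, and the sketch assumes the insert keys always follow the ibm_arithInsertN / ibm_commentInsertN naming seen in the sample.

```python
import json
import re
import shlex


def numbered_values(record, prefix):
    """Return values of keys named prefix1, prefix2, ... in numeric order."""
    found = []
    for key, value in record.items():
        match = re.fullmatch(re.escape(prefix) + r"(\d+)", key)
        if match:
            found.append((int(match.group(1)), value))
    return [value for _, value in sorted(found, key=lambda pair: pair[0])]


def build_mqrc_args(json_line):
    """Build an argument list of the form: mqrc <id> -n ... -c ..."""
    record = json.loads(json_line)
    args = ["mqrc", record["ibm_messageId"]]
    for value in numbered_values(record, "ibm_arithInsert"):
        args += ["-n", str(value)]
    for value in numbered_values(record, "ibm_commentInsert"):
        args += ["-c", str(value)]
    return args


# The insert fields from the sample record above, wrapped in braces.
line = (
    '{"ibm_messageId":"AMQ9209E","ibm_arithInsert1":0,"ibm_arithInsert2":0,'
    '"ibm_commentInsert1":"localhost (127.0.0.1)",'
    '"ibm_commentInsert2":"TCP/IP",'
    '"ibm_commentInsert3":"SYSTEM.DEF.SVRCONN"}'
)

args = build_mqrc_args(line)
# Print a shell-safe rendering of the command for copy and paste.
print(" ".join(shlex.quote(arg) for arg in args))
```

On a host where mqrc is installed, the returned list can be handed to subprocess.run(args) directly, which avoids shell quoting problems with values such as 'localhost (127.0.0.1)'.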