Parse IBM MQ v9.1 Error Logs using Splunk
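I am using the Splunk forwarder to send my IBM MQ v9.1 error logs to a centralized cluster, so that I can see trends in the common errors occurring across my distributed messaging system.
However, I cannot parse the fields I need, because the MQ error log format varies. A message's severity can be Error, Warning, Information, Severe or Termination, and each message carries its own set of fields, so the entries are not consistent.
If anyone has used regular expressions in Splunk to parse IBM MQ v9.1 error log fields, please let me know.
I have tried several regex patterns, but none of them parsed as expected.
I have looked at the link below, but it is for v8, and the v9 error log format is different:
https://t-rob.net/2017/12/18/parsing-mq-error-logs-in-splunk/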
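Also, the splunk user cannot access the error logs. I updated the following stanza in qm.ini:
Filesystem:
   ValidateAuth=No
I also ran chmod -R 755 on the /var/mqm/qmgrs/qmName/errors folder.
The permissions stay in place while the current error log is being updated, but when the logs rotate the permissions are revoked and the splunk user can no longer read the logs.
Please let me know how to solve this without adding the splunk user to the mqm group.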
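I suggest enabling JSON logging and forwarding those logs to Splunk, which should be able to parse that format.
IBM added the ability to write diagnostic messages to JSON-formatted log files in the IBM MQ v9.0.4 CD release; it takes effect once you enable JSON logging. This capability is included in all MQ 9.1 LTS and CD releases.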
The IBM MQ v9.1 Knowledge Center page IBM MQ > Configuring > Changing IBM MQ and queue manager configuration information > Attributes for changing queue manager configuration information > Diagnostic message logging > Diagnostic message service stanzas > Diagnostic message services has information on this topic. You can add the following to your qm.ini to make the queue manager write log information to JSON-formatted files named AMQERR0x.json in the standard queue manager errors directory:
DiagnosticMessages:
   Service = File
   Name = JSONLogs
   Format = json
   FilePrefix = AMQERR
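Once these files are written, severity and message ID arrive as named keys (loglevel, ibm_messageId, ibm_datetime, as in the sample record used in the mqrc example) instead of varying free text, so no per-severity regex is needed. Inside Splunk, JSON events can usually be extracted at search time with KV_MODE = json in props.conf. The trend counting the question is after can also be sketched outside Splunk, as below. This is a minimal sketch that assumes one JSON object per line in AMQERR0x.json; the function name summarise_mq_json_log is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def summarise_mq_json_log(lines: Iterable[str]) -> Counter:
    """Count JSON diagnostic records by (loglevel, ibm_messageId).

    Assumes one JSON object per line, as written to AMQERR0x.json.
    Blank lines and lines that are not valid JSON objects are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a line still being written when the file was read
        if not isinstance(record, dict):
            continue
        # Severity and message ID are plain keys, so no per-severity regex is needed.
        counts[(record.get("loglevel"), record.get("ibm_messageId"))] += 1
    return counts


# Example usage (qmName is the placeholder from the question):
# with open("/var/mqm/qmgrs/qmName/errors/AMQERR01.json", encoding="utf-8") as fh:
#     for (level, msg_id), n in summarise_mq_json_log(fh).most_common():
#         print(n, level, msg_id)
```

Counting on the pair rather than on the message ID alone keeps, for example, ERROR and WARNING occurrences apart in the trend.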
As the OP noted, the JSON-formatted logs do not contain the EXPLANATION or ACTION sections that you see in the normal logs.
In IBM MQ v9.1 you can use the mqrc command to convert the JSON format into the familiar format you see in AMQERR01.LOG.
Below is a simple example:
cat <<EOL |mqrc -i json -o text -
{"ibm_messageId":"AMQ9209E","ibm_arithInsert1":0,"ibm_arithInsert2":0,"ibm_commentInsert1":"localhost (127.0.0.1)","ibm_commentInsert2":"TCP/IP","ibm_commentInsert3":"SYSTEM.DEF.SVRCONN","ibm_datetime":"2018-02-22T06:54:53.942Z","ibm_serverName":"QM1","type":"mq_log","host":"0df0ce19c711","loglevel":"ERROR","module":"amqccita.c:4214","ibm_sequence":"1519282493_947814358","ibm_remoteHost":"127.0.0.1","ibm_qmgrId":"QM1_2018-02-13_10.49.57","ibm_processId":4927,"ibm_threadId":4,"ibm_version":"9.1.0.5","ibm_processName":"amqrmppa","ibm_userName":"johndoe","ibm_installationName":"Installation1","ibm_installationDir":"/opt/mqm","message":"AMQ9209E: Connection to host 'localhost (127.0.0.1)' for channel 'SYSTEM.DEF.SVRCONN' closed."}
EOL
The output will be:
02/22/2018 06:54:53 AM - User(johndoe) Program(amqrmppa)
Host(0df0ce19c711) Installation(Installation1)
VRMF(9.1.0.5) QMgr(QM1)
Time(2018-02-22T11:54:53.942Z)
RemoteHost(127.0.0.1)
CommentInsert1(localhost (127.0.0.1))
CommentInsert2(TCP/IP)
CommentInsert3(SYSTEM.DEF.SVRCONN)
AMQ9209E: Connection to host 'localhost (127.0.0.1)' for channel
'SYSTEM.DEF.SVRCONN' closed.
EXPLANATION:
An error occurred receiving data from 'localhost (127.0.0.1)' over TCP/IP. The
connection to the remote host has unexpectedly terminated.
The channel name is 'SYSTEM.DEF.SVRCONN'; in some cases it cannot be determined
and so is shown as '????'.
ACTION:
Tell the systems administrator.
----- amqccita.c : 4214 -------------------------------------------------------
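To get the EXPLANATION and ACTION text for every record in a JSON log file, the heredoc example can be scripted. This is a minimal sketch assuming one JSON record per line. It starts one mqrc process per record because the example above only shows a single record on stdin, and it is not confirmed here whether mqrc accepts several records in one stream. The function name json_records_to_text and its cmd parameter (which lets another command stand in for mqrc when testing) are hypothetical.

```python
import json
import subprocess
from typing import Iterable, Iterator, Sequence

# Same invocation as the heredoc example: JSON in on stdin, text out.
MQRC_CMD: Sequence[str] = ("mqrc", "-i", "json", "-o", "text", "-")


def json_records_to_text(lines: Iterable[str],
                         cmd: Sequence[str] = MQRC_CMD) -> Iterator[str]:
    """Feed each JSON log record to mqrc separately and yield its text output."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            json.loads(line)  # skip malformed records instead of passing them on
        except json.JSONDecodeError:
            continue
        result = subprocess.run(list(cmd), input=line + "\n",
                                capture_output=True, text=True, check=True)
        yield result.stdout
```

Using an argument list rather than a shell pipeline avoids any quoting issues with the quotes and parentheses inside the JSON.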
You can also run mqrc with an error message ID taken from the JSON, such as AMQ9209E, using a command like this:
mqrc AMQ9209E
The output will be:
536908297 0x20009209 rrcE_CONNECTION_CLOSED
536908297 0x20009209 urcMS_CONN_CLOSED
MESSAGE:
Connection to host '<insert one>' for channel '<insert three>' closed.
EXPLANATION:
An error occurred receiving data from '<insert one>' over <insert two>. The
connection to the remote host has unexpectedly terminated.
The channel name is '<insert three>'; in some cases it cannot be determined and
so is shown as '????'.
ACTION:
Tell the systems administrator.
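The first two output lines show the same code in decimal and hexadecimal, and the hex form 0x20009209 visibly contains the message number 9209. The sketch below reproduces that arithmetic (0x20000000 + 0x9209 = 536870912 + 37385 = 536908297). The helper amq_id_to_code is hypothetical, and it assumes the 0x2000nnnn pattern seen for AMQ9209E holds for other AMQ messages; check any other ID against mqrc before relying on it.

```python
import re


def amq_id_to_code(message_id: str) -> int:
    """Hypothetical helper: map 'AMQnnnnX' to 0x2000nnnn.

    Assumption: the pattern observed for AMQ9209E (0x20009209) applies to
    other AMQ messages too. Verify with `mqrc <id>` before relying on it.
    """
    match = re.fullmatch(r"AMQ(\d{4})[A-Z]?", message_id)
    if match is None:
        raise ValueError(f"not an AMQ message ID: {message_id!r}")
    # The four message digits are read as hex digits, not decimal.
    return 0x20000000 | int(match.group(1), 16)


code = amq_id_to_code("AMQ9209E")
print(code, hex(code))  # 536908297 0x20009209
```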
You can go a step further and also pass the inserts from the JSON.
Sample portion of the JSON log:
"ibm_messageId":"AMQ9209E","ibm_arithInsert1":0,"ibm_arithInsert2":0,"ibm_commentInsert1":"localhost (127.0.0.1)","ibm_commentInsert2":"TCP/IP","ibm_commentInsert3":"SYSTEM.DEF.SVRCONN"
In the command below, each ibm_arithInsert is passed with a -n flag, followed by each ibm_commentInsert with a -c flag:
mqrc AMQ9209E -n 0 -n 0 -c "localhost (127.0.0.1)" -c "TCP/IP" -c "SYSTEM.DEF.SVRCONN"
The output is as follows:
536908297 0x20009209 rrcE_CONNECTION_CLOSED
536908297 0x20009209 urcMS_CONN_CLOSED
MESSAGE:
Connection to host 'localhost (127.0.0.1)' for channel 'SYSTEM.DEF.SVRCONN'
closed.
EXPLANATION:
An error occurred receiving data from 'localhost (127.0.0.1)' over TCP/IP. The
connection to the remote host has unexpectedly terminated.
The channel name is 'SYSTEM.DEF.SVRCONN'; in some cases it cannot be determined
and so is shown as '????'.
ACTION:
Tell the systems administrator.
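Mapping the JSON inserts to flags is mechanical, so it can be automated per record. This is a minimal Python sketch assuming the insert keys follow the ibm_arithInsertN / ibm_commentInsertN naming seen in the sample; mqrc_args_for is a hypothetical helper name. It emits the inserts in numeric order, giving the same argument order as the command above.

```python
import re
from typing import Dict, List


def mqrc_args_for(record: Dict[str, object]) -> List[str]:
    """Build an mqrc argument list from one parsed JSON log record.

    Each ibm_arithInsertN becomes "-n <value>" and each ibm_commentInsertN
    becomes "-c <value>", ordered by N, after the message ID.
    """
    def inserts(prefix: str) -> List[str]:
        pattern = re.compile(re.escape(prefix) + r"(\d+)")
        numbered = []
        for key, value in record.items():
            match = pattern.fullmatch(key)
            if match:
                numbered.append((int(match.group(1)), str(value)))
        numbered.sort(key=lambda pair: pair[0])  # order by insert number N
        return [value for _, value in numbered]

    args = ["mqrc", str(record["ibm_messageId"])]
    for value in inserts("ibm_arithInsert"):
        args += ["-n", value]
    for value in inserts("ibm_commentInsert"):
        args += ["-c", value]
    return args


sample = {
    "ibm_messageId": "AMQ9209E",
    "ibm_arithInsert1": 0,
    "ibm_arithInsert2": 0,
    "ibm_commentInsert1": "localhost (127.0.0.1)",
    "ibm_commentInsert2": "TCP/IP",
    "ibm_commentInsert3": "SYSTEM.DEF.SVRCONN",
}
print(mqrc_args_for(sample))
```

To run it, pass the list straight to subprocess.run(mqrc_args_for(record), check=True); an argument list sidesteps shell quoting for values such as localhost (127.0.0.1).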