How to remove non-ASCII characters from MQ messages with ESQL

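Conclusion:

For some reason the flow would not let me convert the incoming message to a BLOB by changing the Message Domain property of the Input Node, so I added a Reset Content Descriptor node before the Compute Node, with the code from the accepted answer. On the line that parses the XML and creates the XMLNSC child for the message, I got a 'CHARACTER:Invalid wire format received' error, so I removed that line and added another Reset Content Descriptor node after the Compute Node. Now it parses the message and replaces the Unicode characters with spaces, so it no longer crashes.

Here is the code for the added Compute Node: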

CREATE FUNCTION Main() RETURNS BOOLEAN
BEGIN
    DECLARE NonPrintable BLOB X'0001020304050607080B0C0E0F101112131415161718191A1B1C1D1E1F7F808182838485868788898A8B8C8D8E8F909192939495969798999A9B9C9D9E9FA0A1A2A3A4A5A6A7A8A9AAABACADAEAFB0B1B2B3B4B5B6B7B8B9BABBBCBDBEBFC0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D7D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF1F2F3F4F5F6F7F8F9FAFBFCFDFEFF';
    DECLARE Printable    BLOB X'20202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020';
    DECLARE Fixed        BLOB TRANSLATE(InputRoot.BLOB.BLOB, NonPrintable, Printable);
    SET OutputRoot           = InputRoot;
    SET OutputRoot.BLOB.BLOB = Fixed;
    RETURN TRUE;
END;

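Update:

The messages are being parsed as XML with XMLNSC. I thought that might be causing the problem, but apparently it isn't.

Now I'm using PHP. I created a node to plug into the legacy flow. Here's the relevant code: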

class fixIncompetence {
    function evaluate($output_assembly, $input_assembly) {
        $output_assembly->MRM = $input_assembly->MRM;
        $output_assembly->MQMD = $input_assembly->MQMD;
        $tmp = htmlentities($input_assembly->MRM->VALUE_TO_FIX, ENT_HTML5 | ENT_SUBSTITUTE, 'UTF-8');
        if (!empty($tmp)) {
            $output_assembly->MRM->VALUE_TO_FIX = $tmp;
        }
        // Ensure there are no null MRM fields. MessageBroker is strict.
        foreach ($output_assembly->MRM as $key => $val) {
            if (empty($val)) {
                $output_assembly->MRM->$key = '';
            }
        }
    }
}

Now I'm getting a vague error about a read-only message, but it wasn't working before that either.

Original question:

For some reason I am unable to impress upon the senders of our MQ messages that smart quotes, endashes, emdashes, and such crash our XML parser.

I managed to make a working solution with SQL queries, but it wasted too many resources. Here's the last thing I tried, but it didn't work either:

CREATE FUNCTION CLEAN(IN STR CHAR) RETURNS CHAR BEGIN
    SET STR = REPLACE('–',STR,'&ndash;');
    SET STR = REPLACE('—',STR,'&mdash;');
    SET STR = REPLACE('·',STR,'&middot;');
    SET STR = REPLACE('“',STR,'&ldquo;');
    SET STR = REPLACE('”',STR,'&rdquo;');
    SET STR = REPLACE('‘',STR,'&lsqo;');
    SET STR = REPLACE('’',STR,'&rsquo;');
    SET STR = REPLACE('•',STR,'&bull;');
    SET STR = REPLACE('°',STR,'&deg;');
    RETURN STR;
END;

As you can see I'm not very good at this. I have tried reading about various ESQL string functions without much success.

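In ESQL you can use the TRANSLATE function for this.

The following is a snippet I use to clean up a BLOB that contains non-ASCII low hex values, so that it can be cast to a usable character string.

You should be able to modify it to change the characters you don't want into something more benign. Basically, each hex value in NonPrintable is translated to its positional equivalent in Printable, which in this case is always a full stop, i.e. x'2E' in ASCII. You'll need to make both BLOBs long enough to cover the range of hex values you want to replace.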

DECLARE NonPrintable BLOB X'000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F';
DECLARE Printable    BLOB X'2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E';
SET WorkBlob = TRANSLATE(WorkBlob, NonPrintable, Printable);
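As a smaller illustration of the same positional mapping, here is a minimal sketch that swaps a handful of "smart" punctuation bytes for plain ASCII ones instead of full stops. It assumes the sender's messages are encoded in Windows-1252, where curly quotes and dashes are single bytes; that is an assumption, not something stated in the question. For UTF-8 input these characters are multi-byte sequences, so a byte-for-byte mapping like this would not apply. The function name AsciiPunctuation is made up for the example.

```esql
-- Hypothetical helper: maps assumed Windows-1252 punctuation bytes to ASCII.
CREATE FUNCTION AsciiPunctuation(IN Raw BLOB) RETURNS BLOB
BEGIN
    -- Assumed Windows-1252 bytes: 91/92 curly single quotes,
    -- 93/94 curly double quotes, 96 en dash, 97 em dash.
    DECLARE Fancy BLOB X'919293949697';
    -- ASCII stand-ins, position for position: 27 ('), 22 ("), 2D (-).
    DECLARE Plain BLOB X'272722222D2D';
    RETURN TRANSLATE(Raw, Fancy, Plain);
END;
```

For example, AsciiPunctuation(X'934869942096206F6B') would give X'22486922202D206F6B', which reads as "Hi" - ok in ASCII. Bytes that do not appear in Fancy pass through unchanged.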

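By the way, if messages with invalid characters only turn up every now and then, I would probably specify BLOB on the input node and then use something like the following to invoke the XMLNSC parser.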

CREATE LASTCHILD OF OutputRoot DOMAIN 'XMLNSC'
       PARSE(InputRoot.BLOB.BLOB CCSID InputRoot.Properties.CodedCharSetId ENCODING InputRoot.Properties.Encoding);

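With the exception terminal wired up, you can then correct the BLOB of any message that contains parser-breaking invalid characters before attempting to reparse it.

Finally, my best wishes: over the years I have fought many battles over being forced to correct invalid message content in the "Integration Layer". After all, that's what it's meant for.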
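A minimal sketch of what a Compute node on that exception path might look like, assuming the input node is set to the BLOB domain so that InputRoot.BLOB.BLOB holds the raw bytes. The module name CleanAndReparse and the short control-byte list are illustrative only; substitute the byte ranges your messages actually need.

```esql
-- Hypothetical module for the node wired to the exception path.
CREATE COMPUTE MODULE CleanAndReparse
    CREATE FUNCTION Main() RETURNS BOOLEAN
    BEGIN
        -- Illustrative list: control bytes 00-08, each mapped to a space (20).
        DECLARE Bad   BLOB X'000102030405060708';
        DECLARE Good  BLOB X'202020202020202020';
        DECLARE Fixed BLOB TRANSLATE(InputRoot.BLOB.BLOB, Bad, Good);

        -- Carry the headers over, then build the body from the cleaned bytes.
        SET OutputRoot.Properties = InputRoot.Properties;
        SET OutputRoot.MQMD       = InputRoot.MQMD;
        CREATE LASTCHILD OF OutputRoot DOMAIN 'XMLNSC'
               PARSE(Fixed CCSID InputRoot.Properties.CodedCharSetId ENCODING InputRoot.Properties.Encoding);
        RETURN TRUE;
    END;
END MODULE;
```

Only messages that failed the first parse pass through this node, so well-formed messages avoid the cost of the TRANSLATE call.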