Delete a single line if a specific string is found

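I am new to IIB. What I am trying to achieve is to delete a line from a TXT file if it contains a specific word, for example the word USA, as shown below. I read the file as a BLOB and then convert it to a string. Should I use a Compute node or a Java Compute node for this? Thanks in advance.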

For example:

Before

Hello my name 
is Malcom and I live
in the USA

After

Hello my name 
is Malcom and I live

Current flow: FileInput -> Compute -> JavaCompute -> FileOutput

FileInput: reads data from a specific folder

Compute: replaces one string with another (masking)

    CREATE PROCEDURE getBLOBMessage() BEGIN
        DECLARE fullBLOB CHARACTER;
        SET fullBLOB = CAST(OutputRoot.BLOB.BLOB as char CCSID 1208 Encoding 815);
        SET OutputLocalEnvironment.msg = fullBLOB;
    END;

    CREATE PROCEDURE maskMessage(INOUT msg CHARACTER) BEGIN
        SET msg = REPLACE (msg, '431.111.55.113', 'XXX.XXX.XX.XXX');
        SET msg = REPLACE (msg, '111.115.11.112', 'XXX.XXX.XX.XXX');
        SET msg = REPLACE (msg, '111.112.11.112', 'XXX.XXX.XX.XXX');
        SET msg = REPLACE (msg, '111.111.111.116', 'XXX.XXX.XXX.XXX');
        SET msg = REPLACE (msg, '172.16.18.72', 'XXX.XX.XX.XX');
        SET msg = REPLACE (msg, 'b1111111110', 'XXXXXXXXXXX');
        SET msg = REPLACE (msg, '11111111101', 'XXXXXXXXXXX');
        SET msg = REPLACE (msg, '11111111111', 'XXXXXXXXXXX');
        SET msg = REPLACE (msg, 'B1111111111', 'XXXXXXXXXXX');
        SET msg = REPLACE (msg, 'Q1111111', 'XXXXXXXX');
        SET msg = REPLACE (msg, '11111111111N', 'XXXXXXXXXXXX');
        SET OutputRoot.BLOB.BLOB = CAST (msg AS BLOB CCSID 1208 Encoding 815);
    END;

JavaCompute: maybe to delete the line?

FileOutput: generates the output txt file

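Your requirement can be implemented in ESQL if you use the Record detection feature of the FileInput node. With delimited record detection, each line of the file reaches the Compute node as a separate message, so dropping a line is simply a matter of not propagating that message.

FileInput node:

  • Records and Elements: Record detection = Delimited
  • End of Data terminal connected to the Finish File terminal of the FileOutput node

Compute node: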

CREATE COMPUTE MODULE Thaqif_Compute

    CREATE FUNCTION Main() RETURNS BOOLEAN
    BEGIN
        SET OutputRoot = InputRoot;
        -- Each input message is one record (one line) of the file
        DECLARE line CHARACTER CAST(OutputRoot.BLOB.BLOB AS CHAR
                                    CCSID InputProperties.CodedCharSetId
                                    ENCODING InputProperties.Encoding);
        IF CONTAINS(line, 'USA') THEN
            -- Returning FALSE means this record is not propagated
            RETURN FALSE;
        ELSE
            CALL maskMessage(line);
            -- Write the masked line back as a BLOB for the FileOutput node
            SET OutputRoot.BLOB.BLOB = CAST(line AS BLOB 
                                            CCSID InputProperties.CodedCharSetId
                                            ENCODING InputProperties.Encoding);
            RETURN TRUE;
        END IF;
    END;

    CREATE PROCEDURE maskMessage(INOUT msg CHARACTER) BEGIN
        SET msg = REPLACE (msg, '431.111.55.113', 'XXX.XXX.XX.XXX');
        -- Other patterns removed for brevity
        SET msg = REPLACE (msg, 'Q1111111', 'XXXXXXXX');
    END;

END MODULE;

FileOutput node:

  • Records and Elements: Record definition = Record is Delimited Data

Sample input:

Hello my name 
is Malcom and I live
in the USA
where 431.111.55.113 is masked
but Q2222222 is still ok

Resulting output:

Hello my name 
is Malcom and I live
where XXX.XXX.XX.XXX is masked
but Q2222222 is still ok
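
Since the question also asks about a Java node: the same filter-and-mask logic can be sketched in plain Java, here applied to the whole file content at once rather than record by record. This is only an illustration under stated assumptions, not IIB code. It uses no IIB/JavaCompute API, the class name LineFilterSketch and the method filterAndMask are hypothetical, the data is assumed to be UTF-8 (CCSID 1208, as in the question), and lines are assumed to be separated by "\n".

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LineFilterSketch {

    // Hypothetical helper: drops every line that contains `word`
    // and replaces `secret` with `mask` in the lines that remain.
    static byte[] filterAndMask(byte[] blob, String word, String secret, String mask) {
        // Assumption: the BLOB holds UTF-8 text (CCSID 1208)
        String text = new String(blob, StandardCharsets.UTF_8);
        List<String> kept = new ArrayList<>();
        // Limit -1 keeps a trailing empty element, so a final newline survives
        for (String line : text.split("\n", -1)) {
            if (line.contains(word)) {
                continue; // comparable to RETURN FALSE for that record in the ESQL
            }
            kept.add(line.replace(secret, mask));
        }
        return String.join("\n", kept).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String input = "Hello my name \n"
                + "is Malcom and I live\n"
                + "in the USA\n"
                + "where 431.111.55.113 is masked\n"
                + "but Q2222222 is still ok\n";
        byte[] out = filterAndMask(input.getBytes(StandardCharsets.UTF_8),
                "USA", "431.111.55.113", "XXX.XXX.XX.XXX");
        System.out.print(new String(out, StandardCharsets.UTF_8));
    }
}
```

Run against the sample input, this should print the same four lines as the resulting output above. The record-based ESQL approach is likely the better fit for large files, because the sketch holds the entire file in memory.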