Talend: XML with multiple headers => split the file with a Talend job

Question

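I receive a file that contains multiple headers =>

[XML image][1]

I would like to create a Talend job that either splits it into several files or produces one readable file.

I have tried many approaches without success.

The file I receive is an output file (*.out), not an XML file to begin with.

Thanks for your help! :)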

Edit:

Thanks for the replies.

For example, the initial file (*.out file) =>

<?xml version="1.0" encoding="UTF-8"?><Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.054.001.02"><BkToCstmrDbtCdtNtfctn><GrpHdr><MsgId>AI3069868076</MsgId><CreDtTm>2017-04-03T23:51:23.586</CreDtTm><MsgPgntn><PgNb>1</PgNb><LastPgInd>true</LastPgInd></MsgPgntn></GrpHdr></BkToCstmrDbtCdtNtfctn></Document>
<?xml version="1.0" encoding="UTF-8"?><Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.054.001.02"><BkToCstmrDbtCdtNtfctn><GrpHdr><MsgId>AI4069973130</MsgId><CreDtTm>2017-04-04T21:09:41.090</CreDtTm><MsgPgntn><PgNb>1</PgNb><LastPgInd>true</LastPgInd></MsgPgntn></GrpHdr></BkToCstmrDbtCdtNtfctn></Document>
<?xml version="1.0" encoding="UTF-8"?><Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.054.001.02"><BkToCstmrDbtCdtNtfctn><GrpHdr><MsgId>AI4069973134</MsgId><CreDtTm>2017-04-04T21:09:41.090</CreDtTm><MsgPgntn><PgNb>1</PgNb><LastPgInd>true</LastPgInd></MsgPgntn></GrpHdr></BkToCstmrDbtCdtNtfctn></Document>

I would like to get:

File 1:

<?xml version="1.0" encoding="UTF-8"?><Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.054.001.02"><BkToCstmrDbtCdtNtfctn><GrpHdr><MsgId>AI3069868076</MsgId><CreDtTm>2017-04-03T23:51:23.586</CreDtTm><MsgPgntn><PgNb>1</PgNb><LastPgInd>true</LastPgInd></MsgPgntn></GrpHdr></BkToCstmrDbtCdtNtfctn></Document>

File 2:

<?xml version="1.0" encoding="UTF-8"?><Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.054.001.02"><BkToCstmrDbtCdtNtfctn><GrpHdr><MsgId>AI4069973130</MsgId><CreDtTm>2017-04-04T21:09:41.090</CreDtTm><MsgPgntn><PgNb>1</PgNb><LastPgInd>true</LastPgInd></MsgPgntn></GrpHdr></BkToCstmrDbtCdtNtfctn></Document>

File 3:

<?xml version="1.0" encoding="UTF-8"?><Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.054.001.02"><BkToCstmrDbtCdtNtfctn><GrpHdr><MsgId>AI4069973134</MsgId><CreDtTm>2017-04-04T21:09:41.090</CreDtTm><MsgPgntn><PgNb>1</PgNb><LastPgInd>true</LastPgInd></MsgPgntn></GrpHdr></BkToCstmrDbtCdtNtfctn></Document>

Because the initial file is unreadable! :'(

Answer 1

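If the goal is simply to generate a separate file for each record of the input file, you can proceed like this:

tSetGlobalVar to initialize a variable called "fileCounter" with the value 1 (it is used to name the output files)
tFileList to iterate over the files to convert (even if there is only 1)
tFileInputFullRow to read the input file 1 line at a time
tFlowToIterate to iterate over each input line of the current file
tFixedFlowInput to turn each field of the input flow into a global variable (here called "line" because of tFileInputFullRow) and restart a flow (the generated global variable is not used in this example)

tJavaRow to increment the "fileCounter" value and pass the current line on to the output file: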

output_row.line = input_row.content;
globalMap.put("fileCounter", ((Integer)globalMap.get("fileCounter"))+1);

tFileOutputRaw to generate the output file for the current line. Just set the file name based on the rank of the current line:

((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")).replace(".xml", "") + (Integer)globalMap.get("fileCounter") + ".xml"

In this example the source file name is expected to end with ".xml", but you can easily adapt the expression to your own naming.
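For comparison, here is a minimal plain-Java sketch of the same idea outside Talend, assuming each record sits on its own line as in the sample file. The names OutFileSplitter, outputName and split are made up for illustration and are not Talend APIs; only the file naming expression is taken from the job above. Numbering in this sketch starts at 1.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not generated by Talend: one output file per input line.
class OutFileSplitter {

    // Same naming rule as the tFileOutputRaw expression: drop ".xml", append the counter.
    static String outputName(String inputPath, int fileCounter) {
        return inputPath.replace(".xml", "") + fileCounter + ".xml";
    }

    // Writes every non-blank line of the input to its own file and returns the created paths.
    static List<Path> split(Path input) {
        List<Path> created = new ArrayList<>();
        int fileCounter = 1; // same starting value as the tSetGlobalVar step
        try {
            for (String line : Files.readAllLines(input, StandardCharsets.UTF_8)) {
                if (line.trim().isEmpty()) {
                    continue; // ignore blank lines between records
                }
                Path out = Paths.get(outputName(input.toString(), fileCounter));
                Files.write(out, line.getBytes(StandardCharsets.UTF_8));
                created.add(out);
                fileCounter++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return created;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: OutFileSplitter <input file>");
            return;
        }
        for (Path p : split(Paths.get(args[0]))) {
            System.out.println(p);
        }
    }
}
```

With a *.out input, replace(".xml", "") changes nothing, so you would probably want to strip ".out" instead.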

The job produces the following result (1 input file with 3 records = 3 output files):

Answer 2

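Check that:

the tFileInputFullRow schema contains only 1 field named "line" (this is the default)
the tFixedFlowInput schema contains only 1 field named "content", populated with ((String)globalMap.get("row1.line"))

Also check that the tJavaRow schema looks like this:

You can change the field names as you like, but they must be consistent throughout the job.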
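To see why the names have to match, here is a small sketch that uses a plain HashMap as a stand-in for Talend's globalMap. It assumes the row feeding tFlowToIterate is called row1, so the current line is stored under the key "row1.line" (the key used in the expression above). GlobalMapSketch, currentLine and nextCounter are hypothetical names, not Talend APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the job's shared variable map; not real Talend code.
class GlobalMapSketch {

    // Mirrors the tFixedFlowInput expression: read the current line back from the map.
    static String currentLine(Map<String, Object> globalMap) {
        return (String) globalMap.get("row1.line");
    }

    // Mirrors the tJavaRow counter update.
    static int nextCounter(Map<String, Object> globalMap) {
        int next = ((Integer) globalMap.get("fileCounter")) + 1;
        globalMap.put("fileCounter", next);
        return next;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("fileCounter", 1);                         // tSetGlobalVar
        globalMap.put("row1.line", "<Document>...</Document>");  // assumed tFlowToIterate key

        System.out.println(currentLine(globalMap));
        System.out.println(nextCounter(globalMap));

        // If the schema field were renamed to "content", the key would change
        // and the old expression would silently return null.
        globalMap.remove("row1.line");
        globalMap.put("row1.content", "<Document>...</Document>");
        System.out.println(currentLine(globalMap));
    }
}
```

The last lookup finds nothing under "row1.line", which is the kind of mismatch that typically leads to empty output files or a NullPointerException further down the job.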

xml

talend