Talend: newline character in the middle of a CSV column

I am fetching data with the tSoap component, which returns the result as XML containing comma-separated values: columns are separated by commas and rows by '\n'.

After that I use the tExtractXMLField component to extract the data from the response.

But the data contains '\n' inside the quoted strings, and each of these is treated as the start of a new row. I tried using the tReplace component with a regex to remove the '\n' within the quotes, but the data is too large and this results in a StackOverflowError.

I also tried the tNormalize component with the CSV option to separate the rows, but the problem still persists.

Can you please help me with this? Thanks in advance.

The response I get from the SOAP request is:

  <env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
<env:Header/>
<env:Body>
<ns2:getReportResultCsvResponse xmlns:ns2="http://service.admin.ws.five9.com/">
<return>TIMESTAMP,CALL ID,NOTES
"Mon, 17 Apr 2017 10:05:38",4223519,
"Mon, 17 Apr 2017 10:05:40",4223520,
"Mon, 17 Apr 2017 10:05:41",4223521,"Alexandria..
Monday -- 55 partial
Bal -- 224 May 1
Visa"
"Mon, 17 Apr 2017 10:05:42",4223522,
"Mon, 17 Apr 2017 10:05:43",4223523,
"Mon, 17 Apr 2017 10:11:04",4223524,
"Mon, 17 Apr 2017 10:05:43",4223524,
"Mon, 17 Apr 2017 10:05:45",4223525,</return>
</ns2:getReportResultCsvResponse>
</env:Body>
</env:Envelope>

As we can see, the "notes" column contains data with '\n' between the quotes, and this causes problems when extracting the data. Can you please tell me how I can resolve this issue?

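In fact, your file is a CSV file embedded in an XML file.
Because the "notes" field is enclosed in double quotes, the solution is to convert the file to plain CSV; then, with the appropriate "CSV options" setting, the problem with "\n" disappears automatically.

Here is how the job works: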

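tFileInputFullRow reads the input file as it is, putting each line into a single field named "line" by default. Just set Header to 4 and Footer to 3 to skip most of the XML part (assuming the file structure is always the same).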
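To make the effect of Header = 4 and Footer = 3 concrete, here is a minimal sketch in plain Java (not Talend-generated code). The class and method names are hypothetical and chosen only for illustration; the input lines are shortened from the SOAP response in the question.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: mimics skipping a fixed number of header and footer lines.
class XmlEnvelopeTrimmer {

    // Returns the lines that remain after dropping `header` leading lines
    // and `footer` trailing lines. Returns an empty list if nothing is left.
    static List<String> trim(List<String> lines, int header, int footer) {
        int from = Math.min(header, lines.size());
        int to = Math.max(from, lines.size() - footer);
        return lines.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> response = Arrays.asList(
            "<env:Envelope ...>",                    // header line 1
            "<env:Header/>",                         // header line 2
            "<env:Body>",                            // header line 3
            "<ns2:getReportResultCsvResponse ...>",  // header line 4
            "<return>TIMESTAMP,CALL ID,NOTES",
            "\"Mon, 17 Apr 2017 10:05:45\",4223525,</return>",
            "</ns2:getReportResultCsvResponse>",     // footer line 1
            "</env:Body>",                           // footer line 2
            "</env:Envelope>");                      // footer line 3

        // Only the two CSV lines remain; they still carry the <return> tags.
        for (String line : trim(response, 4, 3)) {
            System.out.println(line);
        }
    }
}
```

Note that the opening and closing "return" tags survive this step, because they share a line with CSV data.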

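Pass the result to a tMap, just to remove the remaining XML "return" tags that were not removed by the previous step (because they are not on a line of their own).
Here is the tMap with the replaceAll used to remove this tag: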
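The exact expression is not reproduced in the text, so the following is only a sketch of what it could look like. Assuming the incoming flow is named row1 (an assumption; use your own row name), the tMap output expression might be row1.line.replaceAll("</?return>", ""). The same call in standalone Java, with a hypothetical wrapper class:

```java
// Hypothetical wrapper around the String.replaceAll call used in the tMap expression.
class ReturnTagStripper {

    // Removes any opening <return> or closing </return> tag from a line.
    // The regex "</?return>" matches "<return>" and "</return>".
    static String stripReturnTag(String line) {
        return line.replaceAll("</?return>", "");
    }

    public static void main(String[] args) {
        // First and last CSV lines from the question, still carrying the tags.
        System.out.println(stripReturnTag("<return>TIMESTAMP,CALL ID,NOTES"));
        System.out.println(stripReturnTag("\"Mon, 17 Apr 2017 10:05:45\",4223525,</return>"));
    }
}
```

Because the pattern only targets the literal tag, lines without it pass through unchanged.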

After the tMap, use a tFileOutputDelimited to write the flow to a plain CSV file. Leave all options at their suggested default values.

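Now start a second subjob with a tFileInputDelimited to read the CSV file. Define the schema with 3 columns: "Timestamp", "CallId" and "Notes". Set the field separator to "," and, for the magic, tick "CSV options". Nothing else is needed.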
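To illustrate why quote-aware CSV parsing makes the "\n" problem go away, here is a minimal sketch of such a parser in plain Java. It is not Talend's implementation; the class name is hypothetical, and it assumes "," as the separator, double quotes as the text enclosure, and a doubled quote as the escape for a quote inside a field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal CSV parser: a newline ends a record only outside quotes.
class QuotedCsvParser {

    static List<List<String>> parse(String text) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        field.append('"');   // doubled quote: literal quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    field.append(c);         // commas and newlines are kept as data
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                record.add(field.toString());
                field.setLength(0);
            } else if (c == '\n') {
                record.add(field.toString());
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
            } else if (c != '\r') {
                field.append(c);
            }
        }
        // Flush the last record when the text does not end with a newline.
        if (field.length() > 0 || !record.isEmpty()) {
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "TIMESTAMP,CALL ID,NOTES\n"
            + "\"Mon, 17 Apr 2017 10:05:41\",4223521,\"Alexandria..\n"
            + "Monday -- 55 partial\nBal -- 224 May 1\nVisa\"\n"
            + "\"Mon, 17 Apr 2017 10:05:42\",4223522,";
        for (List<String> r : parse(csv)) {
            System.out.println(r.size() + " fields, notes = [" + r.get(2) + "]");
        }
    }
}
```

With this logic the multi-line note stays in a single "Notes" field of a single record, which is the behaviour the "CSV options" checkbox is relied on for here.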

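To show only the record with "\n" in the "notes" field, I set Header to 3 and Limit to 1 (which is why there is only 1 row after the tFileInputDelimited).
Here is the result:

As you can see, the "notes" field is spread over 4 lines as expected, because of the "\n" characters.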

Regards.