Talend S3 CSV to Redshift Handling Missing Data

Question

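I have a CSV flat file loaded in S3 that occasionally contains an empty value in one of its comma-separated columns, e.g. "ColumnValue1,,ColumnValue3,...etc". Note the ",," marking the missing value in the CSV. Below is a very basic move from S3 to Redshift, set up in Talend using the tDBBulkExec component: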

The columns are mapped as follows, and the job is run:

It throws an error on the missing values in the source file:

Exception in component tDBBulkExec_1 (tncretail_opportunity)
java.sql.SQLException: [Amazon](500310) Invalid operation: syntax error at or near "," 
Position: 100;
    at com.amazon.redshift.client.messages.inbound.ErrorResponse.toErrorException(Unknown Source)
    at com.amazon.redshift.client.PGMessagingContext.handleErrorResponse(Unknown Source)
    at com.amazon.redshift.client.PGMessagingContext.handleMessage(Unknown Source)
    at com.amazon.jdbc.communications.InboundMessagesPipeline.getNextMessageOfClass(Unknown Source)
    at com.amazon.redshift.client.PGMessagingContext.doMoveToNextClass(Unknown Source)
    at com.amazon.redshift.client.PGMessagingContext.getBindComplete(Unknown Source)
    at com.amazon.redshift.client.PGClient.handleErrorsScenario1(Unknown Source)
    at com.amazon.redshift.client.PGClient.handleErrors(Unknown Source)
    at com.amazon.redshift.client.PGClient.directExecuteExtraMetadata(Unknown Source)
    at com.amazon.redshift.dataengine.PGQueryExecutor.execute(Unknown Source)
    at com.amazon.jdbc.common.SStatement.executeNoParams(Unknown Source)
    at com.amazon.jdbc.common.SStatement.execute(Unknown Source)
Caused by: com.amazon.support.exceptions.ErrorException: [Amazon](500310) Invalid operation: syntax error at or near "," 
Position: 100;

How can I modify this so that it works?

Answer 1

I have been using Talend for a while, and there are some key differences between what you are doing and the patterns I use and recommend.

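Your problem is most likely that you have not specified the database column names. These need to be specified and unique. (The database uses them to relate back to the schema.)
Unless you have a specific reason to use the tDB components, don't use them. Use the prebuilt, database-specific connectors instead, i.e. tRedshiftBulkExec.
As far as I know, Redshift bulk files are delimited CSV but do not necessarily conform to an exact file specification, so your use of this component may be wrong.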

To resolve the problem, I would suggest one of the following:

Pull the CSV from S3, convert it to a bulk file, and then upload it to Redshift (you can use tS3List --> tS3Get --'OnComponentOk'--> tFileInputDelimited --> tRedshiftOutputBulkExec)
Or you can use tRedshiftRow to issue a COPY command: https://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html
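As a rough illustration of the COPY route, the sketch below builds a statement that names every target column explicitly and asks Redshift to load empty fields as NULL. It is written in Python only to keep the example self-contained; in a Talend job the resulting SQL would go into the query of a tRedshiftRow component. The function name, column names, bucket path, and IAM role are hypothetical placeholders. The COPY options used (FORMAT AS CSV, IGNOREHEADER, EMPTYASNULL, BLANKSASNULL) are described in the COPY documentation linked above, and whether each one is needed depends on the table's column types.

```python
def build_copy_statement(table, columns, s3_uri, iam_role, header_rows=1):
    """Build a Redshift COPY statement for a CSV file with possibly empty fields.

    All arguments are supplied by the caller; nothing here is looked up
    from a real database or bucket.
    """
    # Column names must be present and unique, otherwise the generated
    # column list is malformed (for example "col_a, , col_c").
    cleaned = [name.strip() for name in columns]
    if any(not name for name in cleaned):
        raise ValueError("every target column needs a name")
    if len(set(cleaned)) != len(cleaned):
        raise ValueError("target column names must be unique")

    column_list = ", ".join(cleaned)
    return (
        f"COPY {table} ({column_list})\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "FORMAT AS CSV\n"
        f"IGNOREHEADER {header_rows}\n"
        "EMPTYASNULL\n"   # load empty CHAR/VARCHAR fields as NULL
        "BLANKSASNULL;"   # load whitespace-only fields as NULL
    )


# Hypothetical values, for illustration only.
statement = build_copy_statement(
    table="tncretail_opportunity",
    columns=["col_a", "col_b", "col_c"],
    s3_uri="s3://example-bucket/path/opportunity.csv",
    iam_role="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(statement)
```

The validation step reflects the first point of this answer: a blank or duplicated column name is rejected before any SQL is sent, rather than surfacing later as a syntax error from the database.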

I would also double-check that the delimited file isn't corrupt, and consider using text enclosures (your text may contain a comma somewhere).
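One way to run that check outside Talend is to count the fields in each record and flag any record that disagrees with the header. The sketch below uses Python's standard csv module and assumes a comma delimiter with double quotes as the text enclosure; the function name and the sample data are made up for illustration.

```python
import csv
import io


def find_irregular_rows(csv_text, delimiter=",", quotechar='"'):
    """Return (record_number, field_count) for records whose field count
    differs from the first record (assumed to be the header)."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter, quotechar=quotechar)
    rows = list(reader)
    if not rows:
        return []
    expected = len(rows[0])
    # Record numbers start at 1 and count parsed records, not physical lines.
    return [
        (number, len(row))
        for number, row in enumerate(rows, start=1)
        if len(row) != expected
    ]


sample = (
    "a,b,c\n"
    "1,,3\n"          # empty middle field: still three fields
    "4,x,y,6\n"       # unenclosed comma in the text: four fields
    '"5,5",,7\n'      # enclosed comma: three fields
)
print(find_irregular_rows(sample))  # [(3, 4)]
```

An empty value such as ",," does not change the field count, so it is not reported. Only the record with a stray, unenclosed comma is flagged, which is the case text enclosures are meant to prevent.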

syntax-error

amazon-web-services

talend