BigQuery Execute fails with no meaningful error on Cloud Data Fusion

I'm trying to use the BigQuery Execute action in Google's Cloud Data Fusion. The plugin validates fine and the SQL checks out, but every time I execute it I get this meaningless error:

02/11/2022 12:51:25 ERROR Pipeline 'test-bq-execute' failed.
02/11/2022 12:51:25 ERROR Workflow service 'workflow.default.test-bq-execute.DataPipelineWorkflow.<guid>' failed.
02/11/2022 12:51:25 ERROR Program DataPipelineWorkflow execution failed.

I can't see anything that would help me debug this. Any ideas? The SQL in question is a simple DELETE FROM dataset.table WHERE ds = CURRENT_DATE()

Here is the pipeline:

{
    "name": "test-bq-execute",
    "description": "Data Pipeline Application",
    "artifact": {
        "name": "cdap-data-pipeline",
        "version": "6.5.1",
        "scope": "SYSTEM"
    },
    "config": {
        "resources": {
            "memoryMB": 2048,
            "virtualCores": 1
        },
        "driverResources": {
            "memoryMB": 2048,
            "virtualCores": 1
        },
        "connections": [],
        "comments": [],
        "postActions": [],
        "properties": {},
        "processTimingEnabled": true,
        "stageLoggingEnabled": false,
        "stages": [
            {
                "name": "BigQuery Execute",
                "plugin": {
                    "name": "BigQueryExecute",
                    "type": "action",
                    "label": "BigQuery Execute",
                    "artifact": {
                        "name": "google-cloud",
                        "version": "0.18.1",
                        "scope": "SYSTEM"
                    },
                    "properties": {
                        "project": "auto-detect",
                        "sql": "DELETE FROM GCPQuickStart.account WHERE ds = CURRENT_DATE()",
                        "dialect": "standard",
                        "mode": "batch",
                        "dataset": "GCPQuickStart",
                        "table": "account",
                        "useCache": "false",
                        "location": "US",
                        "rowAsArguments": "false",
                        "serviceAccountType": "filePath",
                        "serviceFilePath": "auto-detect"
                    }
                },
                "outputSchema": [
                    {
                        "name": "etlSchemaBody",
                        "schema": ""
                    }
                ],
                "id": "BigQuery-Execute",
                "type": "action",
                "label": "BigQuery Execute",
                "icon": "fa-plug"
            }
        ],
        "schedule": "0 1 */1 * *",
        "engine": "spark",
        "numOfRecordsPreview": 100,
        "maxConcurrentRuns": 1
    }
}

I was able to capture the error using Cloud Logging. To enable Cloud Logging in Cloud Data Fusion, you can use the GCP documentation, and follow these steps to view the logs that Data Fusion sends to Cloud Logging. Replicating your scenario, this is the error I found:

    {
      "logMessage": "Program DataPipelineWorkflow execution failed.\njava.util.concurrent.ExecutionException: com.google.cloud.bigquery.BigQueryException: Cannot set destination table in jobs with DML statements\n    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)\n    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)\n    at io.cdap.cdap.internal.app.runtime.distributed.AbstractProgramTwillRunnable.run(AbstractProgramTwillRunnable.java:274)\n    at org.apache.twill.interna..."
    }

What we did to resolve the error Cannot set destination table in jobs with DML statements was to leave the Dataset Name and Table Name fields empty in the plugin properties, since a DML statement must not specify a destination table.
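Concretely, that means dropping the "dataset" and "table" keys from the BigQueryExecute plugin's properties block in the pipeline JSON above (a sketch showing only the properties block; everything else stays the same):

                    "properties": {
                        "project": "auto-detect",
                        "sql": "DELETE FROM GCPQuickStart.account WHERE ds = CURRENT_DATE()",
                        "dialect": "standard",
                        "mode": "batch",
                        "useCache": "false",
                        "location": "US",
                        "rowAsArguments": "false",
                        "serviceAccountType": "filePath",
                        "serviceFilePath": "auto-detect"
                    }

With no destination table set, BigQuery accepts the DELETE job and the pipeline runs to completion.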

Output: