BigQuery Execute fails with no meaningful error on Cloud Data Fusion
I am trying to use the BigQuery Execute function in Cloud Data Fusion (Google). The component validates fine and the SQL checks out, but every time I run it I get this meaningless error:
02/11/2022 12:51:25 ERROR Pipeline 'test-bq-execute' failed.
02/11/2022 12:51:25 ERROR Workflow service 'workflow.default.test-bq-execute.DataPipelineWorkflow.<guid>' failed.
02/11/2022 12:51:25 ERROR Program DataPipelineWorkflow execution failed.
I can't see anything that would help me debug this. Any ideas? The SQL in question is a simple DELETE from dataset.table WHERE ds = CURRENT_DATE()
Here is the pipeline:
{
"name": "test-bq-execute",
"description": "Data Pipeline Application",
"artifact": {
"name": "cdap-data-pipeline",
"version": "6.5.1",
"scope": "SYSTEM"
},
"config": {
"resources": {
"memoryMB": 2048,
"virtualCores": 1
},
"driverResources": {
"memoryMB": 2048,
"virtualCores": 1
},
"connections": [],
"comments": [],
"postActions": [],
"properties": {},
"processTimingEnabled": true,
"stageLoggingEnabled": false,
"stages": [
{
"name": "BigQuery Execute",
"plugin": {
"name": "BigQueryExecute",
"type": "action",
"label": "BigQuery Execute",
"artifact": {
"name": "google-cloud",
"version": "0.18.1",
"scope": "SYSTEM"
},
"properties": {
"project": "auto-detect",
"sql": "DELETE FROM GCPQuickStart.account WHERE ds = CURRENT_DATE()",
"dialect": "standard",
"mode": "batch",
"dataset": "GCPQuickStart",
"table": "account",
"useCache": "false",
"location": "US",
"rowAsArguments": "false",
"serviceAccountType": "filePath",
"serviceFilePath": "auto-detect"
}
},
"outputSchema": [
{
"name": "etlSchemaBody",
"schema": ""
}
],
"id": "BigQuery-Execute",
"type": "action",
"label": "BigQuery Execute",
"icon": "fa-plug"
}
],
"schedule": "0 1 */1 * *",
"engine": "spark",
"numOfRecordsPreview": 100,
"maxConcurrentRuns": 1
}
}
I was able to capture the error using Cloud Logging. To enable Cloud Logging in Cloud Data Fusion, you can use the GCP documentation and follow its steps to view the Data Fusion logs in Cloud Logging. Reproducing your scenario, this is the error I found:
"logMessage": "Program DataPipelineWorkflow execution failed.\njava.util.concurrent.ExecutionException: com.google.cloud.bigquery.BigQueryException: Cannot set destination table in jobs with DML statements\n at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)\n at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)\n at io.cdap.cdap.internal.app.runtime.distributed.AbstractProgramTwillRunnable.run(AbstractProgramTwillRunnable.java:274)\n at org.apache.twill.interna..."
}
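To resolve this error (Cannot set destination table in jobs with DML statements), we left Dataset Name and Table Name empty in the pipeline properties, because a DML statement such as DELETE does not need a destination table.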
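For an exported pipeline JSON like the one in the question, the same fix can be sketched as a small Python script. This is only an illustration: `clear_destination_for_dml` is a hypothetical helper, not part of CDAP or the google-cloud plugin, and it assumes that omitting the `dataset` and `table` keys from the stage properties is equivalent to leaving Dataset Name and Table Name empty in the UI.

```python
import copy
import re

# DML verbs: BigQuery rejects a destination table for jobs that run these.
_DML = re.compile(r"^\s*(DELETE|UPDATE|INSERT|MERGE)\b", re.IGNORECASE)


def clear_destination_for_dml(pipeline: dict) -> dict:
    """Return a copy of an exported pipeline in which BigQueryExecute stages
    running a DML statement no longer set a destination dataset/table.

    Hypothetical helper for illustration only.
    """
    fixed = copy.deepcopy(pipeline)  # leave the caller's dict untouched
    for stage in fixed.get("config", {}).get("stages", []):
        plugin = stage.get("plugin", {})
        if plugin.get("name") != "BigQueryExecute":
            continue
        props = plugin.get("properties", {})
        if _DML.match(props.get("sql", "")):
            # Assumption: a missing key behaves like an empty field in the UI.
            props.pop("dataset", None)
            props.pop("table", None)
    return fixed


# Trimmed-down version of the pipeline from the question.
example = {
    "name": "test-bq-execute",
    "config": {
        "stages": [
            {
                "name": "BigQuery Execute",
                "plugin": {
                    "name": "BigQueryExecute",
                    "type": "action",
                    "properties": {
                        "sql": "DELETE FROM GCPQuickStart.account WHERE ds = CURRENT_DATE()",
                        "dataset": "GCPQuickStart",
                        "table": "account",
                        "location": "US",
                    },
                },
            }
        ]
    },
}

fixed = clear_destination_for_dml(example)
fixed_props = fixed["config"]["stages"][0]["plugin"]["properties"]
print(sorted(fixed_props))
```

The SQL itself is unchanged; only the destination fields are dropped, so the DELETE still targets `GCPQuickStart.account` through the statement text.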