来自 Apache Beam 的 BigQuery 授权视图
BigQuery Authorized Views from Apache Beam
我正在尝试使用 Apache Beam 在 BigQuery 中查询视图。
视图可以访问它引用的所有数据集。 Dataflow/GCE 服务帐户可以访问视图,但 不能访问其基础数据集(这应该不是问题)。
当我尝试 运行 查询授权视图的作业时,出现如下错误:
java.lang.RuntimeException: java.io.IOException: Unable to get table: test_13249, aborting after 9 retries.
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.executeWithRetries(BigQueryServicesImpl.java:1004)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.getTable(BigQueryServicesImpl.java:491)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.getTable(BigQueryServicesImpl.java:477)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.getTable(BigQueryServicesImpl.java:471)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryQueryHelper.executeQuery(BigQueryQueryHelper.java:109)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySourceDef.getTableReference(BigQueryQuerySourceDef.java:113)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySource.getTableToExtract(BigQueryQuerySource.java:65)
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.extractFiles(BigQuerySourceBase.java:110)
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.split(BigQuerySourceBase.java:148)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.splitAndValidate(WorkerCustomSources.java:290)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplitTyped(WorkerCustomSources.java:212)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplitWithApiLimit(WorkerCustomSources.java:196)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplit(WorkerCustomSources.java:175)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSourceOperationExecutor.execute(WorkerCustomSourceOperationExecutor.java:78)
at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:417)
at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:386)
at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:311)
at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
{
"code" : 403,
"errors" : [ {
"domain" : "global",
"message" : "Access Denied: Table my-gcp-project:bigquery_dataset_any.test_13249: User does not have bigquery.tables.get permission for table my-gcp-project:bigquery_dataset_any.test_13249.",
"reason" : "accessDenied"
} ],
"message" : "Access Denied: Table my-gcp-project:bigquery_dataset_any.test_13249: User does not have bigquery.tables.get permission for table my-gcp-project:bigquery_dataset_any.test_13249.",
"status" : "PERMISSION_DENIED"
}
Beam 在处理 BigQuery 时试图变得聪明。发生此错误是因为 Beam 检查查询引用的表,而授权视图并不总是可能发生这种情况。
要解决此问题,您可以使用 BigQueryIO.readTableRows()
或 BigQueryIO.read(SerializableFunction)
中的方法 withQueryLocation
。这将允许 Beam 使用提供的查询位置,而不是推断位置。
因此:
BigQueryIO.readTableRows()
.fromQuery("SELECT * FROM my_authorized_view")
.withQueryLocation("US") // Whatever location is convenient for you
...
这应该可以解决您的问题。
我正在尝试使用 Apache Beam 在 BigQuery 中查询视图。
视图可以访问它引用的所有数据集。 Dataflow/GCE 服务帐户可以访问视图,但 不能访问其基础数据集(这应该不是问题)。
当我尝试 运行 查询授权视图的作业时,出现如下错误:
java.lang.RuntimeException: java.io.IOException: Unable to get table: test_13249, aborting after 9 retries.
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.executeWithRetries(BigQueryServicesImpl.java:1004)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.getTable(BigQueryServicesImpl.java:491)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.getTable(BigQueryServicesImpl.java:477)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl.getTable(BigQueryServicesImpl.java:471)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryQueryHelper.executeQuery(BigQueryQueryHelper.java:109)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySourceDef.getTableReference(BigQueryQuerySourceDef.java:113)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryQuerySource.getTableToExtract(BigQueryQuerySource.java:65)
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.extractFiles(BigQuerySourceBase.java:110)
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.split(BigQuerySourceBase.java:148)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.splitAndValidate(WorkerCustomSources.java:290)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplitTyped(WorkerCustomSources.java:212)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplitWithApiLimit(WorkerCustomSources.java:196)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSources.performSplit(WorkerCustomSources.java:175)
at org.apache.beam.runners.dataflow.worker.WorkerCustomSourceOperationExecutor.execute(WorkerCustomSourceOperationExecutor.java:78)
at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:417)
at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:386)
at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:311)
at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
{
"code" : 403,
"errors" : [ {
"domain" : "global",
"message" : "Access Denied: Table my-gcp-project:bigquery_dataset_any.test_13249: User does not have bigquery.tables.get permission for table my-gcp-project:bigquery_dataset_any.test_13249.",
"reason" : "accessDenied"
} ],
"message" : "Access Denied: Table my-gcp-project:bigquery_dataset_any.test_13249: User does not have bigquery.tables.get permission for table my-gcp-project:bigquery_dataset_any.test_13249.",
"status" : "PERMISSION_DENIED"
}
Beam 在处理 BigQuery 时试图变得聪明。发生此错误是因为 Beam 检查查询引用的表,而授权视图并不总是可能发生这种情况。
要解决此问题,您可以使用 BigQueryIO.readTableRows()
或 BigQueryIO.read(SerializableFunction)
中的方法 withQueryLocation
。这将允许 Beam 使用提供的查询位置,而不是推断位置。
因此:
BigQueryIO.readTableRows()
.fromQuery("SELECT * FROM my_authorized_view")
.withQueryLocation("US") // Whatever location is convenient for you
...
这应该可以解决您的问题。