数据流:无法在不同位置读取和写入:来源:asia-south1,目的地:us-central1”
Dataflow : Cannot read and write in different locations: source: asia-south1, destination: us-central1"
我在 运行 数据流时收到以下错误。我的数据源在 GCP BQ(asia-south1) 中,目标是 PostgreSQL DB(AWS -> Mumbai Region)。
java.io.IOException: Extract job beam_job_0c64359f7e274ff1ba4072732d7d9653_firstcrybqpgnageshpinjarkar07200750105c51e26c-extract failed, status: {
"errorResult" : {
"message" : "Cannot read and write in different locations: source: asia-south1, destination: us-central1",
"reason" : "invalid"
},
"errors" : [ {
"message" : "Cannot read and write in different locations: source: asia-south1, destination: us-central1",
"reason" : "invalid"
} ],
"state" : "DONE"
}.
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.executeExtract(BigQuerySourceBase.java:185)
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.extractFiles(BigQuerySourceBase.java:121)
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.split(BigQuerySourceBase.java:139)
at com.google.cloud.dataflow.worker.WorkerCustomSources.splitAndValidate(WorkerCustomSources.java:275)
at com.google.cloud.dataflow.worker.WorkerCustomSources.performSplitTyped(WorkerCustomSources.java:197)
at com.google.cloud.dataflow.worker.WorkerCustomSources.performSplitWithApiLimit(WorkerCustomSources.java:181)
at com.google.cloud.dataflow.worker.WorkerCustomSources.performSplit(WorkerCustomSources.java:160)
at com.google.cloud.dataflow.worker.WorkerCustomSourceOperationExecutor.execute(WorkerCustomSourceOperationExecutor.java:77)
at com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:391)
at com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:360)
at com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:288)
at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134)
at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114)
at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
我的代码如下:
p
.apply(BigQueryIO.read().from("datalake:Yearly2020.Sales"))
.apply(JdbcIO.<TableRow>write()
.withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create("org.postgresql.Driver", "jdbc:postgresql://xx.xx.xx.xx:1111/dbname")
.withUsername("username")
.withPassword("password"))
.withStatement("INSERT INTO Table VALUES(ProductRevenue)")
.withPreparedStatementSetter(new BQPGStatementSetter()));
p.run().waitUntilFinish();
我是运行管道如下:
gcloud beta dataflow jobs run sales_data \
--gcs-location gs://datalake-templates/Template \
--region=asia-east1 \
--network=datalake-vpc \
--subnetwork=regions/asia-east1/subnetworks/asia-east1 \
当 Bigquery 是源时,它运行加载作业,在 gcs 存储桶中暂存数据。数据在 temp_location
中暂存,如果未指定 temp_location
则它使用在 staging_location
.
中指定的区域
在数据流作业中,您可以使用在 asia-south
中创建的存储桶指定 temp_location
,因为那是您的 Bigquery 数据集所在的位置。
此外,如果您正在使用网络和子网,也建议关闭 public ip,以便通过 VPN 完成连接。
我在 运行 数据流时收到以下错误。我的数据源在 GCP BQ(asia-south1) 中,目标是 PostgreSQL DB(AWS -> Mumbai Region)。
java.io.IOException: Extract job beam_job_0c64359f7e274ff1ba4072732d7d9653_firstcrybqpgnageshpinjarkar07200750105c51e26c-extract failed, status: {
"errorResult" : {
"message" : "Cannot read and write in different locations: source: asia-south1, destination: us-central1",
"reason" : "invalid"
},
"errors" : [ {
"message" : "Cannot read and write in different locations: source: asia-south1, destination: us-central1",
"reason" : "invalid"
} ],
"state" : "DONE"
}.
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.executeExtract(BigQuerySourceBase.java:185)
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.extractFiles(BigQuerySourceBase.java:121)
at org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.split(BigQuerySourceBase.java:139)
at com.google.cloud.dataflow.worker.WorkerCustomSources.splitAndValidate(WorkerCustomSources.java:275)
at com.google.cloud.dataflow.worker.WorkerCustomSources.performSplitTyped(WorkerCustomSources.java:197)
at com.google.cloud.dataflow.worker.WorkerCustomSources.performSplitWithApiLimit(WorkerCustomSources.java:181)
at com.google.cloud.dataflow.worker.WorkerCustomSources.performSplit(WorkerCustomSources.java:160)
at com.google.cloud.dataflow.worker.WorkerCustomSourceOperationExecutor.execute(WorkerCustomSourceOperationExecutor.java:77)
at com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:391)
at com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:360)
at com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:288)
at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134)
at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114)
at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
我的代码如下:
p
.apply(BigQueryIO.read().from("datalake:Yearly2020.Sales"))
.apply(JdbcIO.<TableRow>write()
.withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create("org.postgresql.Driver", "jdbc:postgresql://xx.xx.xx.xx:1111/dbname")
.withUsername("username")
.withPassword("password"))
.withStatement("INSERT INTO Table VALUES(ProductRevenue)")
.withPreparedStatementSetter(new BQPGStatementSetter()));
p.run().waitUntilFinish();
我是运行管道如下:
gcloud beta dataflow jobs run sales_data \
--gcs-location gs://datalake-templates/Template \
--region=asia-east1 \
--network=datalake-vpc \
--subnetwork=regions/asia-east1/subnetworks/asia-east1 \
当 Bigquery 是源时,它运行加载作业,在 gcs 存储桶中暂存数据。数据在 temp_location
中暂存,如果未指定 temp_location
则它使用在 staging_location
.
在数据流作业中,您可以使用在 asia-south
中创建的存储桶指定 temp_location
,因为那是您的 Bigquery 数据集所在的位置。
此外,如果您正在使用网络和子网,也建议关闭 public ip,以便通过 VPN 完成连接。