Unauthorized error setting batch-bigtable as data host from Spark streaming

I followed the example here for writing from Spark Streaming to Cloud Bigtable: https://github.com/GoogleCloudPlatform/cloud-bigtable-examples/tree/master/scala/spark-streaming

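In my case, I consume from Kafka, apply some transformations, and then need to write the results to my Bigtable instance. Initially, using all the dependency versions from that example, I got UNAUTHENTICATED errors after timeouts whenever I tried to access anything in Bigtable after connecting: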

Refreshing the OAuth token Retrying failed call. Failure #1, got: Status{code=UNAUTHENTICATED, description=Unexpected failure get auth token,
at java.util.concurrent.FutureTask.get(FutureTask.java:205) 
at com.google.bigtable.repackaged.com.google.cloud.bigtable.grpc.io.RefreshingOAuth2CredentialsInterceptor.getHeader(RefreshingOAuth2CredentialsInterceptor.java:290)

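I then bumped the bigtable-hbase-1.x-hadoop dependency to a newer version, such as 1.9.0. That got me past authentication for the table admin work, but I then hit another UNAUTHENTICATED error when actually making the saveAsNewAPIHadoopDataset() call: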

Retrying failed call. Failure #1, got: Status{code=UNAUTHENTICATED, description=Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication credential. 
See https://developers.google.com/identity/sign-in/web/devconsole-project., cause=null} on channel 34. 
Trailers: Metadata(www-authenticate=Bearer realm="https://accounts.google.com/",bigtable-channel-id=34)

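I found that removing conf.set(BigtableOptionsFactory.BIGTABLE_HOST_KEY, BigtableOptions.BIGTABLE_BATCH_DATA_HOST_DEFAULT) from the setBatchConfigOptions() method lets the call authenticate against the default host. It then processes several Kafka messages, but eventually stops, hangs, and finally throws a No route to host error: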

2019-07-25 17:29:12 INFO JobScheduler:54 - Added jobs for time 1564093750000 ms
2019-07-25 17:29:21 INFO JobScheduler:54 - Added jobs for time 1564093760000 ms 
2019-07-25 17:29:31 INFO JobScheduler:54 - Added jobs for time 1564093770000 ms 
2019-07-25 17:29:36 WARN OperationAccountant:116 - No operations completed within the last 30 seconds. There are still 1 operations in progress. 
2019-07-25 17:29:36 WARN OperationAccountant:116 - No operations completed within the last 30 seconds. There are still 1 operations in progress. 
2019-07-25 17:29:36 WARN OperationAccountant:116 - No operations completed within the last 30 seconds. There are still 1 operations in progress. 
2019-07-25 17:29:36 WARN OperationAccountant:116 - No operations completed within the last 30 seconds. There are still 1 operations in progress. 
2019-07-25 17:29:36 WARN OperationAccountant:116 - No operations completed within the last 30 seconds. There are still 1 operations in progress. 
2019-07-25 17:29:36 WARN OperationAccountant:116 - No operations completed within the last 30 seconds. There are still 1 operations in progress. 
2019-07-25 17:29:36 WARN OperationAccountant:116 - No operations completed within the last 30 seconds. There are still 1 operations in progress. 
2019-07-25 17:29:36 WARN OperationAccountant:116 - No operations completed within the last 30 seconds. There are still 1 operations in progress. 
2019-07-25 17:29:38 WARN AbstractRetryingOperation:130 - Retrying failed call. 
Failure #1, got: Status{code=UNAVAILABLE, description=io exception, cause=com.google.bigtable.repackaged.io.grpc.netty.shaded.io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: No route to host: batch-bigtable.googleapis.com/2607:f8b0:400f:801:0:0:0:200a:443

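I assume this is a dependency version issue, since that example is fairly old, but I can't find any newer examples of writing to Bigtable from Spark Streaming. I also haven't found anything that works with bigtable-hbase-2.x-hadoop.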


Current POM:


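The authentication problem in batch mode was a known issue in the Bigtable API. They recently released 1.12.0, which addresses it. The NoRouteToHostException was isolated to running locally and turned out to be a corporate firewall issue, which went away after setting -Dhttps.proxyHost and -Dhttps.proxyPort.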
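
The two proxy flags are ordinary JVM system properties, so it is possible to check that they are being picked up before rerunning the whole streaming job. The following is a minimal Java sketch, not part of the original fix: the proxy host proxy.example.com, the port 3128, and the class name ProxyCheck are all hypothetical placeholders, and it assumes the client consults the JVM's default ProxySelector when opening its HTTPS connection.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

class ProxyCheck {
    /** Returns the proxies the JVM's default selector would use for the given URI. */
    static List<Proxy> proxiesFor(String uri) {
        return ProxySelector.getDefault().select(URI.create(uri));
    }

    public static void main(String[] args) {
        // Hypothetical values; equivalent to passing
        // -Dhttps.proxyHost=proxy.example.com -Dhttps.proxyPort=3128 on the command line.
        System.setProperty("https.proxyHost", "proxy.example.com");
        System.setProperty("https.proxyPort", "3128");

        // Ask the JVM which proxy it would pick for the batch endpoint from the log above.
        for (Proxy p : proxiesFor("https://batch-bigtable.googleapis.com:443")) {
            if (p.type() == Proxy.Type.HTTP) {
                InetSocketAddress addr = (InetSocketAddress) p.address();
                System.out.println("HTTP proxy: " + addr.getHostString() + ":" + addr.getPort());
            } else {
                System.out.println("No HTTP proxy selected: " + p.type());
            }
        }
    }
}
```

When the job is launched through spark-submit, the same properties would normally be passed as -D flags to the driver and executor JVMs (for example via --driver-java-options and spark.executor.extraJavaOptions) rather than set in code.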
