Connect Hadoop cluster to multiple Google Cloud Storage buckets in multiple Google Projects

Is it possible to connect my Hadoop cluster to multiple Google Cloud projects at once?

I can easily use any Google Storage bucket within a single Google project via the Google Cloud Storage connector, as described in the thread Migrating 50TB data from local Hadoop cluster to Google Cloud Storage. But I cannot find any documentation or examples of how to connect to two or more Google Cloud projects from a single map-reduce job. Do you have any suggestion/trick?
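For context, here is a minimal sketch of the single-project case that already works for me, assuming the GCS connector is installed and registered for the gs:// scheme as in the linked thread. The bucket name and job name are hypothetical placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SingleProjectJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "gcs-single-project");
        job.setJarByClass(SingleProjectJob.class);
        job.setMapperClass(Mapper.class);  // identity mapper, just for the sketch
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        // Both buckets live in the same Google project; gs:// paths work
        // directly once the connector handles the gs scheme.
        FileInputFormat.addInputPath(job, new Path("gs://bucket-in-my-project/input/"));
        FileOutputFormat.setOutputPath(job, new Path("gs://bucket-in-my-project/output/"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```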

Thank you very much.

Indeed, it's possible to connect your cluster to buckets from multiple different projects at once. Ultimately, if you're using the instructions for a service-account keyfile, the GCS requests are performed on behalf of that service account, which can be treated more or less like any other user. You can either add the service account email your-service-account-email@developer.gserviceaccount.com to all the different cloud projects owning the buckets you want to process, using the permissions section of cloud.google.com/console and simply adding that email address like any other member, or you can set GCS-level access to add that service account like any other user.
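A minimal sketch of what this looks like from the cluster side, assuming a service-account keyfile is in use as described above. Once the service account's email has been added as a member of each project (or granted bucket-level access), buckets owned by different projects are addressed identically, so a single job can read one and write the other. The email, keyfile path, and bucket names below are hypothetical placeholders; the fs.gs.* properties are the GCS connector's service-account settings:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CrossProjectGcsCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Authenticate as the service account via its keyfile.
        conf.set("fs.gs.auth.service.account.enable", "true");
        conf.set("fs.gs.auth.service.account.email",
                 "your-service-account-email@developer.gserviceaccount.com");
        conf.set("fs.gs.auth.service.account.keyfile", "/path/to/keyfile.p12");

        // One bucket per project; the same credentials cover both.
        Path inProjectA = new Path("gs://bucket-owned-by-project-a/data/");
        Path inProjectB = new Path("gs://bucket-owned-by-project-b/data/");
        FileSystem fsA = inProjectA.getFileSystem(conf);
        FileSystem fsB = inProjectB.getFileSystem(conf);
        System.out.println("project-a bucket visible: " + fsA.exists(inProjectA));
        System.out.println("project-b bucket visible: " + fsB.exists(inProjectB));
    }
}
```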