Composer 2 / GKE Autopilot 集群 PodOperator 任务的工作负载身份和服务帐户

Workload Identity & Service Accounts for Composer 2 / GKE Autopilot Cluster PodOperator tasks

我正在尝试 运行 GKEStartPodOperator/KubernetesPodOperator Composer 2 环境中的任务,该环境在自动驾驶模式下使用 GKE 集群。我们有一个现有的 Composer 1 环境,其中的 GKE 集群未处于自动驾驶模式。我们使用 Google 云平台服务(BigQuery、GCS 等)进行身份验证的任务在 Composer 2 环境中以 401 未授权失败,但在 Composer 1 环境中成功。

在日志文件中,我可以看出两种环境中的任务都是通过向元数据服务器发出请求来获取凭据的。主要区别是 Composer 1 中的任务请求分配给任务 运行 所在节点的服务帐户,但 Composer 2 中的任务请求似乎是工作负载身份池,如 [project-name].svc.id.goog.

来自 Composer 1 的日志是:

[2021-10-22 12:38:01,349] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
[2021-10-22 12:38:01,351] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
[2021-10-22 12:38:01,352] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Cloud SDK credentials not found on disk; not using them
[2021-10-22 12:38:01,359] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://[cluster-ip]
[2021-10-22 12:38:01,374] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://metadata.google.internal/computeMetadata/v1/project/project-id
[2021-10-22 12:38:01,392] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
[2021-10-22 12:38:01,393] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
[2021-10-22 12:38:01,393] {pod_launcher.py:148} INFO - DEBUG:google.auth._default:Cloud SDK credentials not found on disk; not using them
[2021-10-22 12:38:01,395] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://[cluster-ip]
[2021-10-22 12:38:01,398] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://metadata.google.internal/computeMetadata/v1/project/project-id
[2021-10-22 12:38:01,412] {pod_launcher.py:148} INFO - DEBUG:google.cloud.bigquery.opentelemetry_tracing:This service is instrumented using OpenTelemetry. OpenTelemetry could not be imported; please add opentelemetry-api and opentelemetry-instrumentation packages in order to get BigQuery Tracing data.
[2021-10-22 12:38:01,414] {pod_launcher.py:148} INFO - DEBUG:urllib3.util.retry:Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
[2021-10-22 12:38:01,415] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport.requests:Making request: GET http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true
[2021-10-22 12:38:01,437] {pod_launcher.py:148} INFO - DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): metadata.google.internal:80
[2021-10-22 12:38:01,452] {pod_launcher.py:148} INFO - DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/default/?recursive=true HTTP/1.1" 200 226
[2021-10-22 12:38:01,454] {pod_launcher.py:148} INFO - DEBUG:google.auth.transport.requests:Making request: GET http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/[project-id]-compute@developer.gserviceaccount.com/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform
[2021-10-22 12:38:01,463] {pod_launcher.py:148} INFO - DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/[project-id]-compute@developer.gserviceaccount.com/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform HTTP/1.1" 200 1049
[2021-10-22 12:38:01,468] {pod_launcher.py:148} INFO - DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): bigquery.googleapis.com:443
[2021-10-22 12:38:02,028] {pod_launcher.py:148} INFO - DEBUG:urllib3.connectionpool:https://bigquery.googleapis.com:443 "POST /bigquery/v2/projects/[project-nam]/jobs?prettyPrint=false HTTP/1.1" 200 None

来自 Composer 2 的日志是:

[2021-10-21 13:56:06,619] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
[2021-10-21 13:56:06,620] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
[2021-10-21 13:56:06,620] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Cloud SDK credentials not found on disk; not using them
[2021-10-21 13:56:06,621] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://[cluster-ip]
[2021-10-21 13:56:06,624] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://metadata.google.internal/computeMetadata/v1/project/project-id
[2021-10-21 13:56:06,634] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
[2021-10-21 13:56:06,635] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
[2021-10-21 13:56:06,635] {pod_launcher.py:149} INFO - DEBUG:google.auth._default:Cloud SDK credentials not found on disk; not using them
[2021-10-21 13:56:06,635] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://[cluster-ip]
[2021-10-21 13:56:06,635] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport._http_client:Making request: GET http://metadata.google.internal/computeMetadata/v1/project/project-id
[2021-10-21 13:56:06,641] {pod_launcher.py:149} INFO - DEBUG:google.cloud.bigquery.opentelemetry_tracing:This service is instrumented using OpenTelemetry. OpenTelemetry could not be imported; please add opentelemetry-api and opentelemetry-instrumentation packages in order to get BigQuery Tracing data.
[2021-10-21 13:56:06,642] {pod_launcher.py:149} INFO - DEBUG:urllib3.util.retry:Converted retries value: 3 -> Retry(total=3, connect=None, read=None, redirect=None, status=None)
[2021-10-21 13:56:06,642] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport.requests:Making request: GET http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true
[2021-10-21 13:56:06,714] {pod_launcher.py:149} INFO - DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): metadata.google.internal:80
[2021-10-21 13:56:06,720] {pod_launcher.py:149} INFO - DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/default/?recursive=true HTTP/1.1" 200 121
[2021-10-21 13:56:06,721] {pod_launcher.py:149} INFO - DEBUG:google.auth.transport.requests:Making request: GET http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/[project-name].svc.id.goog/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform
[2021-10-21 13:56:06,831] {pod_launcher.py:149} INFO - DEBUG:urllib3.connectionpool:http://metadata.google.internal:80 "GET /computeMetadata/v1/instance/service-accounts/[project-name].svc.id.goog/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform HTTP/1.1" 200 765
[2021-10-21 13:56:06,833] {pod_launcher.py:149} INFO - DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): bigquery.googleapis.com:443
[2021-10-21 13:56:06,866] {pod_launcher.py:149} INFO - DEBUG:urllib3.connectionpool:https://bigquery.googleapis.com:443 "POST /bigquery/v2/projects/[project-name]/jobs?prettyPrint=false HTTP/1.1" 401 None

基于 Workload Identity documentation,我想我需要将特定的服务帐户绑定到 node/node-pool 运行 吊舱,但我不确定该怎么做使用 Composer 2 GKE Autopilot,因为节点是为我管理的。 Composer 2 目前没有关于使用 KubernetesPodOperator 或 GKEStartPodOperator 的文档。

总而言之,我的问题是:我应该如何配置我的 Composer 2 环境 PodOperator 任务以利用特定服务帐户对 GCP 服务进行身份验证?

我从运维工程师那里得到了一些指导,现在有一个 KubernetesPodOperator 任务通过服务帐户成功通过 GCP 服务进行身份验证。我将在下面分享步骤和有用的信息。

首先,按照 Authenticating to Google Cloud using Workload Identity 的步骤进行操作。我以为 Composer 2 为我配置了 kubernetes <> google 云服务帐户绑定和注释,但事实并非如此。我必须按照说明创建名称空间、kubernetes 服务帐户、ksa 和 gsa 的绑定以及 KSA 的注释。

其次,我必须更新我的 KubernetesPodOperator 实例,将参数 namespaceservice_account_name 设置为我在第一步中创建的名称空间和 kubernetes 服务帐户。

稍后上传 DAG 和执行任务,我可以确认这两个步骤使我的任务能够请求绑定的 Google 服务帐户,并从那里请求 google 客户端库在我针对 BigQuery 的测试中身份验证成功。