Airflow 2.1.3: issue using pgbouncer for PostgreSQL

Background info: we recently upgraded Airflow from 1.10.14 to 2.1.3. pgbouncer was running as a customized container built from the Azure Microsoft image (mcr.microsoft.com/azure-oss-db-tools/pgbouncer-sidecar:latest).

The customized pgbouncer stopped working, and it is now connecting to the main PostgreSQL server.

So I am now trying to use the pgbouncer deployed by Airflow 2.1.3 (helm chart 8.5.2) instead (https://artifacthub.io/packages/helm/airflow-helm/airflow/8.5.0#how-to-use-an-external-database), but I am running into problems.

Here is the key information.

In my values.yaml file, the relevant settings are:

pgbouncer:
  enabled: true
  # listen_port does not seem to take effect into pgbouncer.ini file
#  listen_port: 5432

externalDatabase:
  type: postgres
  host: psql-hostname.postgres.database.azure.com
  port: 5432
  database: airflow
  user: username@psql-hostname
  passwordSecret: "airflow-postgres-redis-name"
  passwordSecretKey: "postgresql-password-key-name"
  properties: ""
  # properties: "?sslmode=disable"
externalRedis:
  host: redis-hostname.redis.cache.windows.net
  port: 6379
  databaseNumber: 1
  passwordSecret: "airflow-postgres-redis-name"
  passwordSecretKey: "redis-password-key-name"
  properties: ""

In my script, I create the following secret in the Kubernetes cluster:

kubectl create secret generic "airflow-postgres-redis-name" \
    -n ${_namespace_airflow} \
    --from-literal=postgresql-password="${my_airflow2_postgre}" \
    --from-literal=redis-password="${my_airflow2_redis}"

When I apply values.yaml with helm upgrade, I see the following in pgbouncer.ini. Note that listen_port is 6543.

$ kubectl exec -n airflow -ti airflow-pgbouncer-6f88889bf5-xtdvp -- /bin/sh
~ $ ls /home/pgbouncer/

certs          config         pgbouncer.ini  users.txt
 
~ $ cat /home/pgbouncer/pgbouncer.ini

[databases]
* = host=127.0.0.1 port=5432
[pgbouncer]
pool_mode = session
listen_port = 6543
listen_addr = *
 
~ $ cat /home/pgbouncer/users.txt
 
"username@psql-hostname" "HIDE FOR THIS NOTE"

I suspect the cause is that port 6543 is not working, but I cannot find a way to override it. Please help.
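One way to test the port suspicion is to read the configured listen_port out of pgbouncer.ini and compare it with the "listening on" lines in the pod log. The following is only a minimal sketch: `ini_get` is a hypothetical helper name, and it assumes the file uses plain `key = value` lines under `[section]` headers, as in the listing above.

```shell
#!/bin/sh
# Print the value of KEY inside [SECTION] of an ini-style file read from stdin.
# Assumes plain "key = value" lines; prints nothing if the key is absent.
ini_get() {
    awk -v section="$1" -v key="$2" '
        /^\[/ {
            # Remember the current section name without its brackets.
            current = $0
            gsub(/^\[|\].*$/, "", current)
            next
        }
        current == section {
            split($0, kv, "=")
            name = kv[1]
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name == key) {
                # Everything after the first "=" is the value.
                value = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", value)
                print value
                exit
            }
        }
    '
}

# Hypothetical usage against the pod shown above:
#   kubectl exec -n airflow airflow-pgbouncer-6f88889bf5-xtdvp -- \
#       cat /home/pgbouncer/pgbouncer.ini | ini_get pgbouncer listen_port
```

If the value printed here differs from the port in the log, the file being inspected is probably not the one the running process loaded.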

Or, if my suspicion is wrong, the logs and events below may give you some ideas to try.

Output of kubectl describe pod:

Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  15m                 default-scheduler  Successfully assigned airflow/airflow-pgbouncer-6f59cf4769-bx5hf to aks-nodepool1-16099970-vmss00000a
  Normal   Pulling    28m                 kubelet            Pulling image "ghcr.io/airflow-helm/pgbouncer:1.15.0-patch.0"
  Normal   Pulled     28m                 kubelet            Successfully pulled image "ghcr.io/airflow-helm/pgbouncer:1.15.0-patch.0" in 3.7505019s
  Normal   Created    23m (x4 over 28m)   kubelet            Created container pgbouncer
  Normal   Started    23m (x4 over 28m)   kubelet            Started container pgbouncer
  Normal   Killing    23m (x3 over 26m)   kubelet            Container pgbouncer failed liveness probe, will be restarted
  Normal   Pulled     23m (x3 over 26m)   kubelet            Container image "ghcr.io/airflow-helm/pgbouncer:1.15.0-patch.0" already present on machine
  Warning  Unhealthy  18m (x16 over 27m)  kubelet            Liveness probe failed: psql: error: ERROR:  pgbouncer cannot connect to server
ERROR:  pgbouncer cannot connect to server
  Warning  BackOff    13m (x14 over 15m)  kubelet            Back-off restarting failed container

Output of kubectl logs for the pod:

$ go.kube.logs airflow-pgbouncer-6f59cf4769-bx5hf
Successfully generated auth_file: /home/pgbouncer/users.txt
 
2021-10-27 09:09:43.157 UTC [6] LOG kernel file descriptor limit: 1048576 (hard: 1048576); max_client_conn: 100, max expected fd use: 112
2021-10-27 09:09:43.157 UTC [6] LOG listening on 0.0.0.0:6432
2021-10-27 09:09:43.157 UTC [6] LOG listening on [::]:6432
2021-10-27 09:09:43.157 UTC [6] LOG listening on unix:/tmp/.s.PGSQL.6432
2021-10-27 09:09:43.157 UTC [6] LOG process up: PgBouncer 1.15.0, libevent 2.1.12-stable (epoll), adns: c-ares 1.17.1, tls: OpenSSL 1.1.1k  25 Mar 2021
2021-10-27 09:10:00.602 UTC [6] LOG C-0x7f16390c91b0: (nodb)/(nouser)@10.244.0.1:41595 registered new auto-database: db=airflow
2021-10-27 09:10:00.610 UTC [6] WARNING TLS handshake error: handshake failed: error:27069065:OCSP routines:OCSP_basic_verify:certificate verify error
2021-10-27 09:10:15.834 UTC [6] WARNING TLS handshake error: handshake failed: error:27069065:OCSP routines:OCSP_basic_verify:certificate verify error
2021-10-27 09:10:31.164 UTC [6] WARNING TLS handshake error: handshake failed: error:27069065:OCSP routines:OCSP_basic_verify:certificate verify error
2021-10-27 09:10:43.156 UTC [6] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2021-10-27 09:10:46.165 UTC [6] WARNING TLS handshake error: handshake failed: error:27069065:OCSP routines:OCSP_basic_verify:certificate verify error
2021-10-27 09:11:00.824 UTC [6] WARNING C-0x7f16390c91b0: airflow/username@psql-hostname@10.244.0.1:41595 pooler error: client_login_timeout (server down)
2021-10-27 09:11:00.824 UTC [6] WARNING C-0x7f16390c91b0: airflow/username@psql-hostname@10.244.0.1:17395 pooler error: pgbouncer cannot connect to server
2021-10-27 09:11:00.965 UTC [6] WARNING C-0x7f16390c91b0: airflow/username@psql-hostname@10.244.0.1:6755 pooler error: pgbouncer cannot connect to server
2021-10-27 09:11:00.966 UTC [6] WARNING C-0x7f16390c91b0: airflow/username@psql-hostname@10.244.0.1:24068 pooler error: pgbouncer cannot connect to server
2021-10-27 09:11:01.116 UTC [6] WARNING C-0x7f16390c91b0: airflow/username@psql-hostname@10.244.0.1:1107 pooler error: pgbouncer cannot connect to server
2021-10-27 09:11:01.117 UTC [6] WARNING C-0x7f16390c91b0: airflow/username@psql-hostname@10.244.0.1:43273 pooler error: pgbouncer cannot connect to server
2021-10-27 09:11:30.617 UTC [6] WARNING TLS handshake error: handshake failed: error:27069065:OCSP routines:OCSP_basic_verify:certificate verify error
2021-10-27 09:11:30.620 UTC [6] LOG got SIGINT, shutting down
2021-10-27 09:11:30.823 UTC [6] LOG server connections dropped, exiting

Note: I replaced the real username with "username@psql-hostname".
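The log above mixes two kinds of warnings: TLS handshake errors and pooler errors. Grouping them makes it easier to see which failure dominates. This is a rough sketch only: `summarize_pgbouncer_warnings` is a hypothetical helper, and it assumes the line layout shown above (timestamp, `WARNING`, then an optional `C-0x...: db/user@addr:port` client prefix).

```shell
#!/bin/sh
# Read pgbouncer log lines from stdin and print each distinct WARNING message
# with its count, most frequent first. Timestamps and per-client prefixes are
# stripped so that identical messages from different clients group together.
summarize_pgbouncer_warnings() {
    grep ' WARNING ' \
        | sed -e 's/^.* WARNING //' -e 's/^C-0x[0-9a-f]*: [^ ]* //' \
        | sort | uniq -c | sort -rn
}

# Hypothetical usage against the pod shown above:
#   kubectl logs -n airflow airflow-pgbouncer-6f59cf4769-bx5hf | summarize_pgbouncer_warnings
```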

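We had 2 options to solve this (note: our Airflow chart is the community chart, version 8.5.2), and we chose the first. In hindsight, option 2 would have been easier and would need almost no changes once a later release fixes this properly.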

  1. The pgbouncer built into community Airflow chart version 8.5.2 defaults the auth type to a fixed value, and with that value it fails when pgbouncer connects to Azure PostgreSQL Single Server. So one choice is to not use the pgbouncer provided by chart 8.5.2 (i.e. set pgbouncer.enabled to false), deploy your own pgbouncer (using helm, kubectl, etc.), and point the externalDatabase host in the Airflow values.yaml file at that pgbouncer service. We chose this approach:
$ helm repo add cradlepoint https://raw.githubusercontent.com/cradlepoint/kubernetes-helm-chart-pgbouncer/master/repos/stable --force-update
$ helm upgrade --install pgbouncer cradlepoint/pgbouncer -n ${_namespace_airflow} -f ${some_path}/values.pgbouncer.yaml

$ service_pgbouncer=$(kubectl get services -n airflow |grep pgbouncer |awk '{print $1}')
$ echo "use this name: '${service_pgbouncer}' in values.yaml for airflow externalDatabase"

You can make values.pgbouncer.yaml work with the auth type that Azure PostgreSQL needs, for example trust (the value we had when using the Azure sidecar image pgbouncer). For why we could not keep using the Azure sidecar pgbouncer, see: https://github.com/airflow-helm/charts/issues/464#issuecomment-973811581

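  2. Keep using the pgbouncer built into community Airflow chart version 8.5.2, but deploy the chart differently (basically, fix the chart's hard-coded pgbouncer auth_type locally and deploy the chart from that patched local copy). See the following 2 conversations: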

The "974957815" comment above is where I realized I could have done this.
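For option 1, the last two steps (finding the pgbouncer service name and pointing the Airflow chart at it) can be sketched as below. This is an illustration under assumptions, not the exact procedure we ran: both function names are hypothetical, the port must match whatever your own pgbouncer service exposes, and the only keys used are the `pgbouncer.enabled` and `externalDatabase` keys from the values.yaml in the question.

```shell
#!/bin/sh
# Extract the NAME column of the first row of a `kubectl get services` listing
# (read from stdin) whose name contains the given pattern. The header row is skipped.
service_name_from_listing() {
    awk -v pattern="$1" 'NR > 1 && index($1, pattern) { print $1; exit }'
}

# Print a values.yaml override that turns off the chart's own pgbouncer and
# points externalDatabase at a separately deployed pgbouncer service.
# $1 = service name, $2 = port that service listens on (assumption: supply your own).
render_external_db_override() {
    cat <<EOF
pgbouncer:
  enabled: false
externalDatabase:
  type: postgres
  host: $1
  port: $2
EOF
}

# Hypothetical usage against a live cluster:
#   svc=$(kubectl get services -n airflow | service_name_from_listing pgbouncer)
#   render_external_db_override "$svc" "$pgbouncer_port" > values.override.yaml
```

The remaining externalDatabase settings (database, user, password secret) would stay as in the original values.yaml.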