Airflow 2.1.3: problem using PgBouncer with PostgreSQL
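Background: we recently upgraded Airflow from 1.10.14 to 2.1.3. PgBouncer was running in a custom container built from the Azure Microsoft image (mcr.microsoft.com/azure-oss-db-tools/pgbouncer-sidecar:latest).
The custom PgBouncer stopped working, so Airflow now connects directly to the main PostgreSQL server.
I am therefore trying to use the PgBouncer deployed by Airflow 2.1.3 (helm chart 8.5.2) instead (https://artifacthub.io/packages/helm/airflow-helm/airflow/8.5.0#how-to-use-an-external-database), but I am running into problems.
The key part of my values.yaml file is as follows: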
pgbouncer:
  enabled: true
  # listen_port does not seem to take effect into pgbouncer.ini file
  # listen_port: 5432
externalDatabase:
  type: postgres
  host: psql-hostname.postgres.database.azure.com
  port: 5432
  database: airflow
  user: username@psql-hostname
  passwordSecret: "airflow-postgres-redis-name"
  passwordSecretKey: "postgresql-password-key-name"
  properties: ""
  # properties: "?sslmode=disable"
externalRedis:
  host: redis-hostname.redis.cache.windows.net
  port: 6379
  databaseNumber: 1
  passwordSecret: "airflow-postgres-redis-name"
  passwordSecretKey: "redis-password-key-name"
  properties: ""
My script creates the secret below in the Kubernetes cluster:
kubectl create secret generic "airflow-postgres-redis-name" \
  -n ${_namespace_airflow} \
  --from-literal=postgresql-password="${my-airflow2-postgre}" \
  --from-literal=redis-password="${my-airflow2-redis}"
After applying values.yaml with helm upgrade, I noticed that pgbouncer.ini contains the following.
Note that listen_port is 6543:
$ kubectl exec -n airflow -ti airflow-pgbouncer-6f88889bf5-xtdvp -- /bin/sh
~ $ ls /home/pgbouncer/
certs config pgbouncer.ini users.txt
~ $ cat /home/pgbouncer/pgbouncer.ini
[databases]
* = host=127.0.0.1 port=5432
[pgbouncer]
pool_mode = session
listen_port = 6543
listen_addr = *
~ $ cat /home/pgbouncer/users.txt
"username@psql-hostname" "HIDE FOR THIS NOTE"
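I suspect the cause is that port 6543 is not working, but I cannot find a way to override it. Please help.
If my suspicion is wrong, the events and logs below may help you suggest something else to try.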
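One way to test the port suspicion is to compare the listen_port written in pgbouncer.ini with the port the running process reports in its log. Below is a minimal sketch, assuming both have been saved to local files first. The helper names ini_get and log_listen_port are made up for this example, and the kubectl lines in the comments show one possible way to fetch the files rather than a required one.

```shell
#!/bin/sh
# Assumed way to fetch the two files (replace <pod> with the real pod name):
#   kubectl exec -n airflow <pod> -- cat /home/pgbouncer/pgbouncer.ini > pgbouncer.ini
#   kubectl logs -n airflow <pod> > pgbouncer.log

# ini_get FILE SECTION KEY: print the value of KEY inside [SECTION] of an INI file.
ini_get() {
    awk -v section="$2" -v key="$3" '
        # A section header switches the "inside the wanted section" flag on or off.
        /^\[/ { in_section = ($0 == ("[" section "]")); next }
        in_section {
            split($0, kv, "=")
            name = kv[1]
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name == key) {
                # Everything after the first "=" is the value.
                value = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", value)
                print value
                exit
            }
        }
    ' "$1"
}

# log_listen_port FILE: print the port from the first "listening on 0.0.0.0:PORT" line.
log_listen_port() {
    sed -n 's/.* listening on 0\.0\.0\.0:\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Example comparison:
#   echo "configured: $(ini_get pgbouncer.ini pgbouncer listen_port)"
#   echo "reported:   $(log_listen_port pgbouncer.log)"
```

If the two values differ, the file being inspected may not be the one the running process loaded, which is worth ruling out before trying to override the port.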
Output of kubectl describe pod:
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  15m                 default-scheduler  Successfully assigned airflow/airflow-pgbouncer-6f59cf4769-bx5hf to aks-nodepool1-16099970-vmss00000a
  Normal   Pulling    28m                 kubelet            Pulling image "ghcr.io/airflow-helm/pgbouncer:1.15.0-patch.0"
  Normal   Pulled     28m                 kubelet            Successfully pulled image "ghcr.io/airflow-helm/pgbouncer:1.15.0-patch.0" in 3.7505019s
  Normal   Created    23m (x4 over 28m)   kubelet            Created container pgbouncer
  Normal   Started    23m (x4 over 28m)   kubelet            Started container pgbouncer
  Normal   Killing    23m (x3 over 26m)   kubelet            Container pgbouncer failed liveness probe, will be restarted
  Normal   Pulled     23m (x3 over 26m)   kubelet            Container image "ghcr.io/airflow-helm/pgbouncer:1.15.0-patch.0" already present on machine
  Warning  Unhealthy  18m (x16 over 27m)  kubelet            Liveness probe failed: psql: error: ERROR: pgbouncer cannot connect to server
ERROR: pgbouncer cannot connect to server
  Warning  BackOff    13m (x14 over 15m)  kubelet            Back-off restarting failed container
Output of kubectl logs for the pod:
$ go.kube.logs airflow-pgbouncer-6f59cf4769-bx5hf
Successfully generated auth_file: /home/pgbouncer/users.txt
2021-10-27 09:09:43.157 UTC [6] LOG kernel file descriptor limit: 1048576 (hard: 1048576); max_client_conn: 100, max expected fd use: 112
2021-10-27 09:09:43.157 UTC [6] LOG listening on 0.0.0.0:6432
2021-10-27 09:09:43.157 UTC [6] LOG listening on [::]:6432
2021-10-27 09:09:43.157 UTC [6] LOG listening on unix:/tmp/.s.PGSQL.6432
2021-10-27 09:09:43.157 UTC [6] LOG process up: PgBouncer 1.15.0, libevent 2.1.12-stable (epoll), adns: c-ares 1.17.1, tls: OpenSSL 1.1.1k 25 Mar 2021
2021-10-27 09:10:00.602 UTC [6] LOG C-0x7f16390c91b0: (nodb)/(nouser)@10.244.0.1:41595 registered new auto-database: db=airflow
2021-10-27 09:10:00.610 UTC [6] WARNING TLS handshake error: handshake failed: error:27069065:OCSP routines:OCSP_basic_verify:certificate verify error
2021-10-27 09:10:15.834 UTC [6] WARNING TLS handshake error: handshake failed: error:27069065:OCSP routines:OCSP_basic_verify:certificate verify error
2021-10-27 09:10:31.164 UTC [6] WARNING TLS handshake error: handshake failed: error:27069065:OCSP routines:OCSP_basic_verify:certificate verify error
2021-10-27 09:10:43.156 UTC [6] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2021-10-27 09:10:46.165 UTC [6] WARNING TLS handshake error: handshake failed: error:27069065:OCSP routines:OCSP_basic_verify:certificate verify error
2021-10-27 09:11:00.824 UTC [6] WARNING C-0x7f16390c91b0: airflow/username@psql-hostname@10.244.0.1:41595 pooler error: client_login_timeout (server down)
2021-10-27 09:11:00.824 UTC [6] WARNING C-0x7f16390c91b0: airflow/username@psql-hostname@10.244.0.1:17395 pooler error: pgbouncer cannot connect to server
2021-10-27 09:11:00.965 UTC [6] WARNING C-0x7f16390c91b0: airflow/username@psql-hostname@10.244.0.1:6755 pooler error: pgbouncer cannot connect to server
2021-10-27 09:11:00.966 UTC [6] WARNING C-0x7f16390c91b0: airflow/username@psql-hostname@10.244.0.1:24068 pooler error: pgbouncer cannot connect to server
2021-10-27 09:11:01.116 UTC [6] WARNING C-0x7f16390c91b0: airflow/username@psql-hostname@10.244.0.1:1107 pooler error: pgbouncer cannot connect to server
2021-10-27 09:11:01.117 UTC [6] WARNING C-0x7f16390c91b0: airflow/username@psql-hostname@10.244.0.1:43273 pooler error: pgbouncer cannot connect to server
2021-10-27 09:11:30.617 UTC [6] WARNING TLS handshake error: handshake failed: error:27069065:OCSP routines:OCSP_basic_verify:certificate verify error
2021-10-27 09:11:30.620 UTC [6] LOG got SIGINT, shutting down
2021-10-27 09:11:30.823 UTC [6] LOG server connections dropped, exiting
Note: I replaced the real username with "username@psql-hostname".
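To see which failure shows up first and how often each one occurs, the log can be tallied by message type. This is a minimal sketch, assuming the log has been saved to a local file (for example with kubectl logs redirected to pgbouncer.log). The function names are hypothetical, and the three message strings are copied from the log above.

```shell
#!/bin/sh
# count_matches FILE TEXT: print how many lines of FILE contain the fixed string TEXT.
count_matches() {
    # grep -c exits non-zero when the count is 0, so its exit status is ignored.
    grep -c -F -- "$2" "$1" || true
}

# first_warning FILE: print the earliest line logged at WARNING level.
first_warning() {
    grep -F ' WARNING ' "$1" | head -n 1
}

# summarize_pgbouncer_log FILE: print one "message: count" line per message type.
summarize_pgbouncer_log() {
    for text in 'TLS handshake error' 'client_login_timeout' 'pgbouncer cannot connect to server'; do
        printf '%s: %s\n' "$text" "$(count_matches "$1" "$text")"
    done
}

# Example:
#   summarize_pgbouncer_log pgbouncer.log
#   first_warning pgbouncer.log
```

On the excerpt shown above, the earliest warning is the TLS handshake error at 09:10:00.610, a minute before the first "cannot connect to server" line, so the upstream TLS handshake may be worth checking alongside the listen port.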
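We had 2 options for solving this (note: our Airflow chart is the community chart, version 8.5.2), and we chose the first. In hindsight, option 2 would have been easier and would need almost no changes once the next chart version fixes the problem properly.
- The PgBouncer built into community Airflow chart 8.5.2 hard-codes the auth type to a fixed value, and with that value PgBouncer fails to connect to an Azure PostgreSQL single server. You can therefore choose not to use the PgBouncer provided by the 8.5.2 chart (that is, set pgbouncer.enabled to false), deploy your own PgBouncer (using helm, kubectl, etc.), and point the externalDatabase host in the Airflow values.yaml file at that PgBouncer service. This is the approach we chose: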
$ helm repo add cradlepoint https://raw.githubusercontent.com/cradlepoint/kubernetes-helm-chart-pgbouncer/master/repos/stable --force-update
$ helm upgrade --install pgbouncer cradlepoint/pgbouncer -n ${_namespace_airflow} -f ${some_path}/values.pgbouncer.yaml
$ service_pgbouncer=$(kubectl get services -n airflow |grep pgbouncer |awk '{print $1}')
$ echo "use this name: '${service_pgbouncer}' in values.yaml for airflow externalDatabase"
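You can configure values.pgbouncer.yaml with an auth type that works for Azure PostgreSQL, for example trust (the value we had when we were using the Azure sidecar image for PgBouncer).
For why we could not keep using the Azure sidecar PgBouncer, see:
https://github.com/airflow-helm/charts/issues/464#issuecomment-973811581
- Keep using the PgBouncer built into community Airflow chart 8.5.2, but deploy the chart differently: fix the hard-coded PgBouncer auth_type in a local copy of the chart, then deploy the chart from that fixed local copy. See the following 2 conversations:
  - https://github.com/airflow-helm/charts/issues/412#issuecomment-974909150
  - https://github.com/airflow-helm/charts/issues/412#issuecomment-974957815
The "974957815" comment above describes what I later realized I could have done.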
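The second option (patching a local copy of the chart) could look roughly like the sketch below. It assumes the hard-coded setting appears in the chart as a literal "auth_type = ..." line. The helper set_auth_type is hypothetical, and the repo alias airflow-helm, the release name airflow and the template path in the comments are placeholders, not values confirmed by the chart.

```shell
#!/bin/sh
# set_auth_type FILE VALUE: rewrite every "auth_type = ..." line in FILE to use VALUE.
# VALUE is inserted into a sed expression, so it must not contain "/" or "&".
set_auth_type() {
    tmp_file=$(mktemp) || return 1
    if sed "s/^\([[:space:]]*auth_type[[:space:]]*=[[:space:]]*\).*/\1$2/" "$1" > "$tmp_file"; then
        # Write back through cat so the original file keeps its permissions.
        cat "$tmp_file" > "$1"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp_file"
    return "$status"
}

# Assumed workflow (names and paths are placeholders):
#   helm pull airflow-helm/airflow --version 8.5.2 --untar      # unpacks into ./airflow
#   grep -rn 'auth_type' ./airflow/templates                    # locate the hard-coded line
#   set_auth_type ./airflow/templates/<file found above> trust
#   helm upgrade --install airflow ./airflow -n "${_namespace_airflow}" -f values.yaml
```

trust is used here only because it is the value mentioned above for the Azure sidecar setup. Which auth type actually suits a given server is best confirmed against the two linked issue comments.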