Airflow - Why is the externalDatabase configuration breaking helm upgrade?

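I am trying to deploy Airflow with the Helm chart for a personal POC, but I have run into deployment problems and cannot find clear instructions that solve them, which is why I am asking for help here.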

Context of the problem

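First, some background on the POC: I want to deploy a K8S cluster that hosts Airflow, connect it to a git repository that hosts the DAGs, and host the metastore and cache outside the K8S Airflow deployment.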

I have successfully deployed Airflow to a local Kubernetes cluster using kind and Airflow's default helm chart. In the helm chart I specified that the executor must be KubernetesExecutor.

I have also configured Airflow to sync DAGs to/from a bitbucket repository.

Problem and current implementation

I am having trouble connecting Airflow to external services. I created an Azure PostgreSQL server, created an airflow database, and created an admin user in psql as follows:

CREATE DATABASE airflow;
CREATE USER aflw_admin WITH PASSWORD 'some_password';
GRANT ALL PRIVILEGES ON DATABASE airflow TO aflw_admin;
ALTER USER aflw_admin SET search_path = public;

Because I am deploying with helm, my values.yaml is as follows:

postgresql:
  enabled: false 

externalDatabase:
  type: postgres
  host: dbname.postgres.database.azure.com
  port: 5432
  database: airflow
  user: aflw_admin
  passwordSecretKey: "postgresql-password"

data:
  metadataSecretName: ~
  resultBackendSecretName: ~

  metadataConnection:
    user: aflw_admin 
    pass:  some_password
    protocol: postgresql
    host: dbname.postgres.database.azure.com
    port: 5432
    db: airflow 
    sslmode: require 
    
  resultBackendConnection:
    user: aflw_admin 
    pass:  some_password
    protocol: postgresql
    host: dbname.postgres.database.azure.com
    port: 5432
    db: airflow 
    sslmode: require 

The postgresql-password secret was created with:

kubectl create secret generic airflow-postgresql --from-literal=postgresql-password=$(openssl rand -base64 13) --namespace airflow

I deployed the solution using:

kubectl apply -f ./helm/variables.yaml
helm upgrade --install airflow apache-airflow/airflow -n airflow -f ./values.yaml --debug 

What I have tried and details of the problem

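After some back and forth, I found that by reverting the configuration (that is, setting postgresql enabled to true and removing the externalDatabase, metadataConnection, and resultBackendConnection sections of the values.yaml file) I can successfully deploy the postgres service. The trade-off is that postgresql is then not an external service, but this at least helps to partially isolate the problem.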

So, if I go back to the initial configuration and try to deploy it, the upgrade fails.

Here is the log of the problematic helm deployment:

history.go:56: [debug] getting history for release airflow
upgrade.go:142: [debug] preparing upgrade for airflow
upgrade.go:150: [debug] performing update for airflow
upgrade.go:322: [debug] creating upgraded release for airflow
client.go:218: [debug] checking 20 resources for changes
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-create-user-job"
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-migrate-database-job"
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-scheduler"
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-statsd"
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-triggerer"
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-webserver"
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-worker"
client.go:501: [debug] Looks like there are no changes for Secret "airflow-airflow-metadata"
client.go:501: [debug] Looks like there are no changes for Secret "airflow-webserver-secret-key"
client.go:501: [debug] Looks like there are no changes for ConfigMap "airflow-airflow-config"
client.go:501: [debug] Looks like there are no changes for Role "airflow-pod-launcher-role"
client.go:501: [debug] Looks like there are no changes for Role "airflow-pod-log-reader-role"
client.go:501: [debug] Looks like there are no changes for RoleBinding "airflow-pod-launcher-rolebinding"
client.go:501: [debug] Looks like there are no changes for RoleBinding "airflow-pod-log-reader-rolebinding"
client.go:501: [debug] Looks like there are no changes for Service "airflow-statsd"
client.go:501: [debug] Looks like there are no changes for Service "airflow-webserver"
client.go:510: [debug] Patch Deployment "airflow-scheduler" in namespace airflow
client.go:510: [debug] Patch Deployment "airflow-statsd" in namespace airflow
client.go:510: [debug] Patch Deployment "airflow-triggerer" in namespace airflow
client.go:510: [debug] Patch Deployment "airflow-webserver" in namespace airflow
client.go:267: [debug] Deleting Secret "airflow-postgresql" in namespace airflow...
client.go:270: [debug] Unable to get obj "airflow-postgresql", err: secrets "airflow-postgresql" not found
client.go:267: [debug] Deleting Service "airflow-postgresql-headless" in namespace airflow...
client.go:270: [debug] Unable to get obj "airflow-postgresql-headless", err: services "airflow-postgresql-headless" not found
client.go:267: [debug] Deleting Service "airflow-postgresql" in namespace airflow...
client.go:270: [debug] Unable to get obj "airflow-postgresql", err: services "airflow-postgresql" not found
client.go:267: [debug] Deleting StatefulSet "airflow-postgresql" in namespace airflow...
client.go:270: [debug] Unable to get obj "airflow-postgresql", err: statefulsets.apps "airflow-postgresql" not found
client.go:299: [debug] Starting delete for "airflow-run-airflow-migrations" Job
client.go:128: [debug] creating 1 resource(s)
client.go:529: [debug] Watching for changes to Job airflow-run-airflow-migrations with timeout of 20m0s
client.go:557: [debug] Add/Modify event for airflow-run-airflow-migrations: ADDED
client.go:596: [debug] airflow-run-airflow-migrations: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:557: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:596: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:557: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
upgrade.go:433: [debug] warning: Upgrade "airflow" failed: post-upgrade hooks failed: job failed: BackoffLimitExceeded
Error: UPGRADE FAILED: post-upgrade hooks failed: job failed: BackoffLimitExceeded
helm.go:84: [debug] post-upgrade hooks failed: job failed: BackoffLimitExceeded
UPGRADE FAILED
main.newUpgradeCmd.func2
        helm.sh/helm/v3/cmd/helm/upgrade.go:199
github.com/spf13/cobra.(*Command).execute
        github.com/spf13/cobra@v1.3.0/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
        github.com/spf13/cobra@v1.3.0/command.go:974
github.com/spf13/cobra.(*Command).Execute
        github.com/spf13/cobra@v1.3.0/command.go:902
main.main
        helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
        runtime/proc.go:255
runtime.goexit
        runtime/asm_amd64.s:1581
make: *** [Makefile:46: deploy-airflow] Error 1

This behavior makes me think it is some kind of misconfiguration, but I cannot pin down what.

What is misconfigured in my helm chart that could be breaking the helm upgrade?

The versions of helm/airflow/psql are as follows:

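Problems like this can be hard to diagnose because there are so many moving parts. That said, I have set up Airflow on both Azure AKS (Postgres sslmode: require) and AWS EKS (RDS Postgres sslmode: disable), and each had its own issues.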

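Maybe remove the externalDatabase and resultBackendConnection configuration. Why? Because if resultBackendConnection is not configured, metadataConnection is used instead. I also have not seen an externalDatabase key in my current config file for v2.2.4. Are you passing -f values.yaml so that the helm install is overridden with the correct values.yaml?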
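One way to check whether the -f values.yaml overrides actually reached the release is to look at the values Helm stored and at the connection string rendered into the metadata secret. The following is only a sketch: the release and namespace names are taken from the question, the secret name follows the "airflow-airflow-metadata" entry in the log, and the secret key name "connection" is an assumption. The helper names (mask_uri, show_values, show_metadata_conn) are hypothetical.

```shell
# Release and namespace as used in the question; override via environment if needed.
RELEASE="${RELEASE:-airflow}"
NAMESPACE="${NAMESPACE:-airflow}"

# Hide the password part of a connection URI read from stdin.
mask_uri() {
  sed -E 's#(://[^:/@]+):[^@]*@#\1:****@#'
}

# Show the user-supplied values Helm stored for the release.
show_values() {
  helm get values "$RELEASE" -n "$NAMESPACE"
}

# Decode the connection string stored in the metadata secret.
# Assumption: the secret keeps it under the key "connection".
show_metadata_conn() {
  kubectl get secret "${RELEASE}-airflow-metadata" -n "$NAMESPACE" \
    -o jsonpath='{.data.connection}' | base64 --decode | mask_uri
}
```

If show_metadata_conn still prints the in-cluster postgres host rather than the Azure host, the overrides probably did not take effect.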

If you disable postgresql

postgresql:
  enabled: false

as you have, then you need to configure metadataConnection for your external database.

I only configured metadataSecretName once the connection to Postgres was working.

Also, try sslmode: disable in the metadataConnection configuration.

Once I had the config file the way I wanted it, I uninstalled Airflow and then reinstalled:

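  1. helm delete airflow
  2. dropped the airflow db and re-created it
  3. kubectl delete secrets [airflow-xxxxx-xxxxx] for all of the secrets in the namespace, because the database migration had ended in an error.
  4. kubectl delete pvc (then make sure the pv is deleted as well)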
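The steps above can be sketched as a small script. This is an illustration under assumptions, not the exact commands I ran: the release and namespace names come from the question, PG_ADMIN_URI is a hypothetical admin connection string pointing at a maintenance database such as postgres, and run / reset_airflow are made-up helper names. Setting DRY_RUN=1 prints the commands instead of executing them.

```shell
RELEASE="${RELEASE:-airflow}"
NAMESPACE="${NAMESPACE:-airflow}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

reset_airflow() {
  # 1. Remove the release ("helm delete" is an alias of "helm uninstall" in Helm 3).
  run helm uninstall "$RELEASE" -n "$NAMESPACE"
  # 2. Drop and re-create the metadata database, then restore the grant from the question.
  #    PG_ADMIN_URI is an assumed admin connection string, not something from the chart.
  run psql "${PG_ADMIN_URI:-}" \
    -c "DROP DATABASE airflow;" \
    -c "CREATE DATABASE airflow;" \
    -c "GRANT ALL PRIVILEGES ON DATABASE airflow TO aflw_admin;"
  # 3. Delete every secret in the namespace. Careful: this also removes secrets
  #    you created by hand, which then have to be re-created before reinstalling.
  run kubectl delete secrets --all -n "$NAMESPACE"
  # 4. Delete the persistent volume claims, then list PVs to confirm they are gone.
  run kubectl delete pvc --all -n "$NAMESPACE"
  run kubectl get pv
}
```

Running DRY_RUN=1 reset_airflow first gives a chance to review the destructive commands before anything is deleted.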

After that I reinstalled and everything was fine. It was not a lot of work, and it ensured that I could redeploy with the correct values.

Oh, and remember to set up psql and make sure you can actually connect from the command line as an extra check.
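A minimal sketch of that command-line check, assuming psql is installed locally and the Azure firewall allows your client IP. The helper names pg_uri and check_db are hypothetical; the arguments mirror the fields of metadataConnection in the question's values.yaml.

```shell
# Build a libpq connection URI.
# Arguments: user password host port database sslmode
# Note: a password containing URI-reserved characters (@, /, :) would need percent-encoding.
pg_uri() {
  printf 'postgresql://%s:%s@%s:%s/%s?sslmode=%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Run a trivial query; a zero exit status means host, credentials, and SSL mode all work.
check_db() {
  psql "$(pg_uri "$@")" -c 'SELECT 1;'
}

# Example with the values from the question:
# check_db aflw_admin some_password dbname.postgres.database.azure.com 5432 airflow require
```

If this fails from your workstation with the same user, password, and sslmode that are in values.yaml, the migration job inside the cluster is unlikely to succeed either.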