Airflow - Why is the externalDatabase configuration breaking helm upgrade?
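I am trying to deploy Airflow with the Helm chart for a personal POC, but I am having trouble with the deployment and cannot find clear instructions that address my problem, which is why I am asking for help here.
Context of the problem
First, some background on the POC: I want to deploy a K8S cluster that hosts Airflow, connect it to a git repository that hosts the DAGs, and host the metastore and cache outside of the K8S Airflow deployment.
I have successfully deployed Airflow to a local Kubernetes cluster using kind and Airflow's default helm chart. In the helm chart I specified that the executor must be KubernetesExecutor.
I have also configured Airflow to sync DAGs to/from a Bitbucket repository.
Problem and current implementation
I am having trouble connecting Airflow to the external services. I created an Azure PostgreSQL server, created an airflow database, and created an admin user in psql as follows: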
CREATE DATABASE airflow;
CREATE USER aflw_admin WITH PASSWORD 'some_password';
GRANT ALL PRIVILEGES ON DATABASE airflow TO aflw_admin;
ALTER USER aflw_admin SET search_path = public;
Because I am deploying with helm, my values.yaml is as follows:
postgresql:
  enabled: false
externalDatabase:
  type: postgres
  host: dbname.postgres.database.azure.com
  port: 5432
  database: airflow
  user: aflw_admin
  passwordSecretKey: "postgresql-password"
data:
  metadataSecretName: ~
  resultBackendSecretName: ~
  metadataConnection:
    user: aflw_admin
    pass: some_password
    protocol: postgresql
    host: dbname.postgres.database.azure.com
    port: 5432
    db: airflow
    sslmode: require
  resultBackendConnection:
    user: aflw_admin
    pass: some_password
    protocol: postgresql
    host: dbname.postgres.database.azure.com
    port: 5432
    db: airflow
    sslmode: require
The secret postgresql-password was created with:
kubectl create secret generic airflow-postgresql --from-literal=postgresql-password=$(openssl rand -base64 13) --namespace airflow
I deployed the solution with:
kubectl apply -f ./helm/variables.yaml
helm upgrade --install airflow apache-airflow/airflow -n airflow -f ./values.yaml --debug
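What I have tried and details of the problem
After some back and forth, I found that by reverting the configuration (that is, setting postgresql enabled to true and removing the metadataConnection, resultBackendConnection, and externalDatabase sections from the values.yaml file) I can deploy the postgres service successfully, with the trade-off that postgresql is then not an external service. This at least helps to partially isolate the problem.
So, if I go back to the initial configuration and try to deploy it, this is what I get:
- First I get a timeout. To deal with this, I naturally increased the timeout duration to a larger value, such as 20m0s;
- After increasing the timeout I get the error BackoffLimitExceeded and nothing is deployed.
Here is the log of the failing helm deployment: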
history.go:56: [debug] getting history for release airflow
upgrade.go:142: [debug] preparing upgrade for airflow
upgrade.go:150: [debug] performing update for airflow
upgrade.go:322: [debug] creating upgraded release for airflow
client.go:218: [debug] checking 20 resources for changes
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-create-user-job"
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-migrate-database-job"
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-scheduler"
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-statsd"
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-triggerer"
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-webserver"
client.go:501: [debug] Looks like there are no changes for ServiceAccount "airflow-worker"
client.go:501: [debug] Looks like there are no changes for Secret "airflow-airflow-metadata"
client.go:501: [debug] Looks like there are no changes for Secret "airflow-webserver-secret-key"
client.go:501: [debug] Looks like there are no changes for ConfigMap "airflow-airflow-config"
client.go:501: [debug] Looks like there are no changes for Role "airflow-pod-launcher-role"
client.go:501: [debug] Looks like there are no changes for Role "airflow-pod-log-reader-role"
client.go:501: [debug] Looks like there are no changes for RoleBinding "airflow-pod-launcher-rolebinding"
client.go:501: [debug] Looks like there are no changes for RoleBinding "airflow-pod-log-reader-rolebinding"
client.go:501: [debug] Looks like there are no changes for Service "airflow-statsd"
client.go:501: [debug] Looks like there are no changes for Service "airflow-webserver"
client.go:510: [debug] Patch Deployment "airflow-scheduler" in namespace airflow
client.go:510: [debug] Patch Deployment "airflow-statsd" in namespace airflow
client.go:510: [debug] Patch Deployment "airflow-triggerer" in namespace airflow
client.go:510: [debug] Patch Deployment "airflow-webserver" in namespace airflow
client.go:267: [debug] Deleting Secret "airflow-postgresql" in namespace airflow...
client.go:270: [debug] Unable to get obj "airflow-postgresql", err: secrets "airflow-postgresql" not found
client.go:267: [debug] Deleting Service "airflow-postgresql-headless" in namespace airflow...
client.go:270: [debug] Unable to get obj "airflow-postgresql-headless", err: services "airflow-postgresql-headless" not found
client.go:267: [debug] Deleting Service "airflow-postgresql" in namespace airflow...
client.go:270: [debug] Unable to get obj "airflow-postgresql", err: services "airflow-postgresql" not found
client.go:267: [debug] Deleting StatefulSet "airflow-postgresql" in namespace airflow...
client.go:270: [debug] Unable to get obj "airflow-postgresql", err: statefulsets.apps "airflow-postgresql" not found
client.go:299: [debug] Starting delete for "airflow-run-airflow-migrations" Job
client.go:128: [debug] creating 1 resource(s)
client.go:529: [debug] Watching for changes to Job airflow-run-airflow-migrations with timeout of 20m0s
client.go:557: [debug] Add/Modify event for airflow-run-airflow-migrations: ADDED
client.go:596: [debug] airflow-run-airflow-migrations: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:557: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
client.go:596: [debug] airflow-run-airflow-migrations: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:557: [debug] Add/Modify event for airflow-run-airflow-migrations: MODIFIED
upgrade.go:433: [debug] warning: Upgrade "airflow" failed: post-upgrade hooks failed: job failed: BackoffLimitExceeded
Error: UPGRADE FAILED: post-upgrade hooks failed: job failed: BackoffLimitExceeded
helm.go:84: [debug] post-upgrade hooks failed: job failed: BackoffLimitExceeded
UPGRADE FAILED
main.newUpgradeCmd.func2
helm.sh/helm/v3/cmd/helm/upgrade.go:199
github.com/spf13/cobra.(*Command).execute
github.com/spf13/cobra@v1.3.0/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
github.com/spf13/cobra@v1.3.0/command.go:974
github.com/spf13/cobra.(*Command).Execute
github.com/spf13/cobra@v1.3.0/command.go:902
main.main
helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
runtime/proc.go:255
runtime.goexit
runtime/asm_amd64.s:1581
make: *** [Makefile:46: deploy-airflow] Error 1
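This behavior makes me think it is some kind of misconfiguration, but I cannot pin down what.
What misconfiguration in my helm chart could be breaking the helm upgrade?
The helm/airflow/psql versions are as follows:
- Airflow -> apache/airflow:2.2.3
- Helm chart -> version 1.4.0 (https://artifacthub.io/packages/helm/apache-airflow/airflow), default image
- PSQL (on Azure) -> Azure Database for PostgreSQL flexible server, PSQL version 13.4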
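These problems are sometimes hard to diagnose because there are so many moving parts. That said, I have set up Airflow on both Azure AKS (Postgres sslmode: require) and AWS EKS (RDS Postgres sslmode: disable), and each had its own issues.
Maybe remove the externalDatabase and resultBackendConnection configuration. Why? Because if resultBackendConnection is not configured, metadataConnection is used. I also do not see an externalDatabase key in my current values file for v2.2.4. Are you passing -f values.yaml so that helm install is overridden with the correct values.yaml?
If you disable postgresql
postgresql:
  enabled: false
as you want to, then you need to configure metadataConnection for your external database.
I only configured metadataSecretName after the connection to Postgres was working properly.
Also, try sslmode: disable in the metadataConnection configuration.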
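To make it easier to see what the metadataConnection fields amount to, here is a minimal Python sketch that composes them into a single connection URI. It assumes the result is the usual SQLAlchemy-style PostgreSQL URI, with sslmode passed as a query parameter; the helper name build_metadata_uri is hypothetical and not part of the chart. If you later move to metadataSecretName, my understanding is that the secret is expected to hold such a URI under a key named connection, but please verify that against the documentation for your chart version.

```python
from urllib.parse import quote, urlencode


def build_metadata_uri(user, password, host, port, db,
                       sslmode=None, protocol="postgresql"):
    """Compose a connection URI from metadataConnection-like fields.

    Hypothetical helper for illustration only; it is not part of the
    Airflow Helm chart.
    """
    # Percent-encode the credentials so that characters such as '@', ':'
    # or '/' in a password cannot be mistaken for URI delimiters.
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    uri = f"{protocol}://{userinfo}@{host}:{port}/{db}"
    if sslmode:
        # sslmode is appended as a query parameter, e.g. ?sslmode=require
        uri += "?" + urlencode({"sslmode": sslmode})
    return uri


if __name__ == "__main__":
    print(build_metadata_uri(
        "aflw_admin", "some_password",
        "dbname.postgres.database.azure.com", 5432,
        "airflow", sslmode="require",
    ))
```

Printing the URI this way and pasting it into psql is a quick check that the values in values.yaml describe a database you can actually log in to. Note that a password containing special characters needs the percent-encoding shown above.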
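Once I had the configuration file the way I wanted it, I un-installed Airflow and then re-installed:
- helm delete airflow
- dropped the airflow db and re-created it
- kubectl delete secrets [airflow-xxxxx-xxxxx] for all the secrets in the namespace, because the database migration was in error.
- kubectl delete pvc (then make sure the pv is deleted as well)
After that I re-installed and everything was fine. It is not a lot of work, and it ensured that I could re-deploy with the correct values.
Oh, and remember to set up PSQL and make sure you can actually connect from the command line as an extra check.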
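As a rough companion to that connection check, the following Python sketch only tests whether a TCP connection to the database host and port can be opened at all. The function name can_reach is hypothetical, and the check says nothing about credentials, TLS, or Azure firewall rules beyond basic reachability, so a real login with psql remains the more complete test.

```python
import socket


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout.

    Hypothetical helper: it checks network reachability only and does
    not authenticate against PostgreSQL or negotiate TLS.
    """
    try:
        # create_connection resolves the host name and attempts to connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, can_reach("dbname.postgres.database.azure.com", 5432) run from inside a pod in the airflow namespace would show whether the migration job has a network path to the server. If it returns False, the BackoffLimitExceeded error is more likely a networking or firewall problem than a values.yaml problem.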