Cannot apply inter-pod affinity to Airflow scheduler

I ran into strange behavior when trying to attach podAffinity to the scheduler Deployment from the official Airflow Helm chart, like this:

  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app 
            operator: In
            values:
            - postgresql
        topologyKey: "kubernetes.io/hostname"

This is the example Deployment that the podAffinity should "hook up" to:

metadata:
  name: {{ template "postgresql.fullname" . }}
  labels:
    app: postgresql
    chart: {{ template "postgresql.chart" . }}
    release: {{ .Release.Name | quote }}
    heritage: {{ .Release.Service | quote }}
spec:
  serviceName: {{ template "postgresql.fullname" . }}-headless
  replicas: 1
  selector:
    matchLabels:
      app: postgresql
      release: {{ .Release.Name | quote }}
  template:
    metadata:
      name: {{ template "postgresql.fullname" . }}
      labels:
        app: postgresql
        chart: {{ template "postgresql.chart" . }}

This results in:

NotTriggerScaleUp: pod didn't trigger scale-up: 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't match pod affinity rules

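However, applying the same podAffinity config to the webserver Deployment works just fine. Also, swapping the example Deployment for a vanilla nginx one gives the same outcome.

It doesn't look like a resource limit issue, since I have tried various configurations and got the same result every time. I don't use any custom configuration apart from node affinity.

Has anyone run into the same thing, or does anyone know what I might be doing wrong?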

Setup:

  • AKS cluster
  • Airflow Helm chart 1.1.0
  • Airflow 1.10.15 (but I don't think this matters)
  • kubectl client (1.22.1) and server (1.20.7)

Link to Airflow's chart:

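I recreated this scenario on my GKE cluster and decided to post a Community Wiki answer to show that podAffinity on the scheduler works as expected. Below I describe, step by step, how I tested it.

  1. In the values.yaml file, I configured podAffinity as follows: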
$ cat values.yaml
...
# Airflow scheduler settings
scheduler:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - postgresql
        topologyKey: "kubernetes.io/hostname"
...
  2. I installed Airflow on the Kubernetes cluster using the Helm package manager with the values.yaml file specified:
$ helm install airflow apache-airflow/airflow --values values.yaml

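After a while we can check the status of the scheduler. It stays Pending, because no pod with the app: postgresql label exists yet: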
$ kubectl get pods -owide | grep "scheduler"
airflow-scheduler-79bfb664cc-7n68f   0/2     Pending   0          8m6s   <none>      <none>                                 <none>           <none>
  3. I created an example Deployment with the app: postgresql label:
$ cat test.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: postgresql
  name: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgresql
  template:
    metadata:
      labels:
        app: postgresql
    spec:
      containers:
      - image: nginx
        name: nginx
        
$ kubectl apply -f test.yaml
deployment.apps/test created

$ kubectl get pods --show-labels | grep test
test-7d4c9c654-7lqns                 1/1     Running   0          2m   app=postgresql,...
  4. Finally, we can check that the scheduler was created successfully and landed on the same node as the test pod:
$ kubectl get pods -o wide | grep "scheduler\|test"
airflow-scheduler-79bfb664cc-7n68f   2/2     Running   0          14m     10.X.1.6    nodeA     
test-7d4c9c654-7lqns                 1/1     Running   0          2m27s   10.X.1.5    nodeA
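
If you'd rather not compare the NODE column by eye, the check in step 4 can be roughly scripted. This is only a sketch: it assumes you feed it the JSON that `kubectl get pods -o json` prints (it reads just `metadata.name` and `spec.nodeName`), and the helper names `node_of` and `co_located` are made up for illustration.

```python
import json


def node_of(pod_list, name_prefix):
    """Map pod name -> node name for pods whose name starts with name_prefix."""
    return {
        item["metadata"]["name"]: item["spec"].get("nodeName")  # None while Pending
        for item in pod_list["items"]
        if item["metadata"]["name"].startswith(name_prefix)
    }


def co_located(pod_list, prefix_a, prefix_b):
    """True if a pod matching prefix_a shares a node with a pod matching prefix_b."""
    nodes_a = {n for n in node_of(pod_list, prefix_a).values() if n}
    nodes_b = {n for n in node_of(pod_list, prefix_b).values() if n}
    return bool(nodes_a & nodes_b)


# Hand-written sample shaped like `kubectl get pods -o json` output,
# reduced to the two fields used above.
sample = json.loads("""
{"items": [
  {"metadata": {"name": "airflow-scheduler-79bfb664cc-7n68f"},
   "spec": {"nodeName": "nodeA"}},
  {"metadata": {"name": "test-7d4c9c654-7lqns"},
   "spec": {"nodeName": "nodeA"}}
]}
""")

print(co_located(sample, "airflow-scheduler", "test"))  # True
```

Against a real cluster you would save the output with `kubectl get pods -o json > pods.json` and load that file with `json.load` instead of using the inline sample.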

Additionally, more details on pod affinity and pod anti-affinity can be found in the Understanding pod affinity documentation:

Pod affinity and pod anti-affinity allow you to constrain which nodes your pod is eligible to be scheduled on based on the key/value labels on other pods.

Pod affinity can tell the scheduler to locate a new pod on the same node as other pods if the label selector on the new pod matches the label on the current pod.

Pod anti-affinity can prevent the scheduler from locating a new pod on the same node as pods with the same labels if the label selector on the new pod matches the label on the current pod.
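
To make the quoted rule a bit more concrete, here is a minimal Python sketch of how one required podAffinity term with an `In` expression and a topologyKey narrows down the candidate nodes. It is a simplified model and not the actual kube-scheduler code: the `Node` and `Pod` classes and the `eligible_nodes` function are hypothetical names, only the `In` operator is modelled, and scheduler special cases are ignored. It does model one detail that is easy to miss: unless the term lists other namespaces, only pods in the incoming pod's own namespace are considered.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    labels: dict = field(default_factory=dict)


@dataclass
class Pod:
    name: str
    namespace: str
    labels: dict
    node: str  # name of the node the pod is running on


def matches_in(labels, key, values):
    """Model of a matchExpressions entry with operator `In`."""
    return labels.get(key) in values


def eligible_nodes(nodes, running_pods, namespace, key, values, topology_key):
    """Return the names of nodes that satisfy one required podAffinity term."""
    by_name = {n.name: n for n in nodes}
    # Collect the topology domains (values of topology_key) that already
    # host a matching pod in the incoming pod's namespace.
    domains = set()
    for pod in running_pods:
        if pod.namespace != namespace:
            continue
        if not matches_in(pod.labels, key, values):
            continue
        node = by_name.get(pod.node)
        if node is not None and topology_key in node.labels:
            domains.add(node.labels[topology_key])
    # A node is eligible if it belongs to one of those domains.
    return sorted(n.name for n in nodes if n.labels.get(topology_key) in domains)


nodes = [
    Node("nodeA", {"kubernetes.io/hostname": "nodeA"}),
    Node("nodeB", {"kubernetes.io/hostname": "nodeB"}),
]
test_pod = Pod("test-7d4c9c654-7lqns", "default", {"app": "postgresql"}, "nodeA")

# No matching pod yet: no node qualifies, so the scheduler pod stays Pending.
print(eligible_nodes(nodes, [], "default", "app", ["postgresql"],
                     "kubernetes.io/hostname"))  # []

# Once the labelled pod runs on nodeA, only nodeA qualifies.
print(eligible_nodes(nodes, [test_pod], "default", "app", ["postgresql"],
                     "kubernetes.io/hostname"))  # ['nodeA']
```

In this model, a labelled pod that runs in a different namespace than the Airflow scheduler leaves the list empty as well, so that is worth ruling out when the affinity never matches.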