Cannot apply inter-pod affinity to Airflow scheduler
I'm running into some strange behavior when trying to attach a podAffinity to the scheduler Deployment from the official Airflow Helm chart, e.g.:
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: app
              operator: In
              values:
                - postgresql
        topologyKey: "kubernetes.io/hostname"
A sample Deployment that the podAffinity is supposed to "connect" to:
metadata:
  name: {{ template "postgresql.fullname" . }}
  labels:
    app: postgresql
    chart: {{ template "postgresql.chart" . }}
    release: {{ .Release.Name | quote }}
    heritage: {{ .Release.Service | quote }}
spec:
  serviceName: {{ template "postgresql.fullname" . }}-headless
  replicas: 1
  selector:
    matchLabels:
      app: postgresql
      release: {{ .Release.Name | quote }}
  template:
    metadata:
      name: {{ template "postgresql.fullname" . }}
      labels:
        app: postgresql
        chart: {{ template "postgresql.chart" . }}
Which results in:
NotTriggerScaleUp: pod didn't trigger scale-up: 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't match pod affinity rules
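For context, a required podAffinity term can only be satisfied if a pod matching the labelSelector is already running, and by default only pods in the same namespace as the pod being scheduled are considered. A quick sanity check, assuming plain kubectl in the release namespace:
# List the pods the affinity term would match in the current namespace
$ kubectl get pods -l app=postgresql -o wide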
However, applying the exact same podAffinity configuration to the webserver Deployment works just fine. Also, changing the sample Deployment to a vanilla nginx one is reflected in the result.
It doesn't seem to be a resource-limit issue either, since I have tried various configurations and got the same result every time.
Apart from node affinity, I am not using any custom configuration.
Has anyone run into the same thing, or does anyone know what I might be doing wrong?
Setup:
- AKS cluster
- Airflow Helm chart 1.1.0
- Airflow 1.10.15 (but I don't think that matters here)
- kubectl client (1.22.1) and server (1.20.7)
Link to the Airflow chart:
I have recreated this scenario on my GKE cluster, and I decided to provide a Community Wiki answer to show that podAffinity on the scheduler works as expected.
Below I describe step by step how I tested it.
- In the values.yaml file, I configured podAffinity as follows:
$ cat values.yaml
...
# Airflow scheduler settings
scheduler:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - postgresql
          topologyKey: "kubernetes.io/hostname"
...
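To double-check that this value actually lands on the scheduler Deployment, the chart can be rendered locally before installing. This is standard Helm; the grep pattern is only illustrative:
# Render the chart and show the affinity stanza it produces
$ helm template airflow apache-airflow/airflow --values values.yaml \
    | grep -B 2 -A 12 "podAffinity"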
- I installed Airflow on the Kubernetes cluster using the Helm package manager and the values.yaml file specified above.
$ helm install airflow apache-airflow/airflow --values values.yaml
Afterwards, we can check the status of the scheduler:
$ kubectl get pods -owide | grep "scheduler"
airflow-scheduler-79bfb664cc-7n68f 0/2 Pending 0 8m6s <none> <none> <none> <none>
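The Pending status is expected at this point: the required affinity rule cannot be satisfied yet because no pod with the app: postgresql label exists. Describing the pod (standard kubectl; the pod name is taken from the output above) should show a FailedScheduling event in the Events section explaining this:
# The Events section at the end of the output explains why the pod is unschedulable
$ kubectl describe pod airflow-scheduler-79bfb664cc-7n68f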
- I created a sample Deployment with the app: postgresql label:
$ cat test.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: postgresql
  name: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgresql
  template:
    metadata:
      labels:
        app: postgresql
    spec:
      containers:
        - image: nginx
          name: nginx
$ kubectl apply -f test.yaml
deployment.apps/test created
$ kubectl get pods --show-labels | grep test
test-7d4c9c654-7lqns 1/1 Running 0 2m app=postgresql,...
- Finally, we can check that the scheduler has been created successfully:
$ kubectl get pods -o wide | grep "scheduler\|test"
airflow-scheduler-79bfb664cc-7n68f 2/2 Running 0 14m 10.X.1.6 nodeA
test-7d4c9c654-7lqns 1/1 Running 0 2m27s 10.X.1.5 nodeA
Additionally, more details on pod affinity and pod anti-affinity can be found in the Understanding pod affinity documentation:
Pod affinity and pod anti-affinity allow you to constrain which nodes your pod is eligible to be scheduled on based on the key/value labels on other pods.
Pod affinity can tell the scheduler to locate a new pod on the same node as other pods if the label selector on the new pod matches the label on the current pod.
Pod anti-affinity can prevent the scheduler from locating a new pod on the same node as pods with the same labels if the label selector on the new pod matches the label on the current pod.
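As a side note, the requiredDuringSchedulingIgnoredDuringExecution rule used above is a hard requirement: the scheduler pod stays Pending until a matching pod exists, which is exactly what the walkthrough shows. If co-location is merely desirable rather than mandatory, the same term can be expressed as a soft rule instead; a minimal sketch of the preferred form:
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - postgresql
          topologyKey: "kubernetes.io/hostname"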