nodeAffinity & nodeAntiAffinity are ignored

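I'm having trouble restricting a deployment so that it avoids a specific node pool: nodeAffinity and nodeAntiAffinity don't seem to have any effect.

Whatever configuration I use, Kubernetes appears to schedule the pod randomly across both node pools.

See the configuration below, and the scheduling results.

deployment.yaml snippet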

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress
  namespace: "test"
  labels:
    app: wordpress
    client: "test"
    product: hosted-wordpress
    version: v1
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      app: wordpress
      client: "test"
  template:
    metadata:
      labels:
        app: wordpress
        client: "test"
        product: hosted-wordpress
        version: v1
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                - key: doks.digitalocean.com/node-pool
                  operator: NotIn
                  values:
                  - infra

Node description snippet. Note the label 'doks.digitalocean.com/node-pool=infra'.

kubectl describe node infra-3dmga

Name:               infra-3dmga
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=s-2vcpu-4gb
                    beta.kubernetes.io/os=linux
                    doks.digitalocean.com/node-id=67d84a52-8d08-4b19-87fe-1d837ba46eb6
                    doks.digitalocean.com/node-pool=infra
                    doks.digitalocean.com/node-pool-id=2e0f2a1d-fbfa-47e9-9136-c897e51c014a
                    doks.digitalocean.com/version=1.19.3-do.2
                    failure-domain.beta.kubernetes.io/region=tor1
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=infra-3dmga
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=s-2vcpu-4gb
                    region=tor1
                    topology.kubernetes.io/region=tor1
Annotations:        alpha.kubernetes.io/provided-node-ip: 10.137.0.230
                    csi.volume.kubernetes.io/nodeid: {"dobs.csi.digitalocean.com":"222551559"}
                    io.cilium.network.ipv4-cilium-host: 10.244.0.139
                    io.cilium.network.ipv4-health-ip: 10.244.0.209
                    io.cilium.network.ipv4-pod-cidr: 10.244.0.128/25
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sun, 20 Dec 2020 20:17:20 -0800
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  infra-3dmga
  AcquireTime:     <unset>
  RenewTime:       Fri, 12 Feb 2021 08:04:09 -0800

Sometimes this results in

kubectl get po -n test -o wide

NAME                         READY   STATUS    RESTARTS   AGE   IP             NODE          NOMINATED NODE   READINESS GATES
wordpress-5bfcb6f44b-2j7kv   5/5     Running   0          1h    10.244.0.107   infra-3dmga   <none>           <none>

Other times the result is

kubectl get po -n test -o wide

NAME                         READY   STATUS    RESTARTS   AGE   IP             NODE            NOMINATED NODE   READINESS GATES
wordpress-5bfcb6f44b-b42wj   5/5     Running   0          5m    10.244.0.107   clients-3dmem   <none>           <none>

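I have tried using nodeAntiAffinity, with the same result.

Finally, I even tried creating my own test labels instead of using Digital Ocean's built-in labels, and got the same behaviour (affinity just doesn't seem to work for me at all).

I'm hoping someone can help me resolve this, or even point out a silly mistake in my config, because this issue has been driving me nuts (and it's a useful feature, when it works).

Thanks,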

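In your deployment file you specified operator: NotIn, which works as anti-affinity: it keeps pods off the nodes whose label matches one of the listed values.

Please use operator: In to get node affinity. For example, if we want the pods to run on the nodes labelled clients: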

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress
  namespace: "test"
  labels:
    app: wordpress
    client: "test"
    product: hosted-wordpress
    version: v1
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      app: wordpress
      client: "test"
  template:
    metadata:
      labels:
        app: wordpress
        client: "test"
        product: hosted-wordpress
        version: v1
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                - key: "doks.digitalocean.com/node-pool"
                  operator: In
                  values: ["clients"]  # use the label value of your own node pool
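
If a single pool is the only target, the same hard constraint can probably be written with the shorter nodeSelector field. This is a minimal sketch, not taken from the question: it assumes the pool is labelled doks.digitalocean.com/node-pool=clients, which is only inferred from the node name clients-3dmem.

```yaml
# Pod template fragment only; it belongs under the Deployment's top-level spec.
template:
  spec:
    nodeSelector:
      # Assumed label value; check it with `kubectl describe node <node-name>`.
      doks.digitalocean.com/node-pool: clients
```

nodeSelector only expresses exact label matches, so excluding a pool (as NotIn does) still needs nodeAffinity.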

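Good news!

I've finally resolved the issue.

The problem was, of course, "user error".

There was an extra Spec line in the config, and it was very well hidden.

Originally, before switching to StatefulSets, we were using Deployments, and I had a pod spec hostname entry that was overriding the Spec at the top of the file.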
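
For anyone hitting the same symptom, the mistake probably looked something like the sketch below. This is a hypothetical reconstruction, since the full file was never posted: the hostname value, the container name and image, and the exact position of the second spec are all assumptions.

```yaml
# Hypothetical reconstruction of the pod template, not the real file.
template:
  metadata:
    labels:
      app: wordpress
  spec:                        # first pod spec: carries the affinity rule
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: doks.digitalocean.com/node-pool
                  operator: NotIn
                  values:
                    - infra
  # ... many lines later, easy to miss ...
  spec:                        # same key repeated at the same level
    hostname: wordpress        # hypothetical value
    containers:
      - name: wordpress        # hypothetical container
        image: wordpress       # hypothetical image
```

A lenient YAML parser keeps only the last occurrence of a repeated key, so the affinity block is dropped without any error and the scheduler never sees it. Comparing the file with the output of kubectl get on the live object (using -o yaml) should show whether the affinity section actually reached the cluster.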

Thanks to @WytrzymałyWiktor and @Manjul for the suggestions!