Assigning Pods to different nodepools with NodeAffinity

I'm trying to assign a cluster's pods to node pools, and I would like the pool that gets used to change depending on the resources the cluster's pods request. However, I want the pods to prefer the smaller node pool (worker) and to leave the larger nodes (lgworker) alone, so that they don't trigger a scale-up of the larger pool.

        extraPodConfig:
          tolerations:
            - key: toleration_label
              value: worker
              operator: Equal
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: a_node_selector_label
                    operator: In
                    values:
                      - worker
              preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 100
                  preference:
                    matchExpressions:
                    - key: node_label
                      operator: In
                      values: 
                      - worker
                - weight: 90
                  preference:
                    matchExpressions:
                    - key: node_label
                      operator: In
                      values: 
                      - lgworker


The cluster pods' default resource requests fit easily on the smaller nodes, so I want those to be used first. The larger node pool should only come into play when the requested resources exceed what the smaller nodes can hold.

I have tried weighting the preferences, but the default cluster pods still get scheduled onto the larger node pool.

Is there something I'm missing that would help me correctly assign pods to the smaller nodes over the larger ones?

Using appropriate weights helps the right nodes get selected; however, when enough Dask workers are requested, some of them can still end up on the lgworker nodes. The way to address this is to update the kube-scheduler so that it considers 100% of the nodes when scheduling. By default, the kube-scheduler only evaluates N% of the nodes at a time (with N determined dynamically from the cluster size) for filtering and scoring; see the v1.21 kube-scheduler documentation on percentageOfNodesToScore.

NodeAffinity will only get you so far: since it does not guarantee that the affinity is enforced, pods can still end up scheduled onto non-preferred nodes.

From the NodeAffinity v1 API reference:

The scheduler will prefer to schedule pods to nodes that satisfy the affinity expressions specified by this field, but it may choose a node that violates one or more of the expressions. The node that is most preferred is the one with the greatest sum of weights, i.e. for each node that meets all of the scheduling requirements (resource request, requiredDuringScheduling affinity expressions, etc.), compute a sum by iterating through the elements of this field and adding "weight" to the sum if the node matches the corresponding matchExpressions; the node(s) with the highest sum are the most preferred.
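Concretely, with the configuration below, any feasible worker node collects 100 points from the first preference term while an lgworker node collects only 1, so a worker node always out-scores an lgworker node on this criterion. The catch is that only the nodes the scheduler actually samples and scores benefit from that weighting, which is where percentageOfNodesToScore comes in.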

        extraPodConfig:
          tolerations:
            - key: node_toleration
              value: worker
              operator: Equal
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: node_label
                    operator: In
                    values:
                      - worker
                      - lgworker
              preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 100
                  preference:
                    matchExpressions:
                    - key: node_label
                      operator: In
                      values: 
                      - worker
                - weight: 1
                  preference:
                    matchExpressions:
                    - key: node_label
                      operator: In
                      values: 
                      - lgworker
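For context, if extraPodConfig here comes from the dask-gateway Helm chart, the fragment above would normally sit under the worker backend in values.yaml. The exact path below (gateway.backend.worker.extraPodConfig) is an assumption about that chart's layout, so verify it against your chart version:

    # Hypothetical values.yaml placement -- confirm the path for your dask-gateway chart version
    gateway:
      backend:
        worker:
          extraPodConfig:
            # ... tolerations and affinity blocks exactly as shown above ...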

So, influencing the kube-scheduler involves updating its configuration. For example:

apiVersion: kubescheduler.config.k8s.io/v1alpha1
kind: KubeSchedulerConfiguration
algorithmSource:
  provider: DefaultProvider

...

percentageOfNodesToScore: 100
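
Setting percentageOfNodesToScore: 100 tells the scheduler to score every feasible node instead of the default, dynamically determined sample. How this file reaches the scheduler depends on how your control plane is run, and the kubescheduler.config.k8s.io API group has gone through several versions, so match the apiVersion to your cluster's release. As a rough sketch, on a kubeadm-managed control plane the kube-scheduler runs as a static pod and reads the file via its --config flag; the manifest and host paths below are assumptions, and managed control planes (GKE, EKS, and the like) generally don't expose the scheduler configuration at all:

    # Sketch: fragment of /etc/kubernetes/manifests/kube-scheduler.yaml (kubeadm layout assumed)
    spec:
      containers:
        - name: kube-scheduler
          command:
            - kube-scheduler
            - --config=/etc/kubernetes/scheduler-config.yaml  # the KubeSchedulerConfiguration above
          volumeMounts:
            - name: scheduler-config
              mountPath: /etc/kubernetes/scheduler-config.yaml
              readOnly: true
      volumes:
        - name: scheduler-config
          hostPath:
            path: /etc/kubernetes/scheduler-config.yaml
            type: File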