如何在 EKS 受管节点上限制 pods 的数量

How to restrict the number of pods on an EKS managed node

我正在使用 Cloudformation Script 管理 AWS 上的基础设施。创建具有托管节点组的 EKS 集群,实例类型为 m5a.16xlarge(64 核,256gb 内存)。现在实例可以拥有的最大 pods 数量是 737,这是基于最大 ENI 和子网中的 Ip 地址数量。我需要限制我的托管节点组上 pods 的数量,以使其不超过特定的自定义计数。 我应该怎么做才能做到这一点?

我终于找到了这个问题的答案。我们可以通过对工作节点使用自定义 AMI 来限制特定 eks 集群上 pods 的数量。

这是用于创建自定义 AMI 的 link:

https://aws.amazon.com/premiumsupport/knowledge-center/eks-custom-linux-ami/

此 link 提供了一种方法,可以将存储库克隆到您的本地计算机,对其进行自定义,然后在您的 aws 帐户中创建 ami。本地计算机或正在克隆 repo 的任何 ec2 实例必须配置最新版本的 aws cli。

克隆回购后,我们可以编辑 /sampleDir/files/eni-max-pods.txt 文件的配置,该文件由所有受支持的实例及其最大 pods 值组成。 /sampleDir/ 是我们克隆存储库的目录。

在此之后我们需要 运行 'make' 命令,它会在任何给定区域自动创建自定义 ami。然后我们可以使用 cloudformation 模板在 eks 上启动托管节点或自托管节点。

这是一个示例 cloudformation 模板。

注意:我们可以通过使用以下类型编辑节点组部分来为托管节点组使用相同的 CF 模板:Aws::EKS::Nodegroup


Description: Amazon EKS - Node Group

Metadata:
  "AWS::CloudFormation::Interface":
    ParameterGroups:
      - Label:
          default: EKS Cluster
        Parameters:
          - ClusterName
          - ClusterControlPlaneSecurityGroup
      - Label:
          default: Worker Node Configuration
        Parameters:
          - NodeGroupName
          - NodeAutoScalingGroupMinSize
          - NodeAutoScalingGroupDesiredCapacity
          - NodeAutoScalingGroupMaxSize
          - NodeInstanceType
          - NodeImageIdSSMParam
          - NodeImageId
          - NodeVolumeSize
          - KeyName
          - BootstrapArguments
          - DisableIMDSv1
      - Label:
          default: Worker Network Configuration
        Parameters:
          - VpcId
          - Subnets

Parameters:
  BootstrapArguments:
    Type: String
    Default: ""
    Description: "Arguments to pass to the bootstrap script. See files/bootstrap.sh in https://github.com/awslabs/amazon-eks-ami"

  ClusterControlPlaneSecurityGroup:
    Type: "AWS::EC2::SecurityGroup::Id"
    Description: The security group of the cluster control plane.

  ClusterName:
    Type: String
    Description: The cluster name provided when the cluster was created. If it is incorrect, nodes will not be able to join the cluster.

  KeyName:
    Type: "AWS::EC2::KeyPair::KeyName"
    Description: The EC2 Key Pair to allow SSH access to the instances

  NodeAutoScalingGroupDesiredCapacity:
    Type: Number
    Default: 3
    Description: Desired capacity of Node Group ASG.

  NodeAutoScalingGroupMaxSize:
    Type: Number
    Default: 4
    Description: Maximum size of Node Group ASG. Set to at least 1 greater than NodeAutoScalingGroupDesiredCapacity.

  NodeAutoScalingGroupMinSize:
    Type: Number
    Default: 1
    Description: Minimum size of Node Group ASG.

  NodeGroupName:
    Type: String
    Description: Unique identifier for the Node Group.

  NodeImageId:
    Type: String
    Default: ""
    Description: (Optional) Specify your own custom image ID. This value overrides any AWS Systems Manager Parameter Store value specified above.

  NodeImageIdSSMParam:
    Type: "AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>"
    Default: /aws/service/eks/optimized-ami/1.17/amazon-linux-2/recommended/image_id
    Description: AWS Systems Manager Parameter Store parameter of the AMI ID for the worker node instances. Change this value to match the version of Kubernetes you are using.

  DisableIMDSv1:
    Type: String
    Default: "false"
    AllowedValues:
      - "false"
      - "true"

  NodeInstanceType:
    Type: String
    Default: t3.medium
    AllowedValues:
      - a1.2xlarge
      - a1.4xlarge
      - a1.large
      - a1.medium
      - a1.metal
      - a1.xlarge
      - c1.medium
      - c1.xlarge
      - c3.2xlarge
      - c3.4xlarge
      - c3.8xlarge
      - c3.large
      - c3.xlarge
      - c4.2xlarge
      - c4.4xlarge
      - c4.8xlarge
      - c4.large
      - c4.xlarge
      - c5.12xlarge
      - c5.18xlarge
      - c5.24xlarge
      - c5.2xlarge
      - c5.4xlarge
      - c5.9xlarge
      - c5.large
      - c5.metal
      - c5.xlarge
      - c5a.12xlarge
      - c5a.16xlarge
      - c5a.24xlarge
      - c5a.2xlarge
      - c5a.4xlarge
      - c5a.8xlarge
      - c5a.large
      - c5a.metal
      - c5a.xlarge
      - c5ad.12xlarge
      - c5ad.16xlarge
      - c5ad.24xlarge
      - c5ad.2xlarge
      - c5ad.4xlarge
      - c5ad.8xlarge
      - c5ad.large
      - c5ad.metal
      - c5ad.xlarge
      - c5d.12xlarge
      - c5d.18xlarge
      - c5d.24xlarge
      - c5d.2xlarge
      - c5d.4xlarge
      - c5d.9xlarge
      - c5d.large
      - c5d.metal
      - c5d.xlarge
      - c5n.18xlarge
      - c5n.2xlarge
      - c5n.4xlarge
      - c5n.9xlarge
      - c5n.large
      - c5n.metal
      - c5n.xlarge
      - c6g.12xlarge
      - c6g.16xlarge
      - c6g.2xlarge
      - c6g.4xlarge
      - c6g.8xlarge
      - c6g.large
      - c6g.medium
      - c6g.metal
      - c6g.xlarge
      - c6gd.12xlarge
      - c6gd.16xlarge
      - c6gd.2xlarge
      - c6gd.4xlarge
      - c6gd.8xlarge
      - c6gd.large
      - c6gd.medium
      - c6gd.metal
      - c6gd.xlarge
      - c6gn.12xlarge
      - c6gn.16xlarge
      - c6gn.2xlarge
      - c6gn.4xlarge
      - c6gn.8xlarge
      - c6gn.large
      - c6gn.medium
      - c6gn.xlarge
      - cc2.8xlarge
      - cr1.8xlarge
      - d2.2xlarge
      - d2.4xlarge
      - d2.8xlarge
      - d2.xlarge
      - d3.2xlarge
      - d3.4xlarge
      - d3.8xlarge
      - d3.xlarge
      - d3en.12xlarge
      - d3en.2xlarge
      - d3en.4xlarge
      - d3en.6xlarge
      - d3en.8xlarge
      - d3en.xlarge
      - f1.16xlarge
      - f1.2xlarge
      - f1.4xlarge
      - g2.2xlarge
      - g2.8xlarge
      - g3.16xlarge
      - g3.4xlarge
      - g3.8xlarge
      - g3s.xlarge
      - g4ad.16xlarge
      - g4ad.4xlarge
      - g4ad.8xlarge
      - g4dn.12xlarge
      - g4dn.16xlarge
      - g4dn.2xlarge
      - g4dn.4xlarge
      - g4dn.8xlarge
      - g4dn.metal
      - g4dn.xlarge
      - h1.16xlarge
      - h1.2xlarge
      - h1.4xlarge
      - h1.8xlarge
      - hs1.8xlarge
      - i2.2xlarge
      - i2.4xlarge
      - i2.8xlarge
      - i2.xlarge
      - i3.16xlarge
      - i3.2xlarge
      - i3.4xlarge
      - i3.8xlarge
      - i3.large
      - i3.metal
      - i3.xlarge
      - i3en.12xlarge
      - i3en.24xlarge
      - i3en.2xlarge
      - i3en.3xlarge
      - i3en.6xlarge
      - i3en.large
      - i3en.metal
      - i3en.xlarge
      - inf1.24xlarge
      - inf1.2xlarge
      - inf1.6xlarge
      - inf1.xlarge
      - m1.large
      - m1.medium
      - m1.small
      - m1.xlarge
      - m2.2xlarge
      - m2.4xlarge
      - m2.xlarge
      - m3.2xlarge
      - m3.large
      - m3.medium
      - m3.xlarge
      - m4.10xlarge
      - m4.16xlarge
      - m4.2xlarge
      - m4.4xlarge
      - m4.large
      - m4.xlarge
      - m5.12xlarge
      - m5.16xlarge
      - m5.24xlarge
      - m5.2xlarge
      - m5.4xlarge
      - m5.8xlarge
      - m5.large
      - m5.metal
      - m5.xlarge
      - m5a.12xlarge
      - m5a.16xlarge
      - m5a.24xlarge
      - m5a.2xlarge
      - m5a.4xlarge
      - m5a.8xlarge
      - m5a.large
      - m5a.xlarge
      - m5ad.12xlarge
      - m5ad.16xlarge
      - m5ad.24xlarge
      - m5ad.2xlarge
      - m5ad.4xlarge
      - m5ad.8xlarge
      - m5ad.large
      - m5ad.xlarge
      - m5d.12xlarge
      - m5d.16xlarge
      - m5d.24xlarge
      - m5d.2xlarge
      - m5d.4xlarge
      - m5d.8xlarge
      - m5d.large
      - m5d.metal
      - m5d.xlarge
      - m5dn.12xlarge
      - m5dn.16xlarge
      - m5dn.24xlarge
      - m5dn.2xlarge
      - m5dn.4xlarge
      - m5dn.8xlarge
      - m5dn.large
      - m5dn.xlarge
      - m5n.12xlarge
      - m5n.16xlarge
      - m5n.24xlarge
      - m5n.2xlarge
      - m5n.4xlarge
      - m5n.8xlarge
      - m5n.large
      - m5n.xlarge
      - m5zn.12xlarge
      - m5zn.2xlarge
      - m5zn.3xlarge
      - m5zn.6xlarge
      - m5zn.large
      - m5zn.metal
      - m5zn.xlarge
      - m6g.12xlarge
      - m6g.16xlarge
      - m6g.2xlarge
      - m6g.4xlarge
      - m6g.8xlarge
      - m6g.large
      - m6g.medium
      - m6g.metal
      - m6g.xlarge
      - m6gd.12xlarge
      - m6gd.16xlarge
      - m6gd.2xlarge
      - m6gd.4xlarge
      - m6gd.8xlarge
      - m6gd.large
      - m6gd.medium
      - m6gd.metal
      - m6gd.xlarge
      - mac1.metal
      - p2.16xlarge
      - p2.8xlarge
      - p2.xlarge
      - p3.16xlarge
      - p3.2xlarge
      - p3.8xlarge
      - p3dn.24xlarge
      - p4d.24xlarge
      - r3.2xlarge
      - r3.4xlarge
      - r3.8xlarge
      - r3.large
      - r3.xlarge
      - r4.16xlarge
      - r4.2xlarge
      - r4.4xlarge
      - r4.8xlarge
      - r4.large
      - r4.xlarge
      - r5.12xlarge
      - r5.16xlarge
      - r5.24xlarge
      - r5.2xlarge
      - r5.4xlarge
      - r5.8xlarge
      - r5.large
      - r5.metal
      - r5.xlarge
      - r5a.12xlarge
      - r5a.16xlarge
      - r5a.24xlarge
      - r5a.2xlarge
      - r5a.4xlarge
      - r5a.8xlarge
      - r5a.large
      - r5a.xlarge
      - r5ad.12xlarge
      - r5ad.16xlarge
      - r5ad.24xlarge
      - r5ad.2xlarge
      - r5ad.4xlarge
      - r5ad.8xlarge
      - r5ad.large
      - r5ad.xlarge
      - r5b.12xlarge
      - r5b.16xlarge
      - r5b.24xlarge
      - r5b.2xlarge
      - r5b.4xlarge
      - r5b.8xlarge
      - r5b.large
      - r5b.metal
      - r5b.xlarge
      - r5d.12xlarge
      - r5d.16xlarge
      - r5d.24xlarge
      - r5d.2xlarge
      - r5d.4xlarge
      - r5d.8xlarge
      - r5d.large
      - r5d.metal
      - r5d.xlarge
      - r5dn.12xlarge
      - r5dn.16xlarge
      - r5dn.24xlarge
      - r5dn.2xlarge
      - r5dn.4xlarge
      - r5dn.8xlarge
      - r5dn.large
      - r5dn.xlarge
      - r5n.12xlarge
      - r5n.16xlarge
      - r5n.24xlarge
      - r5n.2xlarge
      - r5n.4xlarge
      - r5n.8xlarge
      - r5n.large
      - r5n.xlarge
      - r6g.12xlarge
      - r6g.16xlarge
      - r6g.2xlarge
      - r6g.4xlarge
      - r6g.8xlarge
      - r6g.large
      - r6g.medium
      - r6g.metal
      - r6g.xlarge
      - r6gd.12xlarge
      - r6gd.16xlarge
      - r6gd.2xlarge
      - r6gd.4xlarge
      - r6gd.8xlarge
      - r6gd.large
      - r6gd.medium
      - r6gd.metal
      - r6gd.xlarge
      - t1.micro
      - t2.2xlarge
      - t2.large
      - t2.medium
      - t2.micro
      - t2.nano
      - t2.small
      - t2.xlarge
      - t3.2xlarge
      - t3.large
      - t3.medium
      - t3.micro
      - t3.nano
      - t3.small
      - t3.xlarge
      - t3a.2xlarge
      - t3a.large
      - t3a.medium
      - t3a.micro
      - t3a.nano
      - t3a.small
      - t3a.xlarge
      - t4g.2xlarge
      - t4g.large
      - t4g.medium
      - t4g.micro
      - t4g.nano
      - t4g.small
      - t4g.xlarge
      - u-12tb1.metal
      - u-18tb1.metal
      - u-24tb1.metal
      - u-6tb1.metal
      - u-9tb1.metal
      - x1.16xlarge
      - x1.32xlarge
      - x1e.16xlarge
      - x1e.2xlarge
      - x1e.32xlarge
      - x1e.4xlarge
      - x1e.8xlarge
      - x1e.xlarge
      - z1d.12xlarge
      - z1d.2xlarge
      - z1d.3xlarge
      - z1d.6xlarge
      - z1d.large
      - z1d.metal
      - z1d.xlarge
    ConstraintDescription: Must be a valid EC2 instance type
    Description: EC2 instance type for the node instances

  NodeVolumeSize:
    Type: Number
    Default: 20
    Description: Node volume size

  Subnets:
    Type: "List<AWS::EC2::Subnet::Id>"
    Description: The subnets where workers can be created.

  VpcId:
    Type: "AWS::EC2::VPC::Id"
    Description: The VPC of the worker instances

Mappings:
  PartitionMap:
    aws:
      EC2ServicePrincipal: "ec2.amazonaws.com"
    aws-us-gov:
      EC2ServicePrincipal: "ec2.amazonaws.com"
    aws-cn:
      EC2ServicePrincipal: "ec2.amazonaws.com.cn"
    aws-iso:
      EC2ServicePrincipal: "ec2.c2s.ic.gov"
    aws-iso-b:
      EC2ServicePrincipal: "ec2.sc2s.sgov.gov"

Conditions:
  HasNodeImageId: !Not
    - "Fn::Equals":
      - !Ref NodeImageId
      - ""

  IMDSv1Disabled:
    "Fn::Equals":
      - !Ref DisableIMDSv1
      - "true"

Resources:
  NodeInstanceRole:
    Type: "AWS::IAM::Role"
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - !FindInMap [PartitionMap, !Ref "AWS::Partition", EC2ServicePrincipal]
            Action:
              - "sts:AssumeRole"
      ManagedPolicyArns:
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEKSWorkerNodePolicy"
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEKS_CNI_Policy"
        - !Sub "arn:${AWS::Partition}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
      Path: /

  NodeInstanceProfile:
    Type: "AWS::IAM::InstanceProfile"
    Properties:
      Path: /
      Roles:
        - !Ref NodeInstanceRole

  NodeSecurityGroup:
    Type: "AWS::EC2::SecurityGroup"
    Properties:
      GroupDescription: Security group for all nodes in the cluster
      Tags:
        - Key: !Sub kubernetes.io/cluster/${ClusterName}
          Value: owned
      VpcId: !Ref VpcId

  NodeSecurityGroupIngress:
    Type: "AWS::EC2::SecurityGroupIngress"
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow node to communicate with each other
      FromPort: 0
      GroupId: !Ref NodeSecurityGroup
      IpProtocol: "-1"
      SourceSecurityGroupId: !Ref NodeSecurityGroup
      ToPort: 65535

  ClusterControlPlaneSecurityGroupIngress:
    Type: "AWS::EC2::SecurityGroupIngress"
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow pods to communicate with the cluster API Server
      FromPort: 443
      GroupId: !Ref ClusterControlPlaneSecurityGroup
      IpProtocol: tcp
      SourceSecurityGroupId: !Ref NodeSecurityGroup
      ToPort: 443

  ControlPlaneEgressToNodeSecurityGroup:
    Type: "AWS::EC2::SecurityGroupEgress"
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow the cluster control plane to communicate with worker Kubelet and pods
      DestinationSecurityGroupId: !Ref NodeSecurityGroup
      FromPort: 1025
      GroupId: !Ref ClusterControlPlaneSecurityGroup
      IpProtocol: tcp
      ToPort: 65535

  ControlPlaneEgressToNodeSecurityGroupOn443:
    Type: "AWS::EC2::SecurityGroupEgress"
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow the cluster control plane to communicate with pods running extension API servers on port 443
      DestinationSecurityGroupId: !Ref NodeSecurityGroup
      FromPort: 443
      GroupId: !Ref ClusterControlPlaneSecurityGroup
      IpProtocol: tcp
      ToPort: 443

  NodeSecurityGroupFromControlPlaneIngress:
    Type: "AWS::EC2::SecurityGroupIngress"
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow worker Kubelets and pods to receive communication from the cluster control plane
      FromPort: 1025
      GroupId: !Ref NodeSecurityGroup
      IpProtocol: tcp
      SourceSecurityGroupId: !Ref ClusterControlPlaneSecurityGroup
      ToPort: 65535

  NodeSecurityGroupFromControlPlaneOn443Ingress:
    Type: "AWS::EC2::SecurityGroupIngress"
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow pods running extension API servers on port 443 to receive communication from cluster control plane
      FromPort: 443
      GroupId: !Ref NodeSecurityGroup
      IpProtocol: tcp
      SourceSecurityGroupId: !Ref ClusterControlPlaneSecurityGroup
      ToPort: 443

  NodeLaunchTemplate:
    Type: "AWS::EC2::LaunchTemplate"
    Properties:
      LaunchTemplateData:
        BlockDeviceMappings:
          - DeviceName: /dev/xvda
            Ebs:
              DeleteOnTermination: true
              VolumeSize: !Ref NodeVolumeSize
              VolumeType: gp2
        IamInstanceProfile:
          Arn: !GetAtt NodeInstanceProfile.Arn
        ImageId: !If
          - HasNodeImageId
          - !Ref NodeImageId
          - !Ref NodeImageIdSSMParam
        InstanceType: !Ref NodeInstanceType
        KeyName: !Ref KeyName
        SecurityGroupIds:
        - !Ref NodeSecurityGroup
        UserData: !Base64
          "Fn::Sub": |
            #!/bin/bash
            set -o xtrace
            /etc/eks/bootstrap.sh ${ClusterName} ${BootstrapArguments},
            
            /opt/aws/bin/cfn-signal --exit-code $? \
                     --stack  ${AWS::StackName} \
                     --resource NodeGroup  \
                     --region ${AWS::Region}
        MetadataOptions:
          HttpPutResponseHopLimit : 2
          HttpEndpoint: enabled
          HttpTokens: !If
            - IMDSv1Disabled
            - required
            - optional

  NodeGroup:
    Type: "AWS::AutoScaling::AutoScalingGroup"
    Properties:
      DesiredCapacity: !Ref NodeAutoScalingGroupDesiredCapacity
      LaunchTemplate:
        LaunchTemplateId: !Ref NodeLaunchTemplate
        Version: !GetAtt NodeLaunchTemplate.LatestVersionNumber
      MaxSize: !Ref NodeAutoScalingGroupMaxSize
      MinSize: !Ref NodeAutoScalingGroupMinSize
      Tags:
        - Key: Name
          PropagateAtLaunch: true
          Value: !Sub ${ClusterName}-${NodeGroupName}-Node
        - Key: !Sub kubernetes.io/cluster/${ClusterName}
          PropagateAtLaunch: true
          Value: owned
      VPCZoneIdentifier: !Ref Subnets
    UpdatePolicy:
      AutoScalingRollingUpdate:
        MaxBatchSize: 1
        MinInstancesInService: !Ref NodeAutoScalingGroupDesiredCapacity
        PauseTime: PT5M

Outputs:
  NodeInstanceRole:
    Description: The node instance role
    Value: !GetAtt NodeInstanceRole.Arn

  NodeSecurityGroup:
    Description: The security group for the node group
    Value: !Ref NodeSecurityGroup

  NodeAutoScalingGroup:
    Description: The autoscaling group
    Value: !Ref NodeGroup