使用 CloudFormation 时 NodeGroup 未加入 EKS 集群

NodeGroup is not joining EKS cluster when using CloudFormation

我一直在关注这个 guide 通过 CloudFormation 创建一个 Kubernetes 集群,但是 NodeGroup 从未加入集群,我也从未收到错误或关于为什么不加入的解释。

我可以看到自动缩放组和 EC2 机器已创建,但 EKS 报告没有节点组。

如果我通过网络管理工具手动创建一个新的节点组,它可以工作,但它分配了不同的 security groups。它有一个 launch template 而不是 launch configuration

相同 AMI、相同 IAM 角色、相同机器类型...

我对CloudFormationEKS都很陌生,我现在不知道如何进行才能找出问题所在。

这是模板:

Description: >
    Kubernetes cluster

Parameters:

  EnvironmentName:
    Description: An environment name that will be prefixed to resource names
    Type: String

  KeyName:
    Description: The EC2 Key Pair to allow SSH access to the instances
    Type: AWS::EC2::KeyPair::KeyName

  VpcBlock:
    Type: String
    Default: 192.168.0.0/16
    Description: The CIDR range for the VPC. This should be a valid private (RFC 1918) CIDR range.

  Subnet01Block:
    Type: String
    Default: 192.168.64.0/18
    Description: CidrBlock for subnet 01 within the VPC

  Subnet02Block:
    Type: String
    Default: 192.168.128.0/18
    Description: CidrBlock for subnet 02 within the VPC

  Subnet03Block:
    Type: String
    Default: 192.168.192.0/18
    Description: CidrBlock for subnet 03 within the VPC. This is used only if the region has more than 2 AZs.

  NodeInstanceType:
    Description: EC2 instance type for the node instances
    Type: String

  NodeImageId:
    Type: AWS::EC2::Image::Id
    Description: AMI id for the node instances.

  NodeAutoScalingGroupMinSize:
    Type: Number
    Description: Minimum size of Node Group ASG.
    Default: 1

  NodeAutoScalingGroupMaxSize:
    Type: Number
    Description: Maximum size of Node Group ASG. Set to at least 1 greater than NodeAutoScalingGroupDesiredCapacity.
    Default: 3

  NodeAutoScalingGroupDesiredCapacity:
    Type: Number
    Description: Desired capacity of Node Group ASG.
    Default: 3

  BootstrapArguments:
    Description: Arguments to pass to the bootstrap script. See files/bootstrap.sh in https://github.com/awslabs/amazon-eks-ami
    Default: ""
    Type: String

Resources:

  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock:  !Ref VpcBlock
      EnableDnsSupport: true
      EnableDnsHostnames: true
      Tags:
        - Key: Environment 
          Value: !Ref EnvironmentName

  InternetGateway:
    Type: "AWS::EC2::InternetGateway"
    Properties:
      Tags:
        - Key: Environment 
          Value: !Ref EnvironmentName

  VPCGatewayAttachment:
    Type: "AWS::EC2::VPCGatewayAttachment"
    Properties:
      InternetGatewayId: !Ref InternetGateway
      VpcId: !Ref VPC

  RouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Environment 
          Value: !Ref EnvironmentName

  Route:
    DependsOn: VPCGatewayAttachment
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref RouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway

  Subnet01:
    Type: AWS::EC2::Subnet
    Properties:
      AvailabilityZone: !Select [ 0, !GetAZs '' ]
      CidrBlock: !Ref Subnet01Block
      VpcId: !Ref VPC
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Environment 
          Value: !Ref EnvironmentName

  Subnet02:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Subnet 02
    Properties:
      AvailabilityZone: !Select [ 1, !GetAZs '' ]
      CidrBlock: !Ref Subnet02Block
      VpcId: !Ref VPC
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Environment 
          Value: !Ref EnvironmentName

  Subnet03:
    Type: AWS::EC2::Subnet
    Metadata:
      Comment: Subnet 03
    Properties:
      AvailabilityZone: !Select [ 2, !GetAZs '' ]
      CidrBlock: !Ref Subnet03Block
      VpcId: !Ref VPC
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Environment 
          Value: !Ref EnvironmentName

  Subnet01RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref Subnet01
      RouteTableId: !Ref RouteTable

  Subnet02RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref Subnet02
      RouteTableId: !Ref RouteTable

  Subnet03RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref Subnet03
      RouteTableId: !Ref RouteTable

  ControlPlaneSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Cluster communication with worker nodes
      VpcId: !Ref VPC

  ClusterRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub ${EnvironmentName}KubernetesClusterRole
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service: eks.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonEKSServicePolicy
        - arn:aws:iam::aws:policy/AmazonEKSClusterPolicy
      Tags:
        - Key: Environment 
          Value: !Ref EnvironmentName

  Cluster:
    Type: AWS::EKS::Cluster
    Properties:
      Name: !Sub ${EnvironmentName}KubernetesCluster
      RoleArn: !GetAtt ClusterRole.Arn
      ResourcesVpcConfig:
        SecurityGroupIds:
          - !Ref ControlPlaneSecurityGroup
        SubnetIds:
          - !Ref Subnet01
          - !Ref Subnet02
          - !Ref Subnet03

  NodeRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub ${EnvironmentName}KubernetesNodeRole
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service: ec2.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
        - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
        - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
        - arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess
      Path: /
      Tags:
        - Key: Environment 
          Value: !Ref EnvironmentName

  NodeInstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: "/"
      Roles:
      - !Ref NodeRole

  NodeSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for all nodes in the cluster
      VpcId: !Ref VPC
      Tags:
      - Key: !Sub "kubernetes.io/cluster/${EnvironmentName}KubernetesCluster"
        Value: 'owned'
      - Key: Environment 
        Value: !Ref EnvironmentName

  NodeSecurityGroupIngress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow node to communicate with each other
      GroupId: !Ref NodeSecurityGroup
      SourceSecurityGroupId: !Ref NodeSecurityGroup
      IpProtocol: '-1'
      FromPort: 0
      ToPort: 65535

  NodeSecurityGroupFromControlPlaneIngress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow worker Kubelets and pods to receive communication from the cluster control plane
      GroupId: !Ref NodeSecurityGroup
      SourceSecurityGroupId: !Ref ControlPlaneSecurityGroup
      IpProtocol: tcp
      FromPort: 1025
      ToPort: 65535

  ControlPlaneEgressToNodeSecurityGroup:
    Type: AWS::EC2::SecurityGroupEgress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow the cluster control plane to communicate with worker Kubelet and pods
      GroupId: !Ref ControlPlaneSecurityGroup
      DestinationSecurityGroupId: !Ref NodeSecurityGroup
      IpProtocol: tcp
      FromPort: 1025
      ToPort: 65535

  NodeSecurityGroupFromControlPlaneOn443Ingress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow pods running extension API servers on port 443 to receive communication from cluster control plane
      GroupId: !Ref NodeSecurityGroup
      SourceSecurityGroupId: !Ref ControlPlaneSecurityGroup
      IpProtocol: tcp
      FromPort: 443
      ToPort: 443

  ControlPlaneEgressToNodeSecurityGroupOn443:
    Type: AWS::EC2::SecurityGroupEgress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow the cluster control plane to communicate with pods running extension API servers on port 443
      GroupId: !Ref ControlPlaneSecurityGroup
      DestinationSecurityGroupId: !Ref NodeSecurityGroup
      IpProtocol: tcp
      FromPort: 443
      ToPort: 443

  ClusterControlPlaneSecurityGroupIngress:
    Type: AWS::EC2::SecurityGroupIngress
    DependsOn: NodeSecurityGroup
    Properties:
      Description: Allow pods to communicate with the cluster API Server
      GroupId: !Ref ControlPlaneSecurityGroup
      SourceSecurityGroupId: !Ref NodeSecurityGroup
      IpProtocol: tcp
      ToPort: 443
      FromPort: 443

  NodeGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      DesiredCapacity: !Ref NodeAutoScalingGroupDesiredCapacity
      LaunchConfigurationName: !Ref NodeLaunchConfig
      MinSize: !Ref NodeAutoScalingGroupMinSize
      MaxSize: !Ref NodeAutoScalingGroupMaxSize
      VPCZoneIdentifier:
        - !Ref Subnet01
        - !Ref Subnet02
        - !Ref Subnet03
      Tags:
      - Key: Name
        Value: !Sub "${EnvironmentName}KubernetesCluster-Node"
        PropagateAtLaunch: 'true'
      - Key: !Sub 'kubernetes.io/cluster/${EnvironmentName}KubernetesCluster'
        Value: 'owned'
        PropagateAtLaunch: 'true'
    UpdatePolicy:
      AutoScalingRollingUpdate:
        MaxBatchSize: '1'
        MinInstancesInService: !Ref NodeAutoScalingGroupDesiredCapacity
        PauseTime: 'PT5M'

  NodeLaunchConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      AssociatePublicIpAddress: 'true'
      IamInstanceProfile: !Ref NodeInstanceProfile
      ImageId: !Ref NodeImageId
      InstanceType: !Ref NodeInstanceType
      KeyName: !Ref KeyName
      SecurityGroups:
      - !Ref NodeSecurityGroup
      BlockDeviceMappings:
        - DeviceName: /dev/xvda
          Ebs:
            VolumeSize: 20
            VolumeType: gp2
            DeleteOnTermination: true
      UserData:
        Fn::Base64:
          !Sub |
            #!/bin/bash
            set -o xtrace
            /etc/eks/bootstrap.sh ${EnvironmentName}KubernetesCluster ${BootstrapArguments}
            /opt/aws/bin/cfn-signal --exit-code $? \
                     --stack  ${AWS::StackName} \
                     --resource NodeGroup  \
                     --region ${AWS::Region}

Outputs:

    KubernetesClusterName:
      Description: Cluster name
      Value: !Ref Cluster
      Export:
        Name: KubernetesClusterName

    KubernetesClusterEndpoint:
      Description: Cluster endpoint
      Value: !GetAtt Cluster.Endpoint
      Export:
        Name: KubernetesClusterEndpoint

    KubernetesNodeInstanceProfile:
      Description: The name of the IAM profile for K8
      Value: !GetAtt NodeInstanceProfile.Arn
      Export:
        Name: KubernetesNodeInstanceProfileArn

有两种方法可以将工作节点添加到您的 EKS 集群:

  1. 自行启动和注册工作器 (https://docs.aws.amazon.com/eks/latest/userguide/launch-workers.html)
  2. 使用托管节点组(https://docs.aws.amazon.com/eks/latest/userguide/managed-node-groups.html

正如我从您的模板中看到的,您现在使用的是第一种方法。执行此操作时重要的是,您需要等到 EKS 集群准备就绪并处于活动状态,然后再启动工作程序节点。您可以使用 DependsOn Attribute 来实现此目的。如果这不能解决您的问题,请查看云初始化日志 (/var/log/cloud-init-output.log) 以检查加入集群时发生了什么。

如果您想使用托管节点组,只需删除 AutoScaling 组和 LaunchConfiguration 并改用此类型:https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-eks-nodegroup.html 好处是,AWS 负责在您的账户中为您创建所需的资源(AutoScaling 组和 LaunchTemplate),您可以在 AWS 控制台中看到节点组。

我使用了托管节点组 (https://docs.aws.amazon.com/eks/latest/userguide/managed-node-groups.html) 选项。它正在工作。但是如何定义自动缩放策略。它只允许给出最大和最小节点数而不是名称。