AWS EKS cluster setup via Terraform inaccessible from bastion

Context and Background

I am working on a Terraform project whose end goal is an EKS cluster with the following properties:

  1. Private to the outside internet
  2. Accessible via a bastion host
  3. Uses worker groups
  4. Has resources (deployments, cron jobs, etc.) configurable via the Terraform Kubernetes module

To that end, I have slightly modified the Terraform EKS example (code at the bottom of the question). The problem I am running into is that after SSH-ing into the bastion, I cannot ping the cluster, and any command like kubectl get pods times out after about 60 seconds.

Here are the facts/things I know to be true:

  1. For testing purposes, I have (temporarily) switched to a public cluster. Previously, when I had cluster_endpoint_public_access set to false, the terraform apply command could not even complete because it could not reach the /healthz endpoint on the cluster.
  2. The bastion provisioning works, in the sense that the user data runs successfully and installs kubectl and the kubeconfig file.
  3. I can SSH into the bastion from a static IP (var.company_vpn_ips in the code).
  4. It is entirely possible that this is a networking problem rather than an EKS/Terraform problem, since my understanding of how the VPC and its security groups fit into the picture is not fully mature.

Code

Here is the VPC configuration:

locals {
  vpc_name            = "my-vpc"
  vpc_cidr            = "10.0.0.0/16"
  public_subnet_cidr  = ["10.0.4.0/24", "10.0.5.0/24", "10.0.6.0/24"]
  private_subnet_cidr = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
}

# The definition of the VPC to create

module "vpc" {

  source  = "terraform-aws-modules/vpc/aws"
  version = "3.2.0"

  name                 = local.vpc_name
  cidr                 = local.vpc_cidr
  azs                  = data.aws_availability_zones.available.names
  private_subnets      = local.private_subnet_cidr
  public_subnets       = local.public_subnet_cidr
  enable_nat_gateway   = true
  single_nat_gateway   = true
  enable_dns_hostnames = true

  tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
  }

  public_subnet_tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/elb"                    = "1"
  }

  private_subnet_tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/internal-elb"           = "1"
  }
}

data "aws_availability_zones" "available" {}

Then the security groups I created for the cluster:

resource "aws_security_group" "ssh_sg" {
  name_prefix = "ssh-sg"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port = 22
    to_port   = 22
    protocol  = "tcp"

    cidr_blocks = [
      "10.0.0.0/8",
    ]
  }
}

resource "aws_security_group" "all_worker_mgmt" {
  name_prefix = "all_worker_management"
  vpc_id      = module.vpc.vpc_id

  ingress {
    from_port = 22
    to_port   = 22
    protocol  = "tcp"

    cidr_blocks = [
      "10.0.0.0/8",
      "172.16.0.0/12",
      "192.168.0.0/16",
    ]
  }
}

Here is the cluster configuration:

locals {
  cluster_version = "1.21"
}

# Create the EKS resource that will setup the EKS cluster
module "eks_cluster" {
  source = "terraform-aws-modules/eks/aws"

  # The name of the cluster to create
  cluster_name = var.cluster_name

  # Allow public access to the cluster API endpoint (temporarily enabled for testing)
  cluster_endpoint_public_access = true

  # Enable private access to the cluster API endpoint
  cluster_endpoint_private_access = true

  # The version of the cluster to create
  cluster_version = local.cluster_version

  # The VPC ID to create the cluster in
  vpc_id = var.vpc_id

  # The subnets to add the cluster to
  subnets = var.private_subnets

  # Default information on the workers
  workers_group_defaults = {
    root_volume_type = "gp2"
  }

  worker_additional_security_group_ids = [var.all_worker_mgmt_id]

  # Specify the worker groups
  worker_groups = [
    {
      # The name of this worker group
      name = "default-workers"
      # The instance type for this worker group
      instance_type = var.eks_worker_instance_type
      # The number of instances to raise up
      asg_desired_capacity = var.eks_num_workers
      asg_max_size         = var.eks_num_workers
      asg_min_size         = var.eks_num_workers
      # The security group IDs for these instances
      additional_security_group_ids = [var.ssh_sg_id]
    }
  ]
}

data "aws_eks_cluster" "cluster" {
  name = module.eks_cluster.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.eks_cluster.cluster_id
}

output "worker_iam_role_name" {
  value = module.eks_cluster.worker_iam_role_name
}

And finally the bastion:

locals {
  ami           = "ami-0f19d220602031aed" # Amazon Linux 2 AMI (us-east-2)
  instance_type = "t3.small"
  key_name      = "bastion-kp"
}

resource "aws_iam_instance_profile" "bastion" {
  name = "bastion"
  role = var.role_name
}

resource "aws_instance" "bastion" {
  ami           = local.ami
  instance_type = local.instance_type

  key_name                    = local.key_name
  associate_public_ip_address = true
  subnet_id                   = var.public_subnet
  iam_instance_profile        = aws_iam_instance_profile.bastion.name

  security_groups = [aws_security_group.bastion-sg.id]

  tags = {
    Name = "K8s Bastion"
  }

  lifecycle {
    ignore_changes = all
  }

  user_data = <<EOF
      #! /bin/bash

      # Install Kubectl
      curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
      install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
      kubectl version --client

      # Install Helm
      curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
      chmod 700 get_helm.sh
      ./get_helm.sh
      helm version

      # Install AWS
      curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
      unzip awscliv2.zip
      ./aws/install
      aws --version

      # Install aws-iam-authenticator
      curl -o aws-iam-authenticator https://amazon-eks.s3.us-west-2.amazonaws.com/1.21.2/2021-07-05/bin/linux/amd64/aws-iam-authenticator
      chmod +x ./aws-iam-authenticator
      mkdir -p $HOME/bin && cp ./aws-iam-authenticator $HOME/bin/aws-iam-authenticator && export PATH=$PATH:$HOME/bin
      echo 'export PATH=$PATH:$HOME/bin' >> ~/.bashrc
      aws-iam-authenticator help

      # Add the kube config file 
      mkdir ~/.kube
      echo "${var.kubectl_config}" >> ~/.kube/config
  EOF
}

resource "aws_security_group" "bastion-sg" {
  name   = "bastion-sg"
  vpc_id = var.vpc_id
}

resource "aws_security_group_rule" "sg-rule-ssh" {
  security_group_id = aws_security_group.bastion-sg.id
  from_port         = 22
  protocol          = "tcp"
  to_port           = 22
  type              = "ingress"
  cidr_blocks       = var.company_vpn_ips
  depends_on        = [aws_security_group.bastion-sg]
}

resource "aws_security_group_rule" "sg-rule-egress" {
  security_group_id = aws_security_group.bastion-sg.id
  type              = "egress"
  from_port         = 0
  protocol          = "all"
  to_port           = 0
  cidr_blocks       = ["0.0.0.0/0"]
  ipv6_cidr_blocks  = ["::/0"]
  depends_on        = [aws_security_group.bastion-sg]
}

The Ask

The most pressing issue for me is finding a way to interact with the cluster through the bastion so that the other part of the Terraform code (the resources launched in the cluster itself) can run. I would also like to understand how to set up a private cluster when the terraform apply command cannot reach it. Thanks in advance for any help!

Looking at how your node group communicates with the control plane, you need to attach that same cluster security group to your bastion host so that it can talk to the control plane. You can find the SG id in the EKS console, on the Networking tab.
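
An equivalent way to get the same reachability, instead of attaching the cluster SG to the instance, is to open port 443 on the cluster security group to the bastion's security group. A minimal sketch, assuming both security group ids are visible in the same module and that the terraform-aws-modules/eks module exposes the cluster SG as cluster_primary_security_group_id (output names vary by module version; pass the id in as a variable if the bastion lives in a separate module):

# Allow the bastion to reach the EKS API endpoint over HTTPS
resource "aws_security_group_rule" "bastion_to_cluster_api" {
  type                     = "ingress"
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  security_group_id        = module.eks_cluster.cluster_primary_security_group_id
  source_security_group_id = aws_security_group.bastion-sg.id
  description              = "HTTPS from bastion to EKS control plane"
}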

I think you also need to deploy a Kubernetes resource in your Terraform code for the aws-auth config map, which grants your bastion host access to this cluster, because by default the cluster is only accessible to the IAM user that created it. Have a look at this link -> https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html
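
A rough sketch of what that could look like with the same terraform-aws-modules/eks module, assuming the version in use manages the aws-auth config map and accepts a map_roles input (var.bastion_role_arn is a placeholder for the ARN of the IAM role behind the bastion's instance profile):

module "eks_cluster" {
  source = "terraform-aws-modules/eks/aws"

  # ... existing cluster arguments from above ...

  # Map the bastion's IAM role into aws-auth so kubectl calls from the bastion are authorized
  map_roles = [
    {
      # Placeholder: the role attached to aws_iam_instance_profile.bastion
      rolearn  = var.bastion_role_arn
      username = "bastion"
      groups   = ["system:masters"]
    }
  ]
}

Note that system:masters gives the bastion full admin rights on the cluster; bind the role to a narrower RBAC group if you want less than that.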