AWS EKS cluster setup via Terraform inaccessible from bastion
Background and context
I'm working on a Terraform project whose end goal is an EKS cluster with the following properties:
- Private to the outside internet
- Accessible via a bastion host
- Uses worker groups
- Has resources (deployments, cron jobs, etc.) configurable via the Terraform Kubernetes module
To that end, I've slightly modified the Terraform EKS example (code at the bottom of the question). The problem I'm running into is that after SSH-ing into the bastion I cannot ping the cluster, and any command like kubectl get pods times out after about 60 seconds.
Here are the facts/things I know to be true:
- For testing purposes, I have (temporarily) switched the cluster to a public cluster. Previously, when I had cluster_endpoint_public_access set to false, the terraform apply command could not even complete because it was unable to reach the /healthz endpoint on the cluster.
- The bastion configuration works, in the sense that the user data runs successfully and installs kubectl and the kubeconfig file.
- I can SSH into the bastion from a static IP (var.company_vpn_ips in the code).
- It's entirely possible this is a networking problem rather than an EKS/Terraform problem, since my understanding of how the VPC and its security groups fit into the picture is not fully mature.
Code
Here is the VPC configuration:
locals {
  vpc_name            = "my-vpc"
  vpc_cidr            = "10.0.0.0/16"
  public_subnet_cidr  = ["10.0.4.0/24", "10.0.5.0/24", "10.0.6.0/24"]
  private_subnet_cidr = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
}

# The definition of the VPC to create
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "3.2.0"

  name = local.vpc_name
  cidr = local.vpc_cidr

  azs             = data.aws_availability_zones.available.names
  private_subnets = local.private_subnet_cidr
  public_subnets  = local.public_subnet_cidr

  enable_nat_gateway   = true
  single_nat_gateway   = true
  enable_dns_hostnames = true

  tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
  }

  public_subnet_tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/elb"                    = "1"
  }

  private_subnet_tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/internal-elb"           = "1"
  }
}

data "aws_availability_zones" "available" {}
And then the security groups I created for the cluster:
resource "aws_security_group" "ssh_sg" {
name_prefix = "ssh-sg"
vpc_id = module.vpc.vpc_id
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = [
"10.0.0.0/8",
]
}
}
resource "aws_security_group" "all_worker_mgmt" {
name_prefix = "all_worker_management"
vpc_id = module.vpc.vpc_id
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = [
"10.0.0.0/8",
"172.16.0.0/12",
"192.168.0.0/16",
]
}
}
Here is the cluster configuration:
locals {
  cluster_version = "1.21"
}

# Create the EKS resource that will set up the EKS cluster
module "eks_cluster" {
  source = "terraform-aws-modules/eks/aws"

  # The name of the cluster to create
  cluster_name = var.cluster_name

  # Allow public access to the cluster API endpoint (temporarily, for testing)
  cluster_endpoint_public_access = true

  # Enable private access to the cluster API endpoint
  cluster_endpoint_private_access = true

  # The version of the cluster to create
  cluster_version = local.cluster_version

  # The VPC ID to create the cluster in
  vpc_id = var.vpc_id

  # The subnets to add the cluster to
  subnets = var.private_subnets

  # Default information on the workers
  workers_group_defaults = {
    root_volume_type = "gp2"
  }

  worker_additional_security_group_ids = [var.all_worker_mgmt_id]

  # Specify the worker groups
  worker_groups = [
    {
      # The name of this worker group
      name = "default-workers"

      # The instance type for this worker group
      instance_type = var.eks_worker_instance_type

      # The number of instances to bring up
      asg_desired_capacity = var.eks_num_workers
      asg_max_size         = var.eks_num_workers
      asg_min_size         = var.eks_num_workers

      # The security group IDs for these instances
      additional_security_group_ids = [var.ssh_sg_id]
    }
  ]
}

data "aws_eks_cluster" "cluster" {
  name = module.eks_cluster.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.eks_cluster.cluster_id
}

output "worker_iam_role_name" {
  value = module.eks_cluster.worker_iam_role_name
}
And finally the bastion:
locals {
  ami           = "ami-0f19d220602031aed" # Amazon Linux 2 AMI (us-east-2)
  instance_type = "t3.small"
  key_name      = "bastion-kp"
}

resource "aws_iam_instance_profile" "bastion" {
  name = "bastion"
  role = var.role_name
}

resource "aws_instance" "bastion" {
  ami                         = local.ami
  instance_type               = local.instance_type
  key_name                    = local.key_name
  associate_public_ip_address = true
  subnet_id                   = var.public_subnet
  iam_instance_profile        = aws_iam_instance_profile.bastion.name
  security_groups             = [aws_security_group.bastion-sg.id]

  tags = {
    Name = "K8s Bastion"
  }

  lifecycle {
    ignore_changes = all
  }

  user_data = <<EOF
#! /bin/bash
# Install Kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
kubectl version --client
# Install Helm
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
helm version
# Install AWS
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
./aws/install
aws --version
# Install aws-iam-authenticator
curl -o aws-iam-authenticator https://amazon-eks.s3.us-west-2.amazonaws.com/1.21.2/2021-07-05/bin/linux/amd64/aws-iam-authenticator
chmod +x ./aws-iam-authenticator
mkdir -p $HOME/bin && cp ./aws-iam-authenticator $HOME/bin/aws-iam-authenticator && export PATH=$PATH:$HOME/bin
echo 'export PATH=$PATH:$HOME/bin' >> ~/.bashrc
aws-iam-authenticator help
# Add the kube config file
mkdir ~/.kube
echo "${var.kubectl_config}" >> ~/.kube/config
EOF
}
resource "aws_security_group" "bastion-sg" {
name = "bastion-sg"
vpc_id = var.vpc_id
}
resource "aws_security_group_rule" "sg-rule-ssh" {
security_group_id = aws_security_group.bastion-sg.id
from_port = 22
protocol = "tcp"
to_port = 22
type = "ingress"
cidr_blocks = var.company_vpn_ips
depends_on = [aws_security_group.bastion-sg]
}
resource "aws_security_group_rule" "sg-rule-egress" {
security_group_id = aws_security_group.bastion-sg.id
type = "egress"
from_port = 0
protocol = "all"
to_port = 0
cidr_blocks = ["0.0.0.0/0"]
ipv6_cidr_blocks = ["::/0"]
depends_on = [aws_security_group.bastion-sg]
}
The ask
The most pressing issue for me is finding a way to interact with the cluster from the bastion so that the other part of the Terraform code (the resources to launch in the cluster itself) can run. I would also like to understand how to set this up when the terraform apply command cannot reach a private cluster. Thanks in advance for any help you can offer!
Look at how your node groups communicate with the control plane: you need to attach that same cluster security group to your bastion host so that it can talk to the control plane. You can find the SG id in the EKS console, on the Networking tab.
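A minimal sketch of what that could look like in Terraform, assuming the EKS module version in use exposes a cluster_security_group_id output (the terraform-aws-modules/eks/aws v17.x releases do) and that the bastion security group and the EKS module are visible from the same configuration (otherwise pass the IDs around as variables). Rather than attaching the cluster SG to the instance, this opens the cluster SG to the bastion SG on port 443:

# Hypothetical rule allowing the bastion to reach the Kubernetes API endpoint.
# module.eks_cluster.cluster_security_group_id is assumed to be the cluster
# security group shown on the EKS console Networking tab.
resource "aws_security_group_rule" "bastion_to_cluster_api" {
  description              = "Allow the bastion host to reach the EKS control plane"
  type                     = "ingress"
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  security_group_id        = module.eks_cluster.cluster_security_group_id
  source_security_group_id = aws_security_group.bastion-sg.id
}

Either approach (attaching the cluster SG directly to the bastion instance, or opening 443 from the bastion SG as above) lets the traffic through; the rule above has the advantage of leaving the bastion's own security group unchanged.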
I think you also need to deploy a Kubernetes resource in your Terraform code for the aws-auth config map, which gives your bastion host permission to access this cluster, because by default the cluster is only accessible to the IAM identity that created it.
Take a look at this link -> https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html
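For the aws-auth piece, here is a hedged sketch of one way to express it with inputs that the older terraform-aws-modules/eks/aws releases (v17.x) provided, namely manage_aws_auth and map_roles. It assumes a kubernetes provider is configured against this cluster (the question already has the aws_eks_cluster / aws_eks_cluster_auth data sources for that) and that var.role_name is the IAM role attached to the bastion's instance profile; the system:masters group is only an example:

# Look up the ARN of the role already attached to the bastion instance profile.
data "aws_iam_role" "bastion" {
  name = var.role_name
}

module "eks_cluster" {
  # ... the existing arguments from the question stay as they are ...

  # Let the module manage the aws-auth config map and add the bastion's role to it.
  manage_aws_auth = true

  map_roles = [
    {
      rolearn  = data.aws_iam_role.bastion.arn
      username = "bastion"
      groups   = ["system:masters"] # example only; grant something narrower in practice
    }
  ]
}

With this in place, kubectl on the bastion authenticates as the bastion's instance role (via the instance profile and aws-iam-authenticator), and that role is mapped to a Kubernetes identity instead of being rejected by the cluster.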