fargate 在 docker 上失败拉入私有子网
fargate failing on docker pull in private subnet
我在部署 Fargate 集群时遇到问题,它在 docker 拉取映像上失败并出现错误 "CannotPullContainerError"。我正在使用 cloudformation 创建堆栈,这不是可选的,它会创建完整的堆栈,但在尝试基于上述错误启动任务时失败。
我附上了可能突出问题的 cloudformation 堆栈文件,并且我仔细检查了子网是否有到 nat(如下)的路由。我还通过 ssh 连接到同一子网中的一个实例,该实例能够进行外部路由。我想知道我是否没有正确放置所需的部分,即服务 + 负载均衡器在私有子网中,或者我不应该将内部 lb 放在同一个子网中吗???
此子网是当前有位置的子网,但文件中的所有 3 个都具有相同的 nat 设置。
子网可路由 (subnet-34b92250)
* 0.0.0.0/0 -> nat-05a00385366da527a
提前加油
yaml cloudformaition 脚本:
AWSTemplateFormatVersion: 2010-09-09
Description: Cloudformation stack for the new GRPC endpoints within existing vpc/subnets and using fargate
Parameters:
StackName:
Type: String
Default: cf-core-ci-grpc
Description: The name of the parent Fargate networking stack that you created. Necessary
vpcId:
Type: String
Default: vpc-0d499a68
Description: The name of the parent Fargate networking stack that you created. Necessary
Resources:
CoreGrcpInstanceSecurityGroupOpenWeb:
Type: 'AWS::EC2::SecurityGroup'
Properties:
GroupName: sgg-core-ci-grpc-ingress
GroupDescription: Allow http to client host
VpcId: !Ref vpcId
SecurityGroupIngress:
- IpProtocol: tcp
FromPort: '80'
ToPort: '80'
CidrIp: 0.0.0.0/0
SecurityGroupEgress:
- IpProtocol: tcp
FromPort: '80'
ToPort: '80'
CidrIp: 0.0.0.0/0
LoadBalancer:
Type: 'AWS::ElasticLoadBalancingV2::LoadBalancer'
DependsOn:
- CoreGrcpInstanceSecurityGroupOpenWeb
Properties:
Name: lb-core-ci-int-grpc
Scheme: internal
Subnets:
# # pub
# - subnet-f13995a8
# - subnet-f13995a8
# - subnet-f13995a8
# pri
- subnet-34b92250
- subnet-82d85af4
- subnet-ca379b93
LoadBalancerAttributes:
- Key: idle_timeout.timeout_seconds
Value: '50'
SecurityGroups:
- !Ref CoreGrcpInstanceSecurityGroupOpenWeb
TargetGroup:
Type: 'AWS::ElasticLoadBalancingV2::TargetGroup'
DependsOn:
- LoadBalancer
Properties:
Name: tg-core-ci-grpc
Port: 3000
TargetType: ip
Protocol: HTTP
HealthCheckIntervalSeconds: 30
HealthCheckProtocol: HTTP
HealthCheckTimeoutSeconds: 10
HealthyThresholdCount: 4
Matcher:
HttpCode: '200'
TargetGroupAttributes:
- Key: deregistration_delay.timeout_seconds
Value: '20'
UnhealthyThresholdCount: 3
VpcId: !Ref vpcId
LoadBalancerListener:
Type: 'AWS::ElasticLoadBalancingV2::Listener'
DependsOn:
- TargetGroup
Properties:
DefaultActions:
- Type: forward
TargetGroupArn: !Ref TargetGroup
LoadBalancerArn: !Ref LoadBalancer
Port: 80
Protocol: HTTP
EcsCluster:
Type: 'AWS::ECS::Cluster'
DependsOn:
- LoadBalancerListener
Properties:
ClusterName: ecs-core-ci-grpc
EcsTaskRole:
Type: 'AWS::IAM::Role'
Properties:
AssumeRolePolicyDocument:
Statement:
- Effect: Allow
Principal:
Service:
# - ecs.amazonaws.com
- ecs-tasks.amazonaws.com
Action:
- 'sts:AssumeRole'
Path: /
Policies:
- PolicyName: iam-policy-ecs-task-core-ci-grpc
PolicyDocument:
Statement:
- Effect: Allow
Action:
- 'ecr:**'
Resource: '*'
CoreGrcpTaskDefinition:
Type: 'AWS::ECS::TaskDefinition'
DependsOn:
- EcsCluster
- EcsTaskRole
Properties:
NetworkMode: awsvpc
RequiresCompatibilities:
- FARGATE
ExecutionRoleArn: !Ref EcsTaskRole
Cpu: '1024'
Memory: '2048'
ContainerDefinitions:
- Name: container-core-ci-grpc
Image: 'nginx:latest'
Cpu: '256'
Memory: '1024'
PortMappings:
- ContainerPort: '80'
HostPort: '80'
Essential: 'true'
EcsService:
Type: 'AWS::ECS::Service'
DependsOn:
- CoreGrcpTaskDefinition
Properties:
Cluster: !Ref EcsCluster
LaunchType: FARGATE
DesiredCount: '1'
DeploymentConfiguration:
MaximumPercent: 150
MinimumHealthyPercent: 0
LoadBalancers:
- ContainerName: container-core-ci-grpc
ContainerPort: '80'
TargetGroupArn: !Ref TargetGroup
NetworkConfiguration:
AwsvpcConfiguration:
AssignPublicIp: DISABLED
SecurityGroups:
- !Ref CoreGrcpInstanceSecurityGroupOpenWeb
Subnets:
- subnet-34b92250
- subnet-82d85af4
- subnet-ca379b93
TaskDefinition: !Ref CoreGrcpTaskDefinition
请在您的 ECR 注册表中定义此策略并将 IAM 角色附加到您的任务中。
{
"Version": "2008-10-17",
"Statement": [
{
"Sid": "new statement",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::99999999999:role/ecsEventsRole"
},
"Action": [
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage",
"ecr:BatchCheckLayerAvailability",
"ecr:PutImage",
"ecr:InitiateLayerUpload",
"ecr:UploadLayerPart",
"ecr:CompleteLayerUpload"
]
}
]
}
遗憾的是,AWS Fargate 仅支持托管在 ECR 中的图像或 Docker Hub 中的 public 存储库,不支持托管在 Docker Hub 中的私有存储库。
更多信息 - https://forums.aws.amazon.com/thread.jspa?threadID=268415
几个月前,甚至我们在使用 AWS Fargate 时也遇到了同样的问题。您现在只有两个选择:
将您的图像迁移到 Amazon ECR。
将 AWS Batch 与自定义 AMI 结合使用,其中自定义 AMI 是使用 ECS 配置中的 Docker 集线器凭据构建的(我们现在正在使用)。
编辑: 如Christopher Thomas in the comment, ECS fargate now supports pulling images from DockerHub Private repositories. More info on how to set it up can be found here所述。
我在部署 Fargate 集群时遇到问题,它在 docker 拉取映像上失败并出现错误 "CannotPullContainerError"。我正在使用 cloudformation 创建堆栈,这不是可选的,它会创建完整的堆栈,但在尝试基于上述错误启动任务时失败。
我附上了可能突出问题的 cloudformation 堆栈文件,并且我仔细检查了子网是否有到 nat(如下)的路由。我还通过 ssh 连接到同一子网中的一个实例,该实例能够进行外部路由。我想知道我是否没有正确放置所需的部分,即服务 + 负载均衡器在私有子网中,或者我不应该将内部 lb 放在同一个子网中吗???
此子网是当前有位置的子网,但文件中的所有 3 个都具有相同的 nat 设置。
子网可路由 (subnet-34b92250) * 0.0.0.0/0 -> nat-05a00385366da527a
提前加油
yaml cloudformaition 脚本:
AWSTemplateFormatVersion: 2010-09-09
Description: Cloudformation stack for the new GRPC endpoints within existing vpc/subnets and using fargate
Parameters:
StackName:
Type: String
Default: cf-core-ci-grpc
Description: The name of the parent Fargate networking stack that you created. Necessary
vpcId:
Type: String
Default: vpc-0d499a68
Description: The name of the parent Fargate networking stack that you created. Necessary
Resources:
CoreGrcpInstanceSecurityGroupOpenWeb:
Type: 'AWS::EC2::SecurityGroup'
Properties:
GroupName: sgg-core-ci-grpc-ingress
GroupDescription: Allow http to client host
VpcId: !Ref vpcId
SecurityGroupIngress:
- IpProtocol: tcp
FromPort: '80'
ToPort: '80'
CidrIp: 0.0.0.0/0
SecurityGroupEgress:
- IpProtocol: tcp
FromPort: '80'
ToPort: '80'
CidrIp: 0.0.0.0/0
LoadBalancer:
Type: 'AWS::ElasticLoadBalancingV2::LoadBalancer'
DependsOn:
- CoreGrcpInstanceSecurityGroupOpenWeb
Properties:
Name: lb-core-ci-int-grpc
Scheme: internal
Subnets:
# # pub
# - subnet-f13995a8
# - subnet-f13995a8
# - subnet-f13995a8
# pri
- subnet-34b92250
- subnet-82d85af4
- subnet-ca379b93
LoadBalancerAttributes:
- Key: idle_timeout.timeout_seconds
Value: '50'
SecurityGroups:
- !Ref CoreGrcpInstanceSecurityGroupOpenWeb
TargetGroup:
Type: 'AWS::ElasticLoadBalancingV2::TargetGroup'
DependsOn:
- LoadBalancer
Properties:
Name: tg-core-ci-grpc
Port: 3000
TargetType: ip
Protocol: HTTP
HealthCheckIntervalSeconds: 30
HealthCheckProtocol: HTTP
HealthCheckTimeoutSeconds: 10
HealthyThresholdCount: 4
Matcher:
HttpCode: '200'
TargetGroupAttributes:
- Key: deregistration_delay.timeout_seconds
Value: '20'
UnhealthyThresholdCount: 3
VpcId: !Ref vpcId
LoadBalancerListener:
Type: 'AWS::ElasticLoadBalancingV2::Listener'
DependsOn:
- TargetGroup
Properties:
DefaultActions:
- Type: forward
TargetGroupArn: !Ref TargetGroup
LoadBalancerArn: !Ref LoadBalancer
Port: 80
Protocol: HTTP
EcsCluster:
Type: 'AWS::ECS::Cluster'
DependsOn:
- LoadBalancerListener
Properties:
ClusterName: ecs-core-ci-grpc
EcsTaskRole:
Type: 'AWS::IAM::Role'
Properties:
AssumeRolePolicyDocument:
Statement:
- Effect: Allow
Principal:
Service:
# - ecs.amazonaws.com
- ecs-tasks.amazonaws.com
Action:
- 'sts:AssumeRole'
Path: /
Policies:
- PolicyName: iam-policy-ecs-task-core-ci-grpc
PolicyDocument:
Statement:
- Effect: Allow
Action:
- 'ecr:**'
Resource: '*'
CoreGrcpTaskDefinition:
Type: 'AWS::ECS::TaskDefinition'
DependsOn:
- EcsCluster
- EcsTaskRole
Properties:
NetworkMode: awsvpc
RequiresCompatibilities:
- FARGATE
ExecutionRoleArn: !Ref EcsTaskRole
Cpu: '1024'
Memory: '2048'
ContainerDefinitions:
- Name: container-core-ci-grpc
Image: 'nginx:latest'
Cpu: '256'
Memory: '1024'
PortMappings:
- ContainerPort: '80'
HostPort: '80'
Essential: 'true'
EcsService:
Type: 'AWS::ECS::Service'
DependsOn:
- CoreGrcpTaskDefinition
Properties:
Cluster: !Ref EcsCluster
LaunchType: FARGATE
DesiredCount: '1'
DeploymentConfiguration:
MaximumPercent: 150
MinimumHealthyPercent: 0
LoadBalancers:
- ContainerName: container-core-ci-grpc
ContainerPort: '80'
TargetGroupArn: !Ref TargetGroup
NetworkConfiguration:
AwsvpcConfiguration:
AssignPublicIp: DISABLED
SecurityGroups:
- !Ref CoreGrcpInstanceSecurityGroupOpenWeb
Subnets:
- subnet-34b92250
- subnet-82d85af4
- subnet-ca379b93
TaskDefinition: !Ref CoreGrcpTaskDefinition
请在您的 ECR 注册表中定义此策略并将 IAM 角色附加到您的任务中。
{
"Version": "2008-10-17",
"Statement": [
{
"Sid": "new statement",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::99999999999:role/ecsEventsRole"
},
"Action": [
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage",
"ecr:BatchCheckLayerAvailability",
"ecr:PutImage",
"ecr:InitiateLayerUpload",
"ecr:UploadLayerPart",
"ecr:CompleteLayerUpload"
]
}
]
}
遗憾的是,AWS Fargate 仅支持托管在 ECR 中的图像或 Docker Hub 中的 public 存储库,不支持托管在 Docker Hub 中的私有存储库。
更多信息 - https://forums.aws.amazon.com/thread.jspa?threadID=268415
几个月前,甚至我们在使用 AWS Fargate 时也遇到了同样的问题。您现在只有两个选择:
将您的图像迁移到 Amazon ECR。
将 AWS Batch 与自定义 AMI 结合使用,其中自定义 AMI 是使用 ECS 配置中的 Docker 集线器凭据构建的(我们现在正在使用)。
编辑: 如Christopher Thomas in the comment, ECS fargate now supports pulling images from DockerHub Private repositories. More info on how to set it up can be found here所述。