fargate 在 docker 上失败拉入私有子网

fargate failing on docker pull in private subnet

我在部署 Fargate 集群时遇到问题,它在 docker 拉取映像上失败并出现错误 "CannotPullContainerError"。我正在使用 cloudformation 创建堆栈,这不是可选的,它会创建完整的堆栈,但在尝试基于上述错误启动任务时失败。

我附上了可能突出问题的 cloudformation 堆栈文件,并且我仔细检查了子网是否有到 nat(如下)的路由。我还通过 ssh 连接到同一子网中的一个实例,该实例能够进行外部路由。我想知道我是否没有正确放置所需的部分,即服务 + 负载均衡器在私有子网中,或者我不应该将内部 lb 放在同一个子网中吗???

此子网是当前有位置的子网,但文件中的所有 3 个都具有相同的 nat 设置。

子网可路由 (subnet-34b92250) * 0.0.0.0/0 -> nat-05a00385366da527a

提前加油

yaml cloudformaition 脚本:

AWSTemplateFormatVersion: 2010-09-09
Description: Cloudformation stack for the new GRPC endpoints within existing vpc/subnets and using fargate
Parameters:
  StackName:
    Type: String
    Default: cf-core-ci-grpc
    Description: The name of the parent Fargate networking stack that you created. Necessary
  vpcId:
    Type: String
    Default: vpc-0d499a68
    Description: The name of the parent Fargate networking stack that you created. Necessary
Resources:
  CoreGrcpInstanceSecurityGroupOpenWeb:
    Type: 'AWS::EC2::SecurityGroup'
    Properties:
      GroupName: sgg-core-ci-grpc-ingress
      GroupDescription: Allow http to client host
      VpcId: !Ref vpcId
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: '80'
          ToPort: '80'
          CidrIp: 0.0.0.0/0
      SecurityGroupEgress:
        - IpProtocol: tcp
          FromPort: '80'
          ToPort: '80'
          CidrIp: 0.0.0.0/0
  LoadBalancer:
    Type: 'AWS::ElasticLoadBalancingV2::LoadBalancer'
    DependsOn:
      - CoreGrcpInstanceSecurityGroupOpenWeb
    Properties:
      Name: lb-core-ci-int-grpc
      Scheme: internal
      Subnets:
      # # pub
      #   - subnet-f13995a8
      #   - subnet-f13995a8
      #   - subnet-f13995a8
      # pri
        - subnet-34b92250
        - subnet-82d85af4
        - subnet-ca379b93
      LoadBalancerAttributes:
        - Key: idle_timeout.timeout_seconds
          Value: '50'
      SecurityGroups:
        - !Ref CoreGrcpInstanceSecurityGroupOpenWeb
  TargetGroup:
    Type: 'AWS::ElasticLoadBalancingV2::TargetGroup'
    DependsOn:
      - LoadBalancer
    Properties:
      Name: tg-core-ci-grpc
      Port: 3000
      TargetType: ip
      Protocol: HTTP
      HealthCheckIntervalSeconds: 30
      HealthCheckProtocol: HTTP
      HealthCheckTimeoutSeconds: 10
      HealthyThresholdCount: 4
      Matcher:
        HttpCode: '200'
      TargetGroupAttributes:
        - Key: deregistration_delay.timeout_seconds
          Value: '20'
      UnhealthyThresholdCount: 3
      VpcId: !Ref vpcId
  LoadBalancerListener:
    Type: 'AWS::ElasticLoadBalancingV2::Listener'
    DependsOn:
      - TargetGroup
    Properties:
      DefaultActions:
        - Type: forward
          TargetGroupArn: !Ref TargetGroup
      LoadBalancerArn: !Ref LoadBalancer
      Port: 80
      Protocol: HTTP
  EcsCluster:
    Type: 'AWS::ECS::Cluster'
    DependsOn:
      - LoadBalancerListener
    Properties:
      ClusterName: ecs-core-ci-grpc
  EcsTaskRole:
    Type: 'AWS::IAM::Role'
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service:
                # - ecs.amazonaws.com
                - ecs-tasks.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      Path: /
      Policies:
        - PolicyName: iam-policy-ecs-task-core-ci-grpc
          PolicyDocument:
            Statement:
              - Effect: Allow
                Action:
                  - 'ecr:**'
                Resource: '*'
  CoreGrcpTaskDefinition:
    Type: 'AWS::ECS::TaskDefinition'
    DependsOn:
      - EcsCluster
      - EcsTaskRole
    Properties:
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
      ExecutionRoleArn: !Ref EcsTaskRole
      Cpu: '1024'
      Memory: '2048'
      ContainerDefinitions:
        - Name: container-core-ci-grpc
          Image: 'nginx:latest'
          Cpu: '256'
          Memory: '1024'
          PortMappings:
            - ContainerPort: '80'
              HostPort: '80'
          Essential: 'true'
  EcsService:
    Type: 'AWS::ECS::Service'
    DependsOn:
      - CoreGrcpTaskDefinition
    Properties:
      Cluster: !Ref EcsCluster
      LaunchType: FARGATE
      DesiredCount: '1'
      DeploymentConfiguration:
        MaximumPercent: 150
        MinimumHealthyPercent: 0
      LoadBalancers:
        - ContainerName: container-core-ci-grpc
          ContainerPort: '80'
          TargetGroupArn: !Ref TargetGroup
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: DISABLED
          SecurityGroups:
            - !Ref CoreGrcpInstanceSecurityGroupOpenWeb
          Subnets:
            - subnet-34b92250
            - subnet-82d85af4
            - subnet-ca379b93
      TaskDefinition: !Ref CoreGrcpTaskDefinition

请在您的 ECR 注册表中定义此策略并将 IAM 角色附加到您的任务中。

{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Sid": "new statement",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::99999999999:role/ecsEventsRole"
            },
            "Action": [
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "ecr:BatchCheckLayerAvailability",
                "ecr:PutImage",
                "ecr:InitiateLayerUpload",
                "ecr:UploadLayerPart",
                "ecr:CompleteLayerUpload"
            ]
        }
    ]
}

遗憾的是,AWS Fargate 仅支持托管在 ECR 中的图像或 Docker Hub 中的 public 存储库,不支持托管在 Docker Hub 中的私有存储库。
更多信息 - https://forums.aws.amazon.com/thread.jspa?threadID=268415

几个月前,甚至我们在使用 AWS Fargate 时也遇到了同样的问题。您现在只有两个选择:

  1. 将您的图像迁移到 Amazon ECR。

  2. 将 AWS Batch 与自定义 AMI 结合使用,其中自定义 AMI 是使用 ECS 配置中的 Docker 集线器凭据构建的(我们现在正在使用)。

编辑:Christopher Thomas in the comment, ECS fargate now supports pulling images from DockerHub Private repositories. More info on how to set it up can be found here所述。