Crashing ingress-nginx controller on Fargate-only EKS cluster due to bind() to 0.0.0.0:8443 failed (98: Address in use)

Question

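The ingress-nginx pod I helm-installed into my EKS cluster keeps failing, and its logs indicate that the application cannot bind to 0.0.0.0:8443 (INADDR_ANY:8443). I've confirmed that 0.0.0.0:8443 is indeed already bound in the container, but because I don't yet have root access to the container, I haven't been able to find the culprit process/user.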

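I've created this issue on the kubernetes ingress-nginx project I'm using, but I also wanted to reach out to the wider SO community, who may be able to offer insights, solutions, and troubleshooting suggestions for getting past this hurdle.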

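As I'm new to AWS/EKS and Kubernetes, it's quite possible that some environment misconfiguration is causing this problem. For example, could it be caused by a misconfigured AWS-ism such as the VPC (its subnets or security groups)? Thanks in advance for your help!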

The linked GitHub issue provides extensive detail about the Terraform-provisioned EKS environment and the Helm-installed deployment of ingress-nginx. Here are some key details:

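1. The EKS cluster is configured to use only Fargate workers, and has 3 public and 3 private subnets, all 6 of which are available to the cluster and to each of its Fargate profiles.
2. It should also be noted that the cluster is new, and the ingress-nginx pod is the first attempt to deploy anything to it, apart from kube-system items like coredns, which has already been configured to run in Fargate. (This required manually removing the default ec2 annotation here.)
3. There are 6 Fargate profiles, but only 2 are currently in use: coredns and ingress. These are dedicated to kube-system/kube-dns and ingress-nginx, respectively. Apart from the selectors' namespaces and labels, there is nothing "custom" about the profile specs. I've confirmed that the selectors work for both coredns and ingress, i.e. the ingress pods are scheduled to run, but fail.
4. The reason ingress-nginx uses port 8443 is that I first ran into this Privilege Escalation issue, whose workaround requires disabling allowPrivilegeEscalation and changing the ports from privileged to unprivileged ones. I'm invoking helm install with the following values: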

```
controller:
  extraArgs:
    http-port: 8080
    https-port: 8443
  containerPort:
    http: 8080
    https: 8443
  # One `service` mapping only: YAML keys must be unique, so a second
  # `service:` block would be rejected or would replace this one.
  service:
    ports:
      http: 80
      https: 443
    targetPorts:
      http: 8080
      https: 8443
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
  image:
    allowPrivilegeEscalation: false
  # Probes are controller-level keys, not part of controller.image.
  # https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes
  livenessProbe:
    initialDelaySeconds: 60  # 30
  readinessProbe:
    initialDelaySeconds: 60  # 0
```

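5. Since my initial observation (before I looked at the logs) was that the K8s liveness/readiness probes were failing/timing out, I first tried extending their initialDelaySeconds in the values passed to helm install. But eventually I looked at the pod/container logs and found that, regardless of the *ness probe settings, every time I reinstall the ingress-nginx pod and wait a bit, the logs show the same bind error reported here: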

```
2021/11/12 17:15:02 [emerg] 27#27: bind() to [::]:8443 failed (98: Address in use)
.
.
```
6. Aside from what I've noted above, I haven't intentionally configured anything "non-stock". I'm a bit lost in AWS/K8s's sea of configuration looking for what piece I need to adapt/correct.
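Because the privilege-escalation workaround in item 4 moves nginx's ports in three places (`extraArgs`, `containerPort`, `service.targetPorts`), a quick check that those three agree rules out one class of misconfiguration. Below is a minimal Python sketch, assuming the `controller` values have already been loaded into a plain dict; `check_controller_ports` is a hypothetical helper, not part of Helm or the chart. It only checks internal consistency, so it reports the question's values as fine: the 8443 problem is a clash with another listener in the same pod, not a mismatch between these settings.

```python
# Hypothetical sanity check for the port-related ingress-nginx chart values
# shown in the question. Assumes integer targetPorts should equal the
# container port; named targetPorts (e.g. "https") are skipped.
PRIVILEGED_PORT_LIMIT = 1024  # by default, binding below this needs CAP_NET_BIND_SERVICE


def check_controller_ports(controller: dict) -> list[str]:
    """Return human-readable problems found in the controller port settings."""
    problems = []
    extra_args = controller.get("extraArgs", {})
    container_ports = controller.get("containerPort", {})
    target_ports = controller.get("service", {}).get("targetPorts", {})

    for scheme in ("http", "https"):
        arg = extra_args.get(f"{scheme}-port")
        if arg is None:
            problems.append(f"extraArgs.{scheme}-port is not set")
            continue
        if int(arg) < PRIVILEGED_PORT_LIMIT:
            problems.append(f"{scheme}-port {arg} is a privileged port")
        container = container_ports.get(scheme)
        if container is None or int(container) != int(arg):
            problems.append(f"containerPort.{scheme}={container} != {scheme}-port={arg}")
        target = target_ports.get(scheme)
        if isinstance(target, int) and target != int(arg):
            problems.append(f"service.targetPorts.{scheme}={target} != {scheme}-port={arg}")
    return problems


# The port-related part of the values from the question.
question_values = {
    "extraArgs": {"http-port": 8080, "https-port": 8443},
    "containerPort": {"http": 8080, "https": 8443},
    "service": {
        "ports": {"http": 80, "https": 443},
        "targetPorts": {"http": 8080, "https": 8443},
    },
}

print(check_controller_ports(question_values))  # []
```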

Do you have clues or guesses why INADDR_ANY, port 8443 would already be bound in my (fairly-standard) `nginx-ingress-ingress-nginx-controller` pod/container?

As I alluded to earlier, I can run `netstat` inside the running container as the default user `www-data` to confirm that port 8443 is indeed already bound, but because I haven't yet figured out how to get root access, the PID/name of the processes is not available to me:

```
> kubectl exec -n ingress --stdin --tty nginx-ingress-ingress-nginx-controller-74d46b8fd8-85tkh -- netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:10245         0.0.0.0:*               LISTEN      -
tcp        3      0 127.0.0.1:10246         0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.1:10247         0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:8181            0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:8181            0.0.0.0:*               LISTEN      -
tcp        0      0 :::8443                 :::*                    LISTEN      -
tcp        0      0 :::10254                :::*                    LISTEN      -
tcp        0      0 :::8080                 :::*                    LISTEN      -
tcp        0      0 :::8080                 :::*                    LISTEN      -
tcp        0      0 :::8181                 :::*                    LISTEN      -
tcp        0      0 :::8181                 :::*                    LISTEN      -
```

```
> kubectl exec -n ingress --stdin --tty nginx-ingress-ingress-nginx-controller-74d46b8fd8-85tkh -- /bin/bash
bash-5.1$ whoami
www-data
bash-5.1$ ps aux
PID   USER     TIME  COMMAND
    1 www-data  0:00 /usr/bin/dumb-init -- /nginx-ingress-controller --publish-service=ingress/nginx-ingress-ingress-nginx-controller --election-id=ingress-controller-leader --controller-class=k8s.io/ingress-nginx
    8 www-data  0:00 /nginx-ingress-controller --publish-service=ingress/nginx-ingress-ingress-nginx-controller --election-id=ingress-controller-leader --controller-class=k8s.io/ingress-nginx --configmap=ingress/n
   28 www-data  0:00 nginx: master process /usr/local/nginx/sbin/nginx -c /etc/nginx/nginx.conf
   30 www-data  0:00 nginx: worker process
   45 www-data  0:00 /bin/bash
   56 www-data  0:00 ps aux
```

I'm currently looking into how to get root access to my Fargate containers (without mucking about with their Dockerfiles to install ssh...) so I can get more insight into which process/user is binding INADDR_ANY:8443 in this pod/container.
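Since every process in the `ps aux` listing above runs as `www-data`, root may not strictly be needed: Linux generally lets a user read `/proc/<pid>/fd` for its own processes, and `/proc/net/tcp` and `/proc/net/tcp6` list each listening socket together with its inode. The Python sketch below matches the two up. It assumes a Python 3.9+ interpreter is available wherever you run it (the stock controller image may not ship one), a little-endian CPU (x86_64/arm64), and that the runtime does not further restrict `/proc`; if it does, the PID lists simply come back empty.

```python
import ipaddress
import os

TCP_LISTEN = "0A"  # socket state code for LISTEN in /proc/net/tcp{,6}


def decode_addr(hex_addr: str) -> tuple[str, int]:
    """Decode a /proc/net/tcp{,6} 'HOST:PORT' hex pair into (ip, port)."""
    host_hex, port_hex = hex_addr.split(":")
    raw = bytes.fromhex(host_hex)
    # The kernel prints each 32-bit word in host byte order; assumes little-endian.
    packed = b"".join(raw[i:i + 4][::-1] for i in range(0, len(raw), 4))
    return str(ipaddress.ip_address(packed)), int(port_hex, 16)


def listening_sockets(table: str) -> list[tuple[str, int, str]]:
    """Return (ip, port, inode) for every LISTEN socket in a /proc/net table."""
    sockets = []
    with open(table) as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            if fields[3] == TCP_LISTEN:
                ip, port = decode_addr(fields[1])
                sockets.append((ip, port, fields[9]))  # 10th column is the inode
    return sockets


def pids_for_inode(inode: str) -> list[int]:
    """Find PIDs holding 'socket:[inode]' open; unreadable fd dirs are skipped."""
    target = f"socket:[{inode}]"
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or its fd directory is not readable
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd closed between listdir() and readlink()
            if link == target:
                pids.append(int(entry))
                break
    return pids


if __name__ == "__main__":
    for table in ("/proc/net/tcp", "/proc/net/tcp6"):
        if not os.path.exists(table):
            continue
        for ip, port, inode in listening_sockets(table):
            print(f"{ip}:{port} inode={inode} pids={pids_for_inode(inode)}")
```

Each output line names a listening address and the PIDs found holding that socket; `pids=[]` means the owner could not be determined as the current user.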

Answer 1

Community wiki answer based on the same topic and this similar issue (both on GitHub). Feel free to expand it.

The answer from GitHub:

The problem is that 8443 is already bound for the webhook. That's why I used 8081 in my suggestion, not 8443. The examples using 8443 here had to also move the webhook, which introduces more complexity to the changes, and can lead to weird issues if you get it wrong.
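The point of the quoted answer is that the port handed to nginx via `https-port` must not be one that the controller already listens on inside the same pod. Below is a small Python sketch of that check. The port table is taken from the `netstat` output in the question; only the 8443 label (admission webhook) is confirmed by the answer, and the other labels are assumptions worth checking against the controller's `--help` output. `find_port_conflicts` is a hypothetical helper.

```python
# Ports seen listening in the question's netstat output, apart from the
# user-chosen http/https ports. Only the 8443 label comes from the quoted
# answer; the other labels are assumptions.
CONTROLLER_INTERNAL_PORTS = {
    8181: "default server (assumed)",
    8443: "validating admission webhook",
    10245: "internal listener on 127.0.0.1 (role assumed)",
    10246: "internal listener on 127.0.0.1 (role assumed)",
    10247: "internal listener on 127.0.0.1 (role assumed)",
    10254: "health/metrics endpoint (assumed)",
}


def find_port_conflicts(extra_args: dict) -> list[str]:
    """Report http-port/https-port values that collide with each other or
    with a listener the controller opens on its own."""
    http, https = int(extra_args["http-port"]), int(extra_args["https-port"])
    problems = []
    if http == https:
        problems.append(f"http-port and https-port are both {http}")
    for flag, port in (("http-port", http), ("https-port", https)):
        owner = CONTROLLER_INTERNAL_PORTS.get(port)
        if owner is not None:
            problems.append(f"{flag}={port} collides with the {owner}")
    return problems


# Values from the question vs. the values suggested in the answer.
print(find_port_conflicts({"http-port": 8080, "https-port": 8443}))
# ['https-port=8443 collides with the validating admission webhook']
print(find_port_conflicts({"http-port": 8080, "https-port": 8081}))
# []
```

The alternative the answer mentions, moving the webhook off 8443 instead, would mean changing the table rather than the nginx ports, which is why it adds more moving parts.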

An example using port 8081:

As well as those settings, you'll also need to use the appropriate annotations to run using NLB rather than ELB, so all-up it ends up looking something like:
```
controller:
  extraArgs:
    http-port: 8080
    https-port: 8081

  containerPort:
    http: 8080
    https: 8081

  image:
    allowPrivilegeEscalation: false

  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-type: "nlb-ip"
```
Edit: Fixed the aws-load-balancer-type to be nlb-ip, as that's required for Fargate. It probably should be:
```
service.beta.kubernetes.io/aws-load-balancer-type: "external"
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
```
for current versions of the AWS Load Balancer controller (2.2 onwards), but new versions will recognise the nlb-ip annotation.
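The edit above boils down to a version-dependent choice of Service annotations. As a minimal Python sketch: the function name and the `(major, minor)` version tuple are illustrative assumptions, and only the annotation keys and values come from the quoted answer.

```python
LB_TYPE = "service.beta.kubernetes.io/aws-load-balancer-type"
NLB_TARGET_TYPE = "service.beta.kubernetes.io/aws-load-balancer-nlb-target-type"


def nlb_ip_annotations(lb_controller_version: tuple[int, int]) -> dict[str, str]:
    """Service annotations for an IP-target NLB, which Fargate pods need.

    Follows the quoted answer: AWS Load Balancer Controller 2.2+ prefers
    type "external" plus nlb-target-type "ip"; otherwise use "nlb-ip"
    (which, per the answer, newer versions still recognise).
    """
    if lb_controller_version >= (2, 2):
        return {LB_TYPE: "external", NLB_TARGET_TYPE: "ip"}
    return {LB_TYPE: "nlb-ip"}


for version in ((2, 1), (2, 3)):
    print(version, nlb_ip_annotations(version))
```

The returned dict maps directly onto `controller.service.annotations` in the Helm values.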


kubernetes-helm

aws-fargate

kubernetes-ingress

amazon-eks

ingress-nginx