Kubernetes 1.14.2 HA Master NGINX load balancer log.go:172] http: TLS handshake error from 192.168.5.32:43148: remote error: tls: bad certificate

Kubernetes 1.14.2 HA Master NGINX load balancer log.go:172] http: TLS handshake error from 192.168.5.32:43148: remote error: tls: bad certificate

这让我发疯,我不是 Kubernetes 专家,但我也不是新手。

我试了三天都没有成功解决这个问题,但我做不到,我已经无能为力了。

将证书从 (kube-apiserver-1:/etc/kubernetes/pki/*) 复制到我的桌面后,我可以从我的桌面查询集群。

$ kubectl -n kube-system get nodes
NAME               STATUS   ROLES    AGE   VERSION
kube-apiserver-1   Ready    master   71m   v1.14.2

当我查询 kube-system 时,Kubernetes 集群看起来很健康 pods:

$ kubectl -n kube-system get pods
NAME                                       READY   STATUS    RESTARTS   AGE
coredns-fb8b8dccf-6c85q                    1/1     Running   3          65m
coredns-fb8b8dccf-qwxlp                    1/1     Running   3          65m
kube-apiserver-kube-apiserver-1            1/1     Running   2          72m
kube-controller-manager-kube-apiserver-1   1/1     Running   2          72m
kube-flannel-ds-amd64-phntk                1/1     Running   2          62m
kube-proxy-swxrz                           1/1     Running   2          65m
kube-scheduler-kube-apiserver-1            1/1     Running   1          54m

但是当我查询 api kubelet 时:

$ kubectl -n kube-system logs kube-apiserver-kube-apiserver-1 
...
I0526 04:33:51.523828       1 log.go:172] http: TLS handshake error from 192.168.5.32:43122: remote error: tls: bad certificate
I0526 04:33:51.537258       1 log.go:172] http: TLS handshake error from 192.168.5.32:43124: remote error: tls: bad certificate
I0526 04:33:51.540617       1 log.go:172] http: TLS handshake error from 192.168.5.32:43126: remote error: tls: bad certificate
I0526 04:33:52.333817       1 log.go:172] http: TLS handshake error from 192.168.5.32:43130: remote error: tls: bad certificate
I0526 04:33:52.334354       1 log.go:172] http: TLS handshake error from 192.168.5.32:43128: remote error: tls: bad certificate
I0526 04:33:52.335570       1 log.go:172] http: TLS handshake error from 192.168.5.32:43132: remote error: tls: bad certificate
I0526 04:33:52.336703       1 log.go:172] http: TLS handshake error from 192.168.5.32:43134: remote error: tls: bad certificate
I0526 04:33:52.338792       1 log.go:172] http: TLS handshake error from 192.168.5.32:43136: remote error: tls: bad certificate
I0526 04:33:52.391557       1 log.go:172] http: TLS handshake error from 192.168.5.32:43138: remote error: tls: bad certificate
I0526 04:33:52.396566       1 log.go:172] http: TLS handshake error from 192.168.5.32:43140: remote error: tls: bad certificate
I0526 04:33:52.519666       1 log.go:172] http: TLS handshake error from 192.168.5.32:43142: remote error: tls: bad certificate
I0526 04:33:52.524702       1 log.go:172] http: TLS handshake error from 192.168.5.32:43144: remote error: tls: bad certificate
I0526 04:33:52.537127       1 log.go:172] http: TLS handshake error from 192.168.5.32:43146: remote error: tls: bad certificate
I0526 04:33:52.550177       1 log.go:172] http: TLS handshake error from 192.168.5.32:43150: remote error: tls: bad certificate
I0526 04:33:52.550613       1 log.go:172] http: TLS handshake error from 192.168.5.32:43148: remote error: tls: bad certificate

在 NGINX 负载均衡器(IP:192.168.5.32)上,我已按照 Kubernetes 文档中的说明配置了 TCP 直通选项:

upstream kubernetes-api-cluster {
   server 192.168.5.19:6443;
   server 192.168.5.29:6443;
}
server {
   listen 6443;
   ssl_certificate /etc/nginx/ssl/kube-apiserver.pem;
   ssl_certificate_key /etc/nginx/ssl/private/kube-apiserver.key;
   ssl_prefer_server_ciphers on;
   ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
   ssl_ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS;
   proxy_pass kubernetes-api-cluster;
}

我可以直接从NGINX LB(IP:192.168.5.32)查询API服务器:

$ curl -v https://192.168.5.29:6443
* Rebuilt URL to: https://192.168.5.29:6443/
*   Trying 192.168.5.29...
* TCP_NODELAY set
* Connected to 192.168.5.29 (192.168.5.29) port 6443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Request CERT (13):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Certificate (11):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=kube-apiserver
*  start date: May 26 03:39:36 2019 GMT
*  expire date: May 25 03:39:36 2020 GMT
*  subjectAltName: host "192.168.5.29" matched cert's IP address!
*  issuer: CN=kubernetes
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x55840f1d9900)
> GET / HTTP/2
> Host: 192.168.5.29:6443
> User-Agent: curl/7.58.0
> Accept: */*

我还可以使用文档中指定的 api 的 DNS 条目查询 api:

curl -v https://kube-apiserver.mydomain.com:6443
* Rebuilt URL to: https://kube-apiserver.mydomain.com:6443/
*   Trying 10.50.1.50...
* TCP_NODELAY set
* Connected to kube-apiserver.mydomain.com (10.50.1.50) port 6443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Request CERT (13):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Certificate (11):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=kube-apiserver
*  start date: May 26 03:39:36 2019 GMT
*  expire date: May 25 03:39:36 2020 GMT
*  subjectAltName: host "kube-apiserver.mydomain.com" matched cert's "kube-apiserver.mydomain.com"
*  issuer: CN=kubernetes
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x564287cbd900)
> GET / HTTP/2
> Host: kube-apiserver.mydomain.com:6443
> User-Agent: curl/7.58.0
> Accept: */*

我也可以在 API 服务器上使用 curl 查询 api 服务器:

curl -v https://kube-apiserver.mydomain.com:6443
* Rebuilt URL to: https://kube-apiserver.mydomain.com:6443/
*   Trying 10.50.1.50...
* TCP_NODELAY set
* Connected to kube-apiserver.epc-instore.com (10.50.1.50) port 6443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Request CERT (13):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Certificate (11):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Client hello (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=kube-apiserver
*  start date: May 26 03:39:36 2019 GMT
*  expire date: May 25 03:39:36 2020 GMT
*  subjectAltName: host "kube-apiserver.mydomain.com" matched cert's "kube-apiserver.mydomain.com"
*  issuer: CN=kubernetes
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x5628b9dbc900)
> GET / HTTP/2
> Host: kube-apiserver.mydomain.com:6443
> User-Agent: curl/7.58.0
> Accept: */*

api 服务器上的清单包含:

cat /etc/kubernetes/manifest/kube-apiserver.yaml
...
  - command:
    - kube-apiserver
    - --advertise-address=192.168.5.29
    - --allow-privileged=true
    - --authorization-mode=Node,RBAC
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --enable-admission-plugins=NodeRestriction
    - --enable-bootstrap-token-auth=true
    - --etcd-servers=http://etcd-cluster.mydomain.com:2379
    - --insecure-port=0
    - --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
    - --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
    - --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
    - --requestheader-allowed-names=front-proxy-client
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --requestheader-extra-headers-prefix=X-Remote-Extra-
    - --requestheader-group-headers=X-Remote-Group
    - --requestheader-username-headers=X-Remote-User
    - --secure-port=6443
    - --service-account-key-file=/etc/kubernetes/pki/sa.pub
    - --service-cluster-ip-range=10.96.0.0/12
    - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
    - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
    image: k8s.gcr.io/kube-apiserver:v1.14.2
    imagePullPolicy: IfNotPresent
...

如果您对如何解决此问题有任何想法或提示,我会洗耳恭听。我对这个问题感到非常沮丧,在这一点上它真的让我感到沮丧。我会继续努力,但如果有人知道这个问题并能提供帮助,那就太好了。

谢谢。

您当前的 nginx 配置没有设置客户端证书。 ssl_certificate 是服务器证书,如果您希望它向 kubernetes-api-cluster 提供客户端证书,您必须配置 nginx 以转发传入的客户端证书。我之前使用 proxy_set_header X-SSL-CERT $ssl_client_escaped_cert (documentation)

完成了此操作
upstream kubernetes-api-cluster { 
    server 192.168.5.19:6443;
    server 192.168.5.29:6443; 
} 

server { 
    listen 6443;
    ssl_certificate /etc/nginx/ssl/kube-apiserver.pem;
    ssl_certificate_key /etc/nginx/ssl/private/kube-apiserver.key;
    ssl_prefer_server_ciphers on;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS;
    proxy_pass kubernetes-api-cluster; 

    #forward incoming client certificate
    ssl_verify_client optional; #requests the client certificate and verifies it if the certificate is present
    proxy_set_header X-SSL-CERT $ssl_client_escaped_cert;
}

这更像是一种真正针对问题根源的故障排除思路。 如果你能做到:

kubectl --kubeconfig /etc/kubernetes/admin.conf get nodes 

从 api 服务器收到响应,那么问题不在于负载平衡器。为了进一步证明这一点,您可以将适当的证书和文件复制到远程工作站并执行相同的操作:

kubectl --kubeconfig [workstation location]/admin.conf get nodes

第二个显然暗示您可以直接访问负载均衡器。
如果这也有效,您可以确认证书正在通过 TCP 负载平衡器传递。

但是,由于负载平衡器检查 "availability" 后端服务器,错误将持续存在。此检查不使用产生异常的证书。

原始问题的实际根本原因是(引用此 post @Daniel Maldonado 的作者):

This was my mistake, I had a firewall configuration error and all tests indicated that it was the load balancer probing the kube-apiserver when in fact it was not. The issue was completely local to the api-server itself. If anyone gets to this point please verify that ALL ports are available to the API server from itself i.e. loopback.