Converting from self signed to commercial cert TLS errors

Question

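When I installed our cluster, I used a self-signed certificate from our internal CA. Everything was fine until I started getting certificate errors from applications deployed to the OKD cluster. Rather than fixing the errors one at a time, we decided to simply buy a commercial certificate and install it. So we purchased a SAN certificate with wildcards from GlobalSign (the same wildcards we originally had from the internal CA), and I am trying to install it but running into a lot of problems.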

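Keep in mind that I have tried dozens of iterations here. I am only documenting the last one I tried in order to figure out what the problem is. This is on my test cluster, which is a VM server that I revert to a snapshot after each attempt. The snapshot is the operational cluster using the internal CA certificate.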

So, my first step was to build the CA file to pass in. I downloaded the root and intermediate certificates for GlobalSign and put them in the ca-globalsign.crt file (PEM format).
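As a rough sketch of that step, a small helper can concatenate the downloaded PEM files and report how many certificates ended up in the bundle, which is a cheap way to notice a missing intermediate. The function name build_ca_bundle and the input file names in the example comment are hypothetical; only ca-globalsign.crt comes from the setup described here.

```shell
#!/bin/sh
# build_ca_bundle OUT IN...
# Concatenates the given PEM files into OUT and prints how many
# certificates the resulting bundle contains.
build_ca_bundle() {
  bundle_out=$1
  shift
  cat "$@" > "$bundle_out" || return 1
  grep -c -e '-----BEGIN CERTIFICATE-----' "$bundle_out"
}

# Example (input file names are hypothetical):
#   build_ca_bundle ca-globalsign.crt globalsign-intermediate.pem globalsign-root.pem
```

With one root and one intermediate, the printed count should be 2.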

When I run

openssl verify -CAfile ../ca-globalsign.crt labtest.mycompany.com.pem

I get:

labtest.mycompany.com.pem: OK

and openssl x509 -in labtest.mycompany.com.pem -text -noout gives me (redacted):

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            (redacted)
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=BE, O=GlobalSign nv-sa, CN=GlobalSign Organization Validation CA - SHA256 - G2
        Validity
            Not Before: Apr 29 16:11:07 2019 GMT
            Not After : Apr 29 16:11:07 2020 GMT
        Subject: C=US, ST=(redacted), L=(redacted), OU=Information Technology, O=(redacted), CN=labtest.mycompany.com
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    (redacted)
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            Authority Information Access:
                CA Issuers - URI:http://secure.globalsign.com/cacert/gsorganizationvalsha2g2r1.crt
                OCSP - URI:http://ocsp2.globalsign.com/gsorganizationvalsha2g2

            X509v3 Certificate Policies:
                Policy: 1.3.6.1.4.1.4146.1.20
                  CPS: https://www.globalsign.com/repository/
                Policy: 2.23.140.1.2.2

            X509v3 Basic Constraints:
                CA:FALSE
            X509v3 Subject Alternative Name:
                DNS:labtest.mycompany.com, DNS:*.labtest.mycompany.com, DNS:*.apps.labtest.mycompany.com
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Subject Key Identifier:
                (redacted)
            X509v3 Authority Key Identifier:
                (redacted)

            (redacted)

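That was on my local machine. Everything I know about SSL says the certificate is fine. These new files are placed in the project I use to hold my OKD install configs, etc.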
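One thing that can be checked from the Subject Alternative Name list above is which cluster hostnames the certificate actually covers. The sketch below is a simplified matcher, not a full implementation of TLS hostname verification: it assumes a leading "*." stands for exactly one DNS label and compares case-sensitively. The function names san_matches and covered_by_any are made up for this illustration.

```shell
#!/bin/sh
# san_matches HOSTNAME SAN_ENTRY
# Succeeds (exit status 0) when HOSTNAME is covered by SAN_ENTRY.
# A leading "*." stands for exactly one DNS label, so a wildcard does
# not cover deeper subdomains or the bare domain itself.
# Simplification: the comparison is case-sensitive.
san_matches() (
  host=$1
  entry=$2
  case $entry in
    '*.'*)
      suffix=${entry#\*}      # e.g. ".labtest.mycompany.com"
      label=${host%%.*}       # first label of the hostname
      [ -n "$label" ] && [ "$host" = "$label$suffix" ]
      ;;
    *)
      [ "$host" = "$entry" ]
      ;;
  esac
)

# covered_by_any HOSTNAME SAN_ENTRY...
# Succeeds when at least one SAN entry covers HOSTNAME.
covered_by_any() (
  host=$1
  shift
  for entry in "$@"; do
    if san_matches "$host" "$entry"; then
      exit 0
    fi
  done
  exit 1
)
```

Under these assumptions, console.labtest.mycompany.com is covered by *.labtest.mycompany.com, while registry.apps.labtest.mycompany.com needs the separate *.apps.labtest.mycompany.com entry, which this certificate also carries.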

I then updated the certificate files in my Ansible inventory project and ran the command

ansible-playbook -i ../okd_install/inventory/okd_labtest_inventory.yml playbooks/redeploy-certificates.yml

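From my reading of the documentation, everything tells me it should simply go through its process and come up with the new certificates. That does not happen. When I use openshift_master_overwrite_named_certificates: false in my inventory file, the install completes, but it only replaces the certificate on the *.apps.labtest domain; console.labtest keeps the original one. The cluster does come online, except that monitoring shows a bad gateway in the cluster console.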

Now, if I run the command again with openshift_master_overwrite_named_certificates: true, my /var/log/containers/master-api*.log fills up with errors like this

{"log":"I0507 15:53:28.451851       1 logs.go:49] http: TLS handshake error from 10.128.0.56:46796: EOF\n","stream":"stderr","time":"2019-05-07T19:53:28.451894391Z"}
{"log":"I0507 15:53:28.455218       1 logs.go:49] http: TLS handshake error from 10.128.0.56:46798: EOF\n","stream":"stderr","time":"2019-05-07T19:53:28.455272658Z"}
{"log":"I0507 15:53:28.458742       1 logs.go:49] http: TLS handshake error from 10.128.0.56:46800: EOF\n","stream":"stderr","time":"2019-05-07T19:53:28.461070768Z"}
{"log":"I0507 15:53:28.462093       1 logs.go:49] http: TLS handshake error from 10.128.0.56:46802: EOF\n","stream":"stderr","time":"2019-05-07T19:53:28.463719816Z"}

and these

{"log":"I0507 15:53:29.355463       1 logs.go:49] http: TLS handshake error from 10.70.25.131:44424: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53:29.357218793Z"}
{"log":"I0507 15:53:29.357961       1 logs.go:49] http: TLS handshake error from 10.70.25.132:43128: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53:29.358779155Z"}
{"log":"I0507 15:53:29.357993       1 logs.go:49] http: TLS handshake error from 10.70.25.132:43126: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53:29.358790397Z"}
{"log":"I0507 15:53:29.405532       1 logs.go:49] http: TLS handshake error from 10.70.25.131:44428: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53:29.406873158Z"}
{"log":"I0507 15:53:29.527221       1 logs.go:49] http: TLS handshake error from 10.70.25.132:43130: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53

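The install also hangs on the Ansible task TASK [Remove web console pods]. It will sit there for hours. When I go to the master console and run oc get pods on openshift-web-console, the pod is in the Terminating state. When I describe the new pod that is trying to start and is stuck in Pending, it reports that the hard disk is full. I assume that is because it cannot communicate with the storage system due to all the TLS errors above. It just stays there. If I force delete the terminating pod, I can get the cluster back up, then restart the master, then delete the new pod that is trying to start, then restart again. The web console then comes online, but all of my log files are flooded with these TLS errors. More concerning, though, is that the install hangs at that spot, so I assume there are further steps after bringing the web console online that would also cause me problems.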
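To see which clients are failing the handshake and why, the log lines quoted above can be grouped by source address and failure reason. This is a minimal sketch that assumes the JSON line layout shown in the excerpts; the function name summarize_tls_errors is hypothetical.

```shell
#!/bin/sh
# summarize_tls_errors [FILE...]
# Reads master-api container log lines (JSON, one per line) and prints
# "COUNT CLIENT_IP REASON" for each distinct client and failure reason,
# most frequent first. Reads standard input when no file is given.
summarize_tls_errors() {
  sed -n 's/.*TLS handshake error from \([0-9.]*\):[0-9]*: \(.*\)\\n","stream.*/\1 \2/p' "$@" |
    awk '{ seen[$0]++ } END { for (key in seen) print seen[key], key }' |
    sort -k1,1nr -k2
}

# Example:
#   summarize_tls_errors /var/log/containers/master-api*.log
```

In the excerpts above, the EOF errors come from 10.128.0.56 and the "bad certificate" errors from 10.70.25.131 and 10.70.25.132, so a summary like this shows at a glance that two different kinds of client are involved.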

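So, I also tried redeploying the server CA. That caused problems because my new certificate is not a CA certificate. Then, when I ran the redeploy CA playbook and let the cluster recreate the server CA, it completed fine, but when I then tried to run redeploy-certificates.yml I got the same results.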

Here is my inventory file

all:
  children:
    etcd:
      hosts:
        okdmastertest.labtest.mycompany.com:
    masters:
      hosts:
        okdmastertest.labtest.mycompany.com:
    nodes:
      hosts:
        okdmastertest.labtest.mycompany.com:
          openshift_node_group_name: node-config-master-infra
        okdnodetest1.labtest.mycompany.com:
          openshift_node_group_name: node-config-compute
          openshift_schedulable: True
    OSEv3:
      children:
        etcd:
        masters:
        nodes:
        # https://docs.okd.io/latest/install_config/persistent_storage/persistent_storage_glusterfs.html#overview-containerized-glusterfs
        # https://github.com/openshift/openshift-ansible/tree/master/playbooks/openshift-glusterfs
        # glusterfs:
      vars:
        openshift_deployment_type: origin
        ansible_user: root

        openshift_master_cluster_method: native
        openshift_master_default_subdomain: apps.labtest.mycompany.com
        openshift_install_examples: true

        openshift_master_cluster_hostname: console.labtest.mycompany.com
        openshift_master_cluster_public_hostname: console.labtest.mycompany.com
        openshift_hosted_registry_routehost: registry.apps.labtest.mycompany.com

        openshift_certificate_expiry_warning_days: 30
        openshift_certificate_expiry_fail_on_warn: false
        openshift_master_overwrite_named_certificates: true
        openshift_hosted_registry_routetermination: reencrypt

        openshift_master_named_certificates:
          - certfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.pem"
            keyfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.key"
            cafile: "/Users/me/code/devops/okd_install/certs/ca-globalsign.crt"
            names:
              - "console.labtest.mycompany.com"
              # - "labtest.mycompany.com"
              # - "*.labtest.mycompany.com"
              # - "*.apps.labtest.mycompany.com"
        openshift_hosted_router_certificate:
          certfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.pem"
          keyfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.key"
          cafile: "/Users/me/code/devops/okd_install/certs/ca-globalsign.crt"
        openshift_hosted_registry_routecertificates:
          certfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.pem"
          keyfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.key"
          cafile: "/Users/me/code/devops/okd_install/certs/ca-globalsign.crt"

        # LDAP auth
        openshift_master_identity_providers:
        - name: 'mycompany_ldap_provider'
          challenge: true
          login: true
          kind: LDAPPasswordIdentityProvider
          attributes:
            id:
            - dn
            email:
            - mail
            name:
            - cn
            preferredUsername:
            - sAMAccountName
          bindDN: 'ldapbind@int.mycompany.com'
          bindPassword: (redacted) 
          insecure: true
          url: 'ldap://dc-pa1.int.mycompany.com/ou=mycompany,dc=int,dc=mycompany,dc=com'

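What am I missing here? I thought the redeploy-certificates.yml playbook was designed to update the certificates. Why can't I get it to switch over to my new commercial certificate? It is almost as if it replaces the certificates on the router (sort of) but breaks the internal server certificate in the process. I am really at my wit's end, and I don't know what else to try.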

Answer 1

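You should configure openshift_master_cluster_hostname and openshift_master_cluster_public_hostname as different hostnames. Both hostnames must also be resolvable through DNS. Your commercial certificate is used for the external access point. The documentation says: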

The openshift_master_cluster_public_hostname and openshift_master_cluster_hostname parameters in the Ansible inventory file, by default /etc/ansible/hosts, must be different. 
If they are the same, the named certificates will fail and you will need to re-install them.

# Native HA with External LB VIPs
openshift_master_cluster_hostname=internal.paas.example.com
openshift_master_cluster_public_hostname=external.paas.example.com
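A pre-flight check along the following lines could catch the collision before the playbook runs. It is only a sketch: the helper names inventory_value and check_master_hostnames are hypothetical, and the parsing is a plain text match that assumes each setting sits on a single line in either INI form (key=value) or YAML form (key: value).

```shell
#!/bin/sh
# inventory_value FILE KEY
# Prints the first value assigned to KEY, accepting both the INI form
# (KEY=value) and the YAML form (KEY: value). Commented lines are skipped
# because the key must be the first non-blank text on the line.
inventory_value() {
  sed -n "s/^[[:space:]]*$2[[:space:]]*[=:][[:space:]]*//p" "$1" |
    head -n 1 | tr -d '[:space:]'
}

# check_master_hostnames FILE
# Succeeds only when both hostnames are set and differ from each other.
check_master_hostnames() (
  internal=$(inventory_value "$1" openshift_master_cluster_hostname)
  public=$(inventory_value "$1" openshift_master_cluster_public_hostname)
  if [ -z "$internal" ] || [ -z "$public" ]; then
    echo "missing hostname setting in $1" >&2
    exit 2
  elif [ "$internal" = "$public" ]; then
    echo "internal and public hostnames are both $internal" >&2
    exit 1
  fi
  echo "ok: internal=$internal public=$public"
)
```

Run against the inventory in the question, where both settings are console.labtest.mycompany.com, this check would fail with exit status 1.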

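You would also do better to configure the certificates for each component one step at a time so that you can test as you go. For example, first follow Configuring a Custom Master Host Certificate and verify the result. Then follow Configuring a Custom Wildcard Certificate for the Default Router and verify again. And so on. Once you can complete all of the certificate redeployment tasks successfully, you can finally run the playbook with the full set of parameters to maintain your commercial certificates.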

Refer to Configuring Custom Certificates for more details. I hope it helps you.

openshift

openshift-origin

kubernetes