Converting from self signed to commercial cert TLS errors
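When I installed our cluster, I used self-signed certificates issued by our internal CA. Everything worked fine until applications deployed to the OKD cluster started reporting certificate errors. Rather than keep fixing the errors one at a time, we decided to buy a commercial certificate and install it. We bought a SAN certificate from GlobalSign with the same wildcards we originally had from the internal CA. I'm trying to install it and am running into a lot of problems.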
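Keep in mind that I have tried dozens of iterations. I'm only documenting the last one, from my attempts to find the problem. This is on my test cluster, which is a VM server, and I revert to a snapshot after each attempt. The snapshot is the working cluster using the internal CA certificates.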
My first step was to build the CA file to pass in. I downloaded GlobalSign's root and intermediate certificates and put them in a file called ca-globalsign.crt (PEM format).
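A quick way to confirm that ca-globalsign.crt really holds both the intermediate and the root is to count the certificate blocks in it. `openssl x509` only reads the first certificate in a file, so splitting the bundle also lets you inspect each block separately. Below is a minimal standard-library Python sketch. The helper names `split_pem_bundle` and `report_bundle` are hypothetical, and the code only finds PEM blocks; it does not validate them.

```python
from __future__ import annotations

import re

# One PEM certificate block, including its BEGIN/END markers.
PEM_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----\s+.*?-----END CERTIFICATE-----", re.DOTALL
)


def split_pem_bundle(text: str) -> list[str]:
    """Return every certificate block in a PEM bundle, in file order."""
    return PEM_CERT_RE.findall(text)


def report_bundle(path: str) -> int:
    """Print how many certificates a bundle file holds and return the count."""
    with open(path, encoding="utf-8") as fh:
        blocks = split_pem_bundle(fh.read())
    print(f"{path}: {len(blocks)} certificate(s)")
    # A root + intermediate bundle should hold at least two blocks.
    if len(blocks) < 2:
        print("warning: expected both the intermediate and the root certificate")
    return len(blocks)
```

For example, `report_bundle("ca-globalsign.crt")` should report two certificates for a root plus one intermediate.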
When I run
openssl verify -CAfile ../ca-globalsign.crt labtest.mycompany.com.pem
I get:
labtest.mycompany.com.pem: OK
and openssl x509 -in labtest.mycompany.com.pem -text -noout
gives me (redacted):
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            (redacted)
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=BE, O=GlobalSign nv-sa, CN=GlobalSign Organization Validation CA - SHA256 - G2
        Validity
            Not Before: Apr 29 16:11:07 2019 GMT
            Not After : Apr 29 16:11:07 2020 GMT
        Subject: C=US, ST=(redacted), L=(redacted), OU=Information Technology, O=(redacted), CN=labtest.mycompany.com
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    (redacted)
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            Authority Information Access:
                CA Issuers - URI:http://secure.globalsign.com/cacert/gsorganizationvalsha2g2r1.crt
                OCSP - URI:http://ocsp2.globalsign.com/gsorganizationvalsha2g2
            X509v3 Certificate Policies:
                Policy: 1.3.6.1.4.1.4146.1.20
                  CPS: https://www.globalsign.com/repository/
                Policy: 2.23.140.1.2.2
            X509v3 Basic Constraints:
                CA:FALSE
            X509v3 Subject Alternative Name:
                DNS:labtest.mycompany.com, DNS:*.labtest.mycompany.com, DNS:*.apps.labtest.mycompany.com
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Subject Key Identifier:
                (redacted)
            X509v3 Authority Key Identifier:
                (redacted)
    (redacted)
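All of this is on my local machine. Everything I know about SSL says the certificate is fine. These new files live in the project I use to hold the OKD install configuration and related files.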
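`openssl verify` checks the chain but not whether the certificate covers the hostnames the cluster will actually serve. Below is a minimal Python sketch of simplified RFC 6125-style matching, where `*` may only stand for the whole left-most label. It runs that matching against the SAN entries in the dump above and the hostnames from the inventory. The function names are hypothetical, and the sketch ignores partial-label wildcards and internationalized names.

```python
from __future__ import annotations


def san_matches(pattern: str, hostname: str) -> bool:
    """Simplified RFC 6125 check: '*' may only replace the whole left-most label."""
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    if len(p_labels) != len(h_labels):
        return False  # a wildcard never spans more than one label
    if p_labels[0] == "*":
        return p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


def covered(hostname: str, sans: list[str]) -> bool:
    """True if any SAN entry matches the hostname."""
    return any(san_matches(p, hostname) for p in sans)


# SAN entries copied from the certificate dump above.
SANS = ["labtest.mycompany.com", "*.labtest.mycompany.com", "*.apps.labtest.mycompany.com"]

for host in ["console.labtest.mycompany.com",
             "registry.apps.labtest.mycompany.com",
             "a.b.labtest.mycompany.com"]:
    print(host, covered(host, SANS))
```

For the hostnames in this inventory, the SANs look sufficient. A name two levels below labtest.mycompany.com that is not under apps (the illustrative `a.b.labtest...`) would not be covered.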
I then updated the certificate files in my Ansible inventory project and ran the command
ansible-playbook -i ../okd_install/inventory/okd_labtest_inventory.yml playbooks/redeploy-certificates.yml
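From everything I read in the documentation, the playbook should simply run through its process and bring up the new certificates. That is not what happens. When I set openshift_master_overwrite_named_certificates: false in my inventory file, the install completes but only replaces the certificate on the *.apps.labtest domain. console.labtest keeps its original certificate. The cluster does come online, except that monitoring shows a bad gateway error in the cluster console.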
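To see which certificate the master actually presents for console.labtest, rather than inferring it from a browser, you can fetch it over TLS with SNI and compare its SHA-256 fingerprint with the local file. Below is a minimal Python sketch. It makes two assumptions: the master API and web console listen on port 8443 (the usual OKD 3.x default), and the first PEM block in the local .pem file is the leaf certificate. The helper names are hypothetical.

```python
from __future__ import annotations

import hashlib
import re
import socket
import ssl

# One PEM certificate block, including its BEGIN/END markers.
PEM_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----\s+.*?-----END CERTIFICATE-----", re.DOTALL
)


def der_fingerprint(der: bytes) -> str:
    """SHA-256 of the DER bytes, colon-separated like `openssl x509 -fingerprint -sha256`."""
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def first_cert(pem_text: str) -> str:
    """Return the first certificate block in a PEM file (assumed to be the leaf)."""
    match = PEM_CERT_RE.search(pem_text)
    if match is None:
        raise ValueError("no PEM certificate found")
    return match.group(0)


def local_fingerprint(path: str) -> str:
    """Fingerprint of the first certificate in a local PEM file."""
    with open(path, encoding="utf-8") as fh:
        return der_fingerprint(ssl.PEM_cert_to_DER_cert(first_cert(fh.read())))


def served_fingerprint(host: str, port: int = 8443, timeout: float = 10.0) -> str:
    """Fingerprint of the certificate the server presents when asked for `host` via SNI."""
    ctx = ssl.create_default_context()
    # Probe only: accept whatever is served so it can be inspected, even if untrusted.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            der = tls.getpeercert(binary_form=True)
    return der_fingerprint(der)
```

Once the named certificate is really in place, `served_fingerprint("console.labtest.mycompany.com")` should equal `local_fingerprint("commercial.04.29.2019.labtest.mycompany.com.pem")`. After the `overwrite_named_certificates: false` run described above, they would presumably still differ.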
Now, if I run the command again with openshift_master_overwrite_named_certificates: true, my /var/log/containers/master-api*.log files fill up with errors like these:
{"log":"I0507 15:53:28.451851 1 logs.go:49] http: TLS handshake error from 10.128.0.56:46796: EOF\n","stream":"stderr","time":"2019-05-07T19:53:28.451894391Z"}
{"log":"I0507 15:53:28.455218 1 logs.go:49] http: TLS handshake error from 10.128.0.56:46798: EOF\n","stream":"stderr","time":"2019-05-07T19:53:28.455272658Z"}
{"log":"I0507 15:53:28.458742 1 logs.go:49] http: TLS handshake error from 10.128.0.56:46800: EOF\n","stream":"stderr","time":"2019-05-07T19:53:28.461070768Z"}
{"log":"I0507 15:53:28.462093 1 logs.go:49] http: TLS handshake error from 10.128.0.56:46802: EOF\n","stream":"stderr","time":"2019-05-07T19:53:28.463719816Z"}
And these:
{"log":"I0507 15:53:29.355463 1 logs.go:49] http: TLS handshake error from 10.70.25.131:44424: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53:29.357218793Z"}
{"log":"I0507 15:53:29.357961 1 logs.go:49] http: TLS handshake error from 10.70.25.132:43128: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53:29.358779155Z"}
{"log":"I0507 15:53:29.357993 1 logs.go:49] http: TLS handshake error from 10.70.25.132:43126: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53:29.358790397Z"}
{"log":"I0507 15:53:29.405532 1 logs.go:49] http: TLS handshake error from 10.70.25.131:44428: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53:29.406873158Z"}
{"log":"I0507 15:53:29.527221 1 logs.go:49] http: TLS handshake error from 10.70.25.132:43130: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53
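With thousands of these lines, it helps to summarize which peers are failing and why. Below is a minimal Python sketch that reads Docker json-file log lines like the ones shown (one JSON object per line with a `log` field), extracts the peer IP and the error reason, and counts each pair. Malformed or truncated lines, like the last excerpt above, are skipped. The regex is based only on the message format visible in these excerpts.

```python
from __future__ import annotations

import glob
import json
import re
from collections import Counter
from typing import Iterable

# Based on the excerpts: "http: TLS handshake error from <ip>:<port>: <reason>"
HANDSHAKE_RE = re.compile(r"TLS handshake error from ([\d.]+):\d+: (.+)$")


def tally_handshake_errors(lines: Iterable[str]) -> Counter:
    """Count (peer_ip, reason) pairs across Docker json-file log lines."""
    counts: Counter = Counter()
    for line in lines:
        try:
            message = json.loads(line)["log"].strip()
        except (ValueError, KeyError, TypeError):
            continue  # truncated or non-JSON line
        match = HANDSHAKE_RE.search(message)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def tally_files(pattern: str = "/var/log/containers/master-api*.log") -> Counter:
    """Aggregate the counts over every log file matching the glob pattern."""
    counts: Counter = Counter()
    for path in glob.glob(pattern):
        with open(path, encoding="utf-8", errors="replace") as fh:
            counts += tally_handshake_errors(fh)
    return counts
```

On the master, `for (ip, reason), n in tally_files().most_common(): print(n, ip, reason)` prints the noisiest peers first. That makes it easier to tell whether the `bad certificate` rejections come only from the node addresses.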
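The install then hangs on the Ansible task TASK [Remove web console pods] and sits there for hours. When I log in on the master and run oc get pods in the openshift-web-console project, the pod is in the terminating state. When I describe the new pod that is trying to start (it stays pending), it reports that the disk is full. I assume that is because it cannot talk to the storage system due to all of the TLS errors above, so it just stays there. If I force-delete the terminating pod, I can recover the cluster: I restart the master, delete the new pod that is trying to start, and restart again. The web console then comes online, but all my log files are flooded with these TLS errors. More worrying is that the install hangs at that point. I assume there are further steps after bringing the web console online that would also cause problems.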
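I also tried redeploying the server CA. That caused problems because my new certificate is not a CA certificate. When I instead ran the redeploy CA playbook and let the cluster recreate the server CA itself, it completed fine. But when I then ran redeploy-certificates.yml, I got the same result.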
Here is my inventory file:
all:
  children:
    etcd:
      hosts:
        okdmastertest.labtest.mycompany.com:
    masters:
      hosts:
        okdmastertest.labtest.mycompany.com:
    nodes:
      hosts:
        okdmastertest.labtest.mycompany.com:
          openshift_node_group_name: node-config-master-infra
        okdnodetest1.labtest.mycompany.com:
          openshift_node_group_name: node-config-compute
          openshift_schedulable: True
    OSEv3:
      children:
        etcd:
        masters:
        nodes:
        # https://docs.okd.io/latest/install_config/persistent_storage/persistent_storage_glusterfs.html#overview-containerized-glusterfs
        # https://github.com/openshift/openshift-ansible/tree/master/playbooks/openshift-glusterfs
        # glusterfs:
      vars:
        openshift_deployment_type: origin
        ansible_user: root
        openshift_master_cluster_method: native
        openshift_master_default_subdomain: apps.labtest.mycompany.com
        openshift_install_examples: true
        openshift_master_cluster_hostname: console.labtest.mycompany.com
        openshift_master_cluster_public_hostname: console.labtest.mycompany.com
        openshift_hosted_registry_routehost: registry.apps.labtest.mycompany.com
        openshift_certificate_expiry_warning_days: 30
        openshift_certificate_expiry_fail_on_warn: false
        openshift_master_overwrite_named_certificates: true
        openshift_hosted_registry_routetermination: reencrypt
        openshift_master_named_certificates:
          - certfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.pem"
            keyfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.key"
            cafile: "/Users/me/code/devops/okd_install/certs/ca-globalsign.crt"
            names:
              - "console.labtest.mycompany.com"
              # - "labtest.mycompany.com"
              # - "*.labtest.mycompany.com"
              # - "*.apps.labtest.mycompany.com"
        openshift_hosted_router_certificate:
          certfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.pem"
          keyfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.key"
          cafile: "/Users/me/code/devops/okd_install/certs/ca-globalsign.crt"
        openshift_hosted_registry_routecertificates:
          certfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.pem"
          keyfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.key"
          cafile: "/Users/me/code/devops/okd_install/certs/ca-globalsign.crt"
        # LDAP auth
        openshift_master_identity_providers:
          - name: 'mycompany_ldap_provider'
            challenge: true
            login: true
            kind: LDAPPasswordIdentityProvider
            attributes:
              id:
                - dn
              email:
                - mail
              name:
                - cn
              preferredUsername:
                - sAMAccountName
            bindDN: 'ldapbind@int.mycompany.com'
            bindPassword: (redacted)
            insecure: true
            url: 'ldap://dc-pa1.int.mycompany.com/ou=mycompany,dc=int,dc=mycompany,dc=com'
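What am I missing here? I thought the redeploy-certificates.yml playbook was designed to update certificates. Why can't I switch over to my new commercial certificate? It's almost as if it replaces the certificate on the router (sort of) but breaks the internal server certificates in the process. I'm at my wits' end and don't know what else to try.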
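You should configure openshift_master_cluster_hostname and openshift_master_cluster_public_hostname as two different hostnames. Both hostnames must also be resolvable in DNS. Your commercial certificate is meant for the external access point, that is, the public hostname.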
The openshift_master_cluster_public_hostname and openshift_master_cluster_hostname parameters in the Ansible inventory file, by default /etc/ansible/hosts, must be different.
If they are the same, the named certificates will fail and you will need to re-install them.
# Native HA with External LB VIPs
openshift_master_cluster_hostname=internal.paas.example.com
openshift_master_cluster_public_hostname=external.paas.example.com
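Applied to the inventory in the question, this rule can be checked before running any playbook. Below is a minimal Python sketch that takes the relevant `vars` as a plain dict (no YAML parsing, to stay standard-library only). The function name and messages are hypothetical. The second check reflects one plausible reading of the failure: internal clients reach the master through openshift_master_cluster_hostname and trust only the cluster CA. If that hostname is also listed under a named certificate, they would be handed the GlobalSign certificate, which would fit the `tls: bad certificate` errors in the question.

```python
from __future__ import annotations


def check_master_hostnames(inv_vars: dict) -> list[str]:
    """Return human-readable problems with the master hostname / named-cert settings."""
    problems = []
    internal = inv_vars.get("openshift_master_cluster_hostname")
    public = inv_vars.get("openshift_master_cluster_public_hostname")
    if internal and internal == public:
        problems.append(f"cluster_hostname and cluster_public_hostname are both {internal!r}")
    for cert in inv_vars.get("openshift_master_named_certificates", []):
        if internal in cert.get("names", []):
            problems.append(
                f"named certificate {cert.get('certfile')!r} lists internal hostname {internal!r}"
            )
    return problems


# Values from the question's inventory (certfile path shortened to its file name).
question_vars = {
    "openshift_master_cluster_hostname": "console.labtest.mycompany.com",
    "openshift_master_cluster_public_hostname": "console.labtest.mycompany.com",
    "openshift_master_named_certificates": [
        {"certfile": "commercial.04.29.2019.labtest.mycompany.com.pem",
         "names": ["console.labtest.mycompany.com"]},
    ],
}

for problem in check_master_hostnames(question_vars):
    print("-", problem)
```

With the split shown in the example above (internal.paas.example.com vs. external.paas.example.com, and only the external name under `names`), the check returns no problems.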
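It's also best to configure the certificate for each component one step at a time, so you can test as you go. For example:
First, follow Configuring a Custom Master Host Certificate and verify the result.
Then, follow Configuring a Custom Wildcard Certificate for the Default Router and verify it.
And so on for the remaining components. Once every certificate redeployment task completes successfully on its own, you can finally run the playbook with the full set of parameters to maintain your commercial certificates.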
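The component-by-component approach could look roughly like the Python sketch below. It only builds and prints the `ansible-playbook` command lines, so each one can be run and verified by hand. The playbook paths are assumed to follow the openshift-ansible 3.x layout; confirm they exist in your checkout before running anything. The inventory path is the one from the question.

```python
from __future__ import annotations

import shlex

INVENTORY = "../okd_install/inventory/okd_labtest_inventory.yml"  # from the question

# Assumed openshift-ansible 3.x playbook paths; confirm they exist in your checkout.
STEPS = [
    ("master and web console certificates", "playbooks/openshift-master/redeploy-certificates.yml"),
    ("default router certificate", "playbooks/openshift-hosted/redeploy-router-certificates.yml"),
    ("registry certificates", "playbooks/openshift-hosted/redeploy-registry-certificates.yml"),
]


def build_commands(inventory: str) -> list[tuple[str, list[str]]]:
    """One ansible-playbook invocation per component, in the order to test them."""
    return [(label, ["ansible-playbook", "-i", inventory, playbook]) for label, playbook in STEPS]


for label, argv in build_commands(INVENTORY):
    # Run one step, verify that component, and only then move on to the next.
    print(f"# {label}\n{shlex.join(argv)}")
```

Keeping the steps separate means a failure points at one component instead of the whole redeploy-certificates.yml run.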
Refer to Configuring Custom Certificates for more details.
Hope this helps.