Sagemaker certificate issue with Kubernetes

I created a Docker container that uses Sagemaker through the Java SDK. This container is deployed on a k8s cluster with several replicas.

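The container makes simple requests to Sagemaker to list some models that we have already trained and deployed. However, we are now running into problems with a Java certificate. I am quite new to k8s and certificates, so I would appreciate any help in solving this.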

Here are some traces from the log when it tries to list the endpoints:

org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:394)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:353)
    at com.amazonaws.http.conn.ssl.SdkTLSSocketFactory.connectSocket(SdkTLSSocketFactory.java:132)
    at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:141)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
    at com.amazonaws.http.conn.$Proxy67.connect(Unknown Source)
    at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:380)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
    at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1236)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1056)
    ... 70 common frames omitted
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:397)
    at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:302)
    at sun.security.validator.Validator.validate(Validator.java:262)
    at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324)
    at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:229)
    at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:124)
    at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1621)
    ... 97 common frames omitted
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:141)
    at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:126)
    at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
    at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:392)
    ... 103 common frames omitted 

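This is most likely related to a custom SSL certificate path that your administrator has added to your network. You can check the SSL root certificate by opening any secure website in your browser and clicking the security indicator to the left of the address bar (at least in Chrome). A popup showing certificate and certification information appears. Go to its Certificate Path and look at the ROOT certificate; if it is a custom certificate, you need to add it to your cacerts file. Read this link for more details.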

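I think I have found the answer to my problem. I set up another k8s cluster and deployed the container there. It works fine, and the certificate issue does not occur. While investigating further, I noticed some problems with DNS resolution on the first k8s cluster. For example, the containers that had the certificate issue could not ping google.com. I fixed the DNS problem by not relying on core-dns and setting the DNS configuration in the deployment.yaml file instead. I am not sure why, but this seems to have fixed the certificate issue.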
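To compare the two clusters, it can help to check what the JVM inside the container actually resolves for a host name. The following is only a minimal sketch: the class name DnsCheck and the default host google.com (taken from the ping test above) are illustrative, and you would normally pass the Sagemaker endpoint host your SDK client uses as the first argument. A name that resolves to an unexpected address could mean the TLS handshake reaches a different server than intended, which is one possible (unconfirmed) explanation for the certificate error.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: prints what the JVM's resolver returns for a host name.
final class DnsCheck {

    /** Returns the addresses resolved for host, or an empty list if resolution fails. */
    static List<String> resolve(String host) {
        List<String> addresses = new ArrayList<>();
        try {
            for (InetAddress address : InetAddress.getAllByName(host)) {
                addresses.add(address.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // Resolution failed; leave the list empty so the caller can report it.
        }
        return addresses;
    }

    public static void main(String[] args) {
        // Default host is an assumption; pass the real endpoint host as args[0].
        String host = args.length > 0 ? args[0] : "google.com";
        List<String> addresses = resolve(host);
        if (addresses.isEmpty()) {
            System.out.println(host + ": DNS resolution failed");
        } else {
            System.out.println(host + " resolves to " + addresses);
        }
    }
}
```

Run it in a pod on each cluster and compare the output; differing or missing addresses point at the cluster DNS rather than at the Java truststore.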

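The error message you are getting occurs when Java does not know about the root certificate returned by the TLS endpoint. This often happens if the set of available root certificates has been changed.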

According to https://docs.oracle.com/javase/7/docs/technotes/guides/security/jsse/JSSERefGuide.html#Customization:

"If a truststore named <java-home>/lib/security/jssecacerts is found, it is used. 
If not, then a truststore named <java-home>/lib/security/cacerts is searched for and used (if it exists).
Finally, if a truststore is still not found, then the truststore managed by the TrustManager will be a new empty truststore."

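OpenSSL is a good tool for debugging this kind of certificate problem. You can use the following command to retrieve the certificates returned by the endpoint. This may help you work out what the certificate chain looks like.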

openssl s_client -showcerts -connect www.example.com:443 </dev/null
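If openssl is not installed in the container image, roughly the same information can be obtained from Java itself. This is a diagnostic sketch only, under stated assumptions: the class names ShowCerts and RecordingTrustManager are made up for illustration, the default host www.example.com mirrors the command above, and the trust manager deliberately accepts every certificate so that the chain can be printed even when it is untrusted. Never use such a trust manager in production code.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;
import java.security.cert.X509Certificate;

// Hypothetical diagnostic tool: prints the certificate chain a server presents.
final class ShowCerts {

    /** Records the server chain instead of validating it. Diagnostics only. */
    static final class RecordingTrustManager implements X509TrustManager {
        X509Certificate[] chain = new X509Certificate[0];

        @Override
        public void checkClientTrusted(X509Certificate[] certs, String authType) {
            // Not used for outgoing connections.
        }

        @Override
        public void checkServerTrusted(X509Certificate[] certs, String authType) {
            chain = certs; // Accept everything, just remember what was presented.
        }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    }

    public static void main(String[] args) throws Exception {
        // Default host is an assumption; pass the real endpoint host as args[0].
        String host = args.length > 0 ? args[0] : "www.example.com";
        RecordingTrustManager recorder = new RecordingTrustManager();
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, new TrustManager[] {recorder}, null);

        try (SSLSocket socket = (SSLSocket) context.getSocketFactory().createSocket(host, 443)) {
            socket.startHandshake();
        }

        for (X509Certificate cert : recorder.chain) {
            System.out.println("subject: " + cert.getSubjectX500Principal().getName());
            System.out.println("issuer:  " + cert.getIssuerX500Principal().getName());
        }
    }
}
```

The issuer of the last certificate printed is the one that has to be present in the truststore Java is using.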

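You can see the list of certificates Java knows about using keytool, a utility that ships with the JRE. As far as I know, the -cacerts option only exists in the keytool of JDK 9 and later; with an older JDK, such as the Java 8 runtime your stack trace suggests, pass -keystore with the path to the cacerts file instead.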

keytool -list -cacerts
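The same list can be printed from inside the running application, which has the advantage of showing what that particular JVM trusts rather than what a keytool binary elsewhere on the image sees. A minimal sketch, assuming the default TrustManagerFactory is in use; the class name ListTrustedRoots is hypothetical.

```java
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: lists the CA certificates the JVM trusts by default.
final class ListTrustedRoots {

    /** Returns the certificates accepted by the JVM's default trust managers. */
    static List<X509Certificate> defaultTrustedRoots() {
        try {
            TrustManagerFactory factory =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            // A null KeyStore tells the factory to load the JVM's default truststore.
            factory.init((KeyStore) null);

            List<X509Certificate> roots = new ArrayList<>();
            for (TrustManager manager : factory.getTrustManagers()) {
                if (manager instanceof X509TrustManager) {
                    roots.addAll(Arrays.asList(((X509TrustManager) manager).getAcceptedIssuers()));
                }
            }
            return roots;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not load the default truststore", e);
        }
    }

    public static void main(String[] args) {
        List<X509Certificate> roots = defaultTrustedRoots();
        System.out.println(roots.size() + " trusted certificates");
        for (X509Certificate root : roots) {
            System.out.println(root.getSubjectX500Principal().getName());
        }
    }
}
```

If the root that issued the endpoint's certificate is missing from this output, that explains the PKIX path building failure.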

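Some system administrators override the default certificates by writing an alternative truststore file to the default location. Other times, teams override the default using the javax.net.ssl.trustStore system property.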
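To see which of these cases applies inside a given container, the lookup order can be approximated in a few lines. This is a simplified sketch rather than the JVM's actual implementation: it checks the javax.net.ssl.trustStore property first, then jssecacerts, then cacerts, and ignores special cases such as the property value NONE. The class name TrustStoreLocator is hypothetical.

```java
import java.io.File;

// Hypothetical helper: approximates which truststore file the JVM would pick.
final class TrustStoreLocator {

    /**
     * Returns the path of the truststore that would be used, or null if none
     * is found (the JVM then falls back to an empty truststore).
     */
    static String resolve(String trustStoreProperty, String javaHome) {
        // 1. An explicit javax.net.ssl.trustStore setting overrides the defaults.
        if (trustStoreProperty != null && !trustStoreProperty.isEmpty()) {
            return trustStoreProperty;
        }
        File securityDir = new File(new File(javaHome, "lib"), "security");

        // 2. <java-home>/lib/security/jssecacerts, if present.
        File jssecacerts = new File(securityDir, "jssecacerts");
        if (jssecacerts.isFile()) {
            return jssecacerts.getPath();
        }

        // 3. <java-home>/lib/security/cacerts, if present.
        File cacerts = new File(securityDir, "cacerts");
        if (cacerts.isFile()) {
            return cacerts.getPath();
        }
        return null;
    }

    public static void main(String[] args) {
        String path = resolve(System.getProperty("javax.net.ssl.trustStore"),
                              System.getProperty("java.home"));
        System.out.println(path != null ? "Truststore: " + path : "No truststore file found");
    }
}
```

Running it on both the working and the failing cluster shows whether the two deployments are even reading the same truststore file.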

Finally, you can use the jps utility (which also ships with the JRE) to view the system properties set on running Java processes.

jps -v
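Slim container images often contain only a JRE without jps, so here is a rough in-process equivalent. It is a sketch with a hypothetical class name (SslJvmFlags) that reads the JVM's startup arguments through the standard RuntimeMXBean and keeps only the javax.net.ssl ones. Like jps -v, it only sees flags passed on the command line, not properties set later with System.setProperty.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: shows the SSL-related -D flags the JVM was started with.
final class SslJvmFlags {

    /** Keeps only the arguments that set a javax.net.ssl.* system property. */
    static List<String> filterSslFlags(List<String> jvmArguments) {
        List<String> sslFlags = new ArrayList<>();
        for (String argument : jvmArguments) {
            if (argument.startsWith("-Djavax.net.ssl.")) {
                sslFlags.add(argument);
            }
        }
        return sslFlags;
    }

    public static void main(String[] args) {
        List<String> jvmArguments = ManagementFactory.getRuntimeMXBean().getInputArguments();
        List<String> sslFlags = filterSslFlags(jvmArguments);
        if (sslFlags.isEmpty()) {
            System.out.println("No javax.net.ssl.* flags were passed to this JVM");
        } else {
            for (String flag : sslFlags) {
                System.out.println(flag);
            }
        }
    }
}
```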