cert-manager challenge failed with a connection timeout. I can connect to the URL manually from the internet. What is the next debugging step?

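I'm new to Kubernetes and am supporting a particular website hosted in Kubernetes. I'm trying to work out why cert-manager failed to renew a certificate in the QA environment a few weeks ago.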

Looking at the details of the various certificate-related resources, the problem appears to be a failed challenge:

State: invalid, Reason: Error accepting authorization: acme: authorization error for [DOMAIN]: 400 urn:ietf:params:acme:error:connection: Fetching http://[DOMAIN]/.well-known/acme-challenge/[CHALLENGE TOKEN STRING]: Timeout during connect (likely firewall problem)

I assume the error means that Let's Encrypt could not reach the challenge file at http://[DOMAIN]/.well-known/acme-challenge/[CHALLENGE TOKEN STRING]

(domain and challenge token string redacted)

I tried connecting to the URL from PowerShell:

PS C:\Users\Simon> invoke-webrequest -uri http://[DOMAIN]/.well-known/acme-challenge/[CHALLENGE TOKEN STRING] -SkipCertificateCheck

It returns 200 OK.

However, PowerShell follows redirects automatically, and checking with Wireshark showed that the Nginx web server is performing a 308 permanent redirect to https://[DOMAIN]/.well-known/acme-challenge/[CHALLENGE TOKEN STRING]

(the same URL, just redirecting HTTP to HTTPS)

I understand that Let's Encrypt should be able to handle HTTP to HTTPS redirects.
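Because PowerShell follows redirects silently, it can help to look at the first response on its own. The following is a minimal Python sketch, not part of cert-manager or Let's Encrypt: the function name `fetch_without_redirects` is hypothetical, and the host and path are whatever the caller supplies. It relies on the standard library's `http.client`, which does not follow redirects, so it reports the raw status code and `Location` header.

```python
import http.client


def fetch_without_redirects(host, path, port=80, timeout=10.0):
    """Send one plain-HTTP GET and return (status, Location header or None).

    http.client never follows redirects, so a 308 is reported as a 308.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


# Example call (placeholders must be replaced with the real domain and token):
# status, location = fetch_without_redirects(
#     "DOMAIN", "/.well-known/acme-challenge/CHALLENGE_TOKEN_STRING")
# print(status, location)
```

This only tests reachability from the machine it runs on; a timeout seen by Let's Encrypt may not reproduce from a different network.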

Given that the URL Let's Encrypt is trying to reach is accessible from the internet, I don't know what to investigate next. Can anyone offer some advice?

Here is the full output of the kubectl cert-manager plugin, checking the status of the certificate and its related resources:

PS C:\Users\Simon> kubectl cert-manager status certificate -n qa containers-tls-secret

Name: containers-tls-secret
Namespace: qa
Created at: 2020-10-16T08:40:14+13:00
Conditions:
  Ready: False, Reason: Expired, Message: Certificate expired on Sun, 14 Mar 2021 17:41:12 UTC
  Issuing: False, Reason: Failed, Message: The certificate request has failed to complete and will be retried: Failed to wait for order resource "containers-tls-secret-q2cwr-3223066309" to become ready: order is in "invalid" state:
DNS Names:
- [DOMAIN]
Events:
  Type     Reason   Age                 From          Message
  ----     ------   ----                ----          -------
  Normal   Issuing  31s (x236 over 9d)  cert-manager  Renewing certificate as renewal was scheduled at 2021-02-12 17:41:12 +0000 UTC
  Normal   Reused   31s (x236 over 9d)  cert-manager  Reusing private key stored in existing Secret resource "containers-tls-secret"
  Warning  Failed   31s (x236 over 9d)  cert-manager  The certificate request has failed to complete and will be retried: Failed to wait for order resource "containers-tls-secret-q2cwr-3223066309" to become ready: order is in "invalid" state:
Issuer:
  Name: letsencrypt
  Kind: ClusterIssuer
  Conditions:
    Ready: True, Reason: ACMEAccountRegistered, Message: The ACME account was registered with the ACME server
  Events:  <none>
Secret:
  Name: containers-tls-secret
  Issuer Country: US
  Issuer Organisation: Let's Encrypt
  Issuer Common Name: R3
  Key Usage: Digital Signature, Key Encipherment
  Extended Key Usages: Server Authentication, Client Authentication
  Public Key Algorithm: RSA
  Signature Algorithm: SHA256-RSA
  Subject Key ID: dadf29869b58d05e980c390fdc8783f52369228d
  Authority Key ID: 142eb317b75856cbae500940e61faf9d8b14c2c6
  Serial Number: 04f7356add94a7909afab94f0847a3457765
  Events:  <none>
Not Before: 2020-12-15T06:41:12+13:00
Not After: 2021-03-15T06:41:12+13:00
Renewal Time: 2021-02-13T06:41:12+13:00
CertificateRequest:
  Name: containers-tls-secret-q2cwr
  Namespace: qa
  Conditions:
    Ready: False, Reason: Failed, Message: Failed to wait for order resource "containers-tls-secret-q2cwr-3223066309" to become ready: order is in "invalid" state:
  Events:  <none>
Order:
  Name: containers-tls-secret-q2cwr-3223066309
  State: invalid, Reason:
  Authorizations:
    URL: https://acme-v02.api.letsencrypt.org/acme/authz-v3/10810339315, Identifier: [DOMAIN], Initial State: pending, Wildcard: false
  FailureTime: 2021-02-13T06:41:59+13:00
Challenges:
- Name: containers-tls-secret-q2cwr-3223066309-2302286353, Type: HTTP-01, Token: [CHALLENGE TOKEN STRING], Key: [CHALLENGE TOKEN STRING].8b00cc-ysOWGQ8vtmpOJobWOFa2cEQUe4Sun5NUKCws, State: invalid, Reason: Error accepting authorization: acme: authorization error for [DOMAIN]: 400 urn:ietf:params:acme:error:connection: Fetching http://[DOMAIN]/.well-known/acme-challenge/[CHALLENGE TOKEN STRING]: Timeout during connect (likely firewall problem), Processing: false, Presented: false

By the way, the invoke-webrequest result shows that an HTML page was returned:

<!doctype html><html lang="en"><head><meta charset="utf-8"><title>Containers</title><base href="./"><meta name="viewport" content="width=device-width,initial-scale=1"><link rel="icon" href="favicon.ico…

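Could this be the problem? I don't know what Let's Encrypt expects to find at the URL for an HTTP01 challenge. Is a web page acceptable, or does it expect something different?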

EDIT: I now suspect the HTML page returned by invoke-webrequest is not normal, since I understand the file should contain the Let's Encrypt token and key. Here is the full HTML page:

<!doctype html>
<html lang="en">
    <head>
        <meta charset="utf-8">
        <title>Wineworks</title>
        <base href="./">
        <meta name="viewport" content="width=device-width,initial-scale=1">
        <link rel="icon" href="favicon.ico">
        <link rel="apple-touch-icon-precomposed" href="favicon-152.png">
        <meta name="msapplication-TileColor" content="#FFFFFF">
        <meta name="msapplication-TileImage" content="favicon-152.png">
        <script src="https://secure.aadcdn.microsoftonline-p.com/lib/1.0.16/js/adal.min.js"/>
        <link href="styles.025a840d59ecfcfe427e.bundle.css" rel="stylesheet"/>
    </head>
    <body>
        <app-root/>
        <script type="text/javascript" src="inline.ce954cfcbe723b5986e6.bundle.js"/>
        <script type="text/javascript" src="polyfills.7edc676f7558876c179d.bundle.js"/>
        <script type="text/javascript" src="main.da3590aac44ee76e7b3a.bundle.js"/>
    </body>
</html>

Any idea what could cause cert-manager to put the wrong type of file at the challenge location?
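For reference, RFC 8555 (section 8.3) says the HTTP-01 response body must be the key authorization: the token, a dot, and the base64url-encoded SHA-256 thumbprint of the ACME account key, which is the shape of the `Key` field in the Challenges output above. A minimal Python sketch of a shape check follows; the function name `looks_like_key_authorization` is hypothetical, and it validates only the format, not that the thumbprint belongs to the right account.

```python
import re

# base64url of a 32-byte SHA-256 digest, without padding, is always 43 characters.
_THUMBPRINT_RE = re.compile(r"[A-Za-z0-9_-]{43}")


def looks_like_key_authorization(body: str, token: str) -> bool:
    """Return True if body has the shape '<token>.<43-char base64url thumbprint>'."""
    # Trailing whitespace is stripped before comparing; a trailing newline is common.
    prefix, sep, thumbprint = body.rstrip().partition(".")
    return (
        sep == "."
        and prefix == token
        and _THUMBPRINT_RE.fullmatch(thumbprint) is not None
    )


if __name__ == "__main__":
    token = "example-token"  # hypothetical placeholder, not a real challenge token
    print(looks_like_key_authorization(token + "." + "A" * 43, token))  # True
    print(looks_like_key_authorization('<!doctype html><html lang="en">', token))  # False
```

An HTML page such as the one above fails this check, which suggests the request was answered by the application rather than by a challenge solver.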

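In the end I could not determine why the certificate renewal failed. However, events on one of the certificate-related resources showed that previous renewals had worked. So I figured that whatever the problem was, it might be transient or a one-off, and that trying to renew the certificate again might work.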

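From reading various articles and blog posts, I found that deleting the CertificateRequest object prompts cert-manager to create a new one, which should trigger a certificate renewal. Deleting the CertificateRequest object also automatically deletes the associated ACME Order and Challenge objects, so there is no need to delete them manually.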

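Deleting the CertificateRequest object did work: the certificate was renewed successfully. However, it was not renewed immediately. Further reading suggested that the renewal can take up to an hour (I didn't check the exact time, so I can't verify this).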

To delete the CertificateRequest:

kubectl delete certificaterequest <certificateRequest name>

For example:

kubectl delete certificaterequest my-certificate-zrt6p -n qa

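If you want to force a renewal as soon as the CertificateRequest object has been deleted and cert-manager has created a new one, rather than waiting an hour, run the following kubectl command (this requires the kubectl cert-manager plugin to be installed):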

kubectl cert-manager renew <certificate name>

For example, to renew the certificate my-certificate in namespace qa:

kubectl cert-manager renew my-certificate -n qa

NOTE: The easiest way to install the kubectl cert-manager plugin is via the Krew plugin manager:

kubectl krew install cert-manager

For details on how to install Krew, see https://krew.sigs.k8s.io/docs/user-guide/setup/install/ (it is useful for all kubectl plugins, not just cert-manager).

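Another thing I found through my research is that an old certificate secret can sometimes get "stuck", preventing a new secret from being created. You can delete the certificate secret to get around this. For example: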

kubectl delete secret my-certificate -n qa

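However, I assume that without the certificate secret your website will have no certificate, which may prevent browsers from accessing it. So I would only delete the existing secret as a last resort.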

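Maybe this will help someone in the future. In my case, the cause of the problem above was a misleading wildcard (*) DNS IPv6 (AAAA) record. Let's Encrypt checks both the IPv4 and the IPv6 records.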

So the solution was to delete the IPv6 record.
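One way to spot this kind of mismatch is to list every address the hostname resolves to, grouped by family, and try port 80 on each. The following is a minimal Python sketch using only the standard library; the function names are hypothetical, and the IPv6 result is only meaningful if the machine running it has working IPv6 connectivity.

```python
import socket


def addresses_by_family(addrinfo):
    """Group getaddrinfo() results into de-duplicated IPv4 and IPv6 address lists."""
    grouped = {"IPv4": [], "IPv6": []}
    for family, _type, _proto, _canonname, sockaddr in addrinfo:
        if family == socket.AF_INET:
            label = "IPv4"
        elif family == socket.AF_INET6:
            label = "IPv6"
        else:
            continue  # ignore any other address family
        if sockaddr[0] not in grouped[label]:
            grouped[label].append(sockaddr[0])
    return grouped


def can_connect(address, port=80, timeout=5.0):
    """Return True if a TCP connection to address:port succeeds within timeout."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_host(hostname, port=80):
    """Print, for each resolved address, whether the port accepts a TCP connection."""
    info = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    for label, addresses in addresses_by_family(info).items():
        for address in addresses:
            status = "ok" if can_connect(address, port) else "FAILED"
            print(f"{label} {address}: {status}")


# Example call (replace the placeholder with the real domain):
# check_host("DOMAIN")
```

If an IPv6 address is listed that does not belong to the cluster's ingress, or it fails while the IPv4 address succeeds, a stray AAAA record like the one described above is a likely suspect.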