Traefik with Service Fabric -- failed to connect to Service Fabric server

I have deployed Traefik on my Azure Service Fabric cluster with the following configuration:

# Enable Service Fabric configuration backend
[servicefabric]

# Service Fabric Management Endpoint
clustermanagementurl = "https://localhost:19080"

# Service Fabric Management Endpoint API Version
apiversion = "3.0"

insecureSkipVerify = true

However, when I open the Traefik dashboard I see a blank screen, because Traefik fails to map any of my Fabric applications.

Looking at the Traefik logs on one of my VMs, I see this error repeatedly:

level=error msg="failed to connect to Service Fabric server Get https://localhost:19080/Applications/?api-version=3.0: x509: certificate is valid for <hidden>.eastus.cloudapp.azure.com, not localhost on https://localhost:19080/Applications/?api-version=3.0"

My Azure Service Fabric cluster has an SSL certificate signed by a trusted CA.

How can I resolve this?


Edit 1:

In case it helps, this is the configuration Traefik loaded (according to the logs):

{
    "LifeCycle": {
        "RequestAcceptGraceTimeout": 0,
        "GraceTimeOut": 0
    },
    "GraceTimeOut": 0,
    "Debug": true,
    "CheckNewVersion": true,
    "AccessLogsFile": "",
    "AccessLog": null,
    "TraefikLogsFile": "",
    "TraefikLog": null,
    "LogLevel": "DEBUG",
    "EntryPoints": {
        "http": {
            "Network": "",
            "Address": ":80",
            "TLS": null,
            "Redirect": null,
            "Auth": null,
            "WhitelistSourceRange": null,
            "Compress": false,
            "ProxyProtocol": null,
            "ForwardedHeaders": {
                "Insecure": true,
                "TrustedIPs": null
            }
        }
    },
    "Cluster": null,
    "Constraints": [],
    "ACME": null,
    "DefaultEntryPoints": [
        "http"
    ],
    "ProvidersThrottleDuration": 2000000000,
    "MaxIdleConnsPerHost": 200,
    "IdleTimeout": 0,
    "InsecureSkipVerify": true,
    "RootCAs": null,
    "Retry": null,
    "HealthCheck": {
        "Interval": 30000000000
    },
    "RespondingTimeouts": null,
    "ForwardingTimeouts": null,
    "Docker": null,
    "File": null,
    "Web": {
        "Address": ":9000",
        "CertFile": "",
        "KeyFile": "",
        "ReadOnly": false,
        "Statistics": null,
        "Metrics": null,
        "Path": "/",
        "Auth": null,
        "Debug": false,
        "CurrentConfigurations": null,
        "Stats": null,
        "StatsRecorder": null
    },
    "Marathon": null,
    "Consul": null,
    "ConsulCatalog": null,
    "Etcd": null,
    "Zookeeper": null,
    "Boltdb": null,
    "Kubernetes": null,
    "Mesos": null,
    "Eureka": null,
    "ECS": null,
    "Rancher": null,
    "DynamoDB": null,
    "ServiceFabric": {
        "Watch": false,
        "Filename": "",
        "Constraints": null,
        "Trace": false,
        "DebugLogGeneratedTemplate": false,
        "ClusterManagementURL": "https://localhost:19080",
        "APIVersion": "3.0",
        "UseCertificateAuth": false,
        "ClientCertFilePath": "",
        "ClientCertKeyFilePath": "",
        "InsecureSkipVerify": true
    }
}

Edit 2:

Someone suggested using my cluster's remote address instead of localhost. Doing so produces a different error:

Provider connection error: failed to connect to Service Fabric server Get https://<hidden>.eastus.cloudapp.azure.com:19080/Applications/?api-version=3.0: stream error: stream ID 1; HTTP_1_1_REQUIRED on https://<hidden>.eastus.cloudapp.azure.com:19080/Applications/?api-version=3.0; retrying in 656.765021ms

To authenticate against the Service Fabric API you must use a certificate, and your configuration is missing that detail.

Your Traefik settings should contain something like this:

# [serviceFabric.tls]
cert = "certs/servicefabric.crt"
key = "certs/servicefabric.key"
insecureskipverify = true

The following post describes the setup step by step:

https://blog.techfabric.io/using-traefik-reverse-proxy-for-securing-microservices-on-azure-service-fabric/
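
To check the certificate independently of Traefik, the same request that appears in the logs can be sent by hand with a client certificate. The following is a minimal sketch in Python using only the standard library, not Traefik's own code. The function names are hypothetical, and it assumes the certificate and key are PEM files such as the certs/servicefabric.crt and certs/servicefabric.key mentioned above.

```python
import ssl
import urllib.request


def applications_url(management_url, api_version):
    """Build the URL that Traefik's log shows it requesting."""
    return f"{management_url.rstrip('/')}/Applications/?api-version={api_version}"


def client_cert_context(cert_file, key_file):
    """TLS context that presents a client certificate (PEM files assumed).

    Server certificate and hostname verification stay at their defaults.
    """
    ctx = ssl.create_default_context()
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def list_applications(management_url, api_version, ctx):
    """Send the request and return (HTTP status, raw response body)."""
    url = applications_url(management_url, api_version)
    with urllib.request.urlopen(url, context=ctx, timeout=10) as resp:
        return resp.status, resp.read()
```

A call would look like list_applications("https://localhost:19080", "3.0", client_cert_context("certs/servicefabric.crt", "certs/servicefabric.key")). Because verification is left enabled here, a request to localhost is expected to fail on the hostname check, just like the x509 error in the question, unless the certificate also covers localhost.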

Thanks to Diego's comment (under my question), I managed to solve the problem by adding the following.

What was the problem?

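  1. My SF cluster is secured and requires a client certificate to log in, and none was specified in the Traefik TOML file. (I wish the logged error were more informative.)
  2. Look at the Traefik logs, specifically the SF section (look for the trace that starts with Starting provider *servicefabric.Provider):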

    "Watch": false,
    "Filename": "",
    "Constraints": null,
    "Trace": false,
    "DebugLogGeneratedTemplate": false,
    "ClusterManagementURL": "https://localhost:19080",
    "APIVersion": "3.0",
    "UseCertificateAuth": false,      <-------- Important
    "ClientCertFilePath": "",         <-------- Important
    "ClientCertKeyFilePath": "",      <-------- Important
    "InsecureSkipVerify": false
    
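    • UseCertificateAuth -- whether Traefik uses a client certificate when querying the cluster management endpoint.
    • ClientCertFilePath -- path to the file containing the client certificate's public key.
    • ClientCertKeyFilePath -- path to the file containing the client certificate's private key.

(Both paths should be relative to traefik.exe.)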
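
The check on these three settings can be automated before redeploying. Below is a minimal sketch in Python; check_client_cert_settings is a hypothetical helper, not part of Traefik. It assumes the settings have been copied from the logged provider configuration into a dict, and that base_dir is the directory containing traefik.exe.

```python
from pathlib import Path


def check_client_cert_settings(settings, base_dir):
    """Return a list of problems found; an empty list means the settings look usable."""
    problems = []
    if not settings.get("UseCertificateAuth", False):
        problems.append("UseCertificateAuth is false")
    for key in ("ClientCertFilePath", "ClientCertKeyFilePath"):
        value = settings.get(key, "")
        if not value:
            problems.append(f"{key} is empty")
            continue
        # Paths are resolved relative to base_dir (the traefik.exe directory).
        resolved = Path(base_dir) / value
        if not resolved.is_file():
            problems.append(f"{key} not found: {resolved}")
    return problems
```

Run against the values logged above (UseCertificateAuth false, both paths empty), it reports all three settings as problems.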


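InsecureSkipVerify

Traefik's SF configuration (shown above) includes a setting named InsecureSkipVerify:

  • InsecureSkipVerify -- if set to false, Traefik rejects the connection to the management endpoint unless the SSL certificate it presents is signed by a trusted CA.
  • This can be a problem when the certificate was issued for the remote address while Traefik uses https://localhost as the cluster endpoint, because Traefik then prints an error like this: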

failed to connect to Service Fabric server Get https://localhost:19080/Applications/?api-version=3.0: x509: certificate is valid for .eastus.cloudapp.azure.com, not localhost

To overcome this, you can either:

  • Set InsecureSkipVerify = true and redeploy, or
  • Set the management endpoint to the remote address: clustermanagementurl = "https://<hidden>.eastus.cloudapp.azure.com:19080"
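
To illustrate why both options get past the x509 error, here is a minimal sketch in Python. It is a simplified model, not Traefik's implementation: hostname_matches only approximates certificate name matching (exact names plus a single leftmost wildcard), make_context shows what skipping verification means for a TLS client, and the host name mycluster.eastus.cloudapp.azure.com in the usage note is a made-up placeholder for the hidden cluster name.

```python
import ssl


def hostname_matches(cert_names, host):
    """Simplified check: does any name on the certificate cover this host?"""
    host = host.lower()
    for name in cert_names:
        name = name.lower()
        if name == host:
            return True
        if name.startswith("*."):
            suffix = name[1:]  # "*.example.com" -> ".example.com"
            prefix = host[: -len(suffix)]
            # A wildcard stands for exactly one leftmost label.
            if host.endswith(suffix) and prefix and "." not in prefix:
                return True
    return False


def make_context(insecure_skip_verify):
    """TLS client context; with insecure_skip_verify the server certificate is not checked."""
    ctx = ssl.create_default_context()
    if insecure_skip_verify:
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

With a certificate issued for mycluster.eastus.cloudapp.azure.com, hostname_matches returns False for localhost and True for the remote name, which corresponds to the second option. make_context(True) disables both the hostname check and the CA check, which corresponds to the first option and trades away server authentication.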

Thanks again to Diego for the hint that helped me understand the problem and share the explanation above.

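I know this is an old post, but we just ran into this exact situation, and this is the only place where I have seen the client settings mentioned. This is the provider section that finally seemed to work for us: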

################################################################
# Service Fabric provider
################################################################

# Enable Service Fabric configuration backend
[servicefabric]

# Service Fabric Management Endpoint
clustermanagementurl = "https://localhost:19080"
# Note: use "https://localhost:19080" if you're using a secure cluster

# Service Fabric Management Endpoint API Version
apiversion = "3.0"

# Enable TLS connection.
#
# Optional
#
[serviceFabric.tls]
  cert               = "certs/servicefabric.crt"
  key                = "certs/servicefabric.key"
  insecureskipverify = true

UseCertificateAuth    =  true
ClientCertFilePath    = "certs/traefik.crt"
ClientCertKeyFilePath = "certs/traefik.key"
InsecureSkipVerify    =  true