Istio Ingress resulting in "no healthy upstream"

Question

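I am deploying an outward-facing service that is exposed behind a NodePort and then an Istio ingress. The deployment uses manual sidecar injection. Once the deployment, the NodePort, and the ingress are running, I can make requests to the Istio ingress.

For some unknown reason, the requests are not routed to my deployment; instead the response is the text "no healthy upstream". Why is this, and what is causing it?

I can see in the HTTP response that the status code is 503 (Service Unavailable) and the server is "envoy". The deployment itself is working, because I can map a port to it and everything works as expected.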

Answer 1

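Although this is a fairly general error that results from a routing problem in a misconfigured Istio setup, I will offer a general solution/piece of advice to anyone who runs into the same issue.

In my case the cause was an incorrect route rule configuration: the Kubernetes native Services were working, but the Istio routing rules were misconfigured, so Istio could not route from the ingress to the service.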

Answer 2

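I ran into this issue when my pod was in the ContainerCreating state, which resulted in the 503 error. As @pegaldon explained, it can also happen because of an incorrect route configuration or because no Gateway was created.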

Answer 3

Delete the destinationrules.networking.istio.io resource and recreate the virtualservice.networking.istio.io resource:

[root@10-20-10-110 ~]# curl http://dprovider.example.com:31400/dw/provider/beat
no healthy upstream[root@10-20-10-110 ~]# 
[root@10-20-10-110 ~]# curl http://10.210.11.221:10100/dw/provider/beat
"该服务节点  10.210.11.221  心跳正常!"[root@10-20-10-110 ~]# 
[root@10-20-10-110 ~]# 
[root@10-20-10-110 ~]# cat /home/example_service_yaml/vs/dw-provider-service.yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: dw-provider-service
  namespace: example
spec:
  hosts:
  - "dprovider.example.com"
  gateways:
  - example-gateway
  http:
  - route:
    - destination:
        host: dw-provider-service 
        port:
          number: 10100
        subset: "v1-0-0"
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: dw-provider-service
  namespace: example
spec:
  host: dw-provider-service
  subsets:
  - name: "v1-0-0"
    labels:
      version: 1.0.0

[root@10-20-10-110 ~]# vi /home/example_service_yaml/vs/dw-provider-service.yaml 
[root@10-20-10-110 ~]# kubectl -n example get vs -o wide | grep dw                       
dw-collection-service    [example-gateway]   [dw.collection.example.com]                       72d
dw-platform-service      [example-gateway]   [dplatform.example.com]                           81d
dw-provider-service      [example-gateway]   [dprovider.example.com]                           21m
dw-sync-service          [example-gateway]   [dw-sync-service dsync.example.com]               34d
[root@10-20-10-110 ~]# kubectl -n example delete vs dw-provider-service 
virtualservice.networking.istio.io "dw-provider-service" deleted
[root@10-20-10-110 ~]# kubectl -n example delete d dw-provider-service   
daemonsets.apps                       deniers.config.istio.io               deployments.extensions                dogstatsds.config.istio.io            
daemonsets.extensions                 deployments.apps                      destinationrules.networking.istio.io  
[root@10-20-10-110 ~]# kubectl -n example delete destinationrules.networking.istio.io dw-provider-service 
destinationrule.networking.istio.io "dw-provider-service" deleted
[root@10-20-10-110 ~]# kubectl apply -f /home/example_service_yaml/vs/dw-provider-service.yaml 
virtualservice.networking.istio.io/dw-provider-service created
[root@10-20-10-110 ~]# curl http://dprovider.example.com:31400/dw/provider/beat
"该服务节点  10.210.11.221  心跳正常!"[root@10-20-10-110 ~]# 
[root@10-20-10-110 ~]#

Answer 4

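Just in case you get curious, like I did... even though in my scenario the cause of the error was obvious...

Cause of the error: I had two versions of the same service (v1 and v2) and an Istio VirtualService that routes traffic to weighted destinations: 95% goes to v1 and 5% goes to v2. Since I had not deployed v1 yet, the error "503 - no healthy upstream" naturally showed up for 95% of the requests.

OK, even though I knew the problem and how to fix it (just deploy v1), I was wondering: how can I get more information about this error? How can I dig deeper to find out what is happening?

Here is a way of investigating it with istioctl, the Istio configuration command-line utility: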

# 1) Check the proxies status -->
  $ istioctl proxy-status
# Result -->
  NAME                                                   CDS        LDS        EDS        RDS          PILOT                       VERSION
  ...
  teachstore-course-v1-74f965bd84-8lmnf.development      SYNCED     SYNCED     SYNCED     SYNCED       istiod-86798869b8-bqw7c     1.5.0
  ...
  ...

# 2) Get the name outbound from JSON result using the proxy (service with the problem) -->
  $ istioctl proxy-config cluster teachstore-course-v1-74f965bd84-8lmnf.development --fqdn teachstore-student.development.svc.cluster.local -o json
# 2) If you have jq installed locally, extract only the names we need -->
  $ istioctl proxy-config cluster teachstore-course-v1-74f965bd84-8lmnf.development --fqdn teachstore-course.development.svc.cluster.local -o json | jq -r '.[].name'
# Result -->
  outbound|80||teachstore-course.development.svc.cluster.local
  inbound|80|9180-tcp|teachstore-course.development.svc.cluster.local
  outbound|80|v1|teachstore-course.development.svc.cluster.local
  outbound|80|v2|teachstore-course.development.svc.cluster.local

# 3) Check the endpoints of "outbound|80|v2|teachstore-course..." using v1 proxy -->
  $ istioctl proxy-config endpoints teachstore-course-v1-74f965bd84-8lmnf.development --cluster "outbound|80|v2|teachstore-course.development.svc.cluster.local"
# Result (the v2, 5% of the traffic route is ok, there are healthy targets) -->
  ENDPOINT             STATUS      OUTLIER CHECK     CLUSTER
  172.17.0.28:9180     HEALTHY     OK                outbound|80|v2|teachstore-course.development.svc.cluster.local
  172.17.0.29:9180     HEALTHY     OK                outbound|80|v2|teachstore-course.development.svc.cluster.local

# 4) However, for the v1 version "outbound|80|v1|teachstore-course..." -->
  $ istioctl proxy-config endpoints teachstore-course-v1-74f965bd84-8lmnf.development --cluster "outbound|80|v1|teachstore-course.development.svc.cluster.local"
  ENDPOINT             STATUS      OUTLIER CHECK     CLUSTER
# Nothing! Empty, no Pods; that explains the "no healthy upstream" 95% of the time.
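
To make the 95% figure concrete, here is a minimal Python sketch of the situation. It is not Envoy's actual load-balancing code: it simply assumes a weighted choice between subsets and a 503 whenever the chosen subset has no endpoints. The names SUBSETS, route and failure_rate are hypothetical and exist only for this illustration.

```python
import random

# Hypothetical model of the VirtualService weights and the endpoints
# behind each subset (v1 has none because it was never deployed).
SUBSETS = {
    "v1": {"weight": 95, "endpoints": []},
    "v2": {"weight": 5, "endpoints": ["172.17.0.28:9180", "172.17.0.29:9180"]},
}


def route(rng):
    """Pick a subset by weight, then return (status, endpoint)."""
    names = list(SUBSETS)
    weights = [SUBSETS[name]["weight"] for name in names]
    chosen = rng.choices(names, weights=weights, k=1)[0]
    endpoints = SUBSETS[chosen]["endpoints"]
    if not endpoints:
        # Nothing to forward to: this is the "no healthy upstream" case.
        return 503, None
    return 200, rng.choice(endpoints)


def failure_rate(requests=10_000, seed=0):
    """Return the share of simulated requests that end in a 503."""
    rng = random.Random(seed)
    failures = sum(1 for _ in range(requests) if route(rng)[0] == 503)
    return failures / requests


if __name__ == "__main__":
    print(f"share of 503 responses: {failure_rate():.1%}")
```

In this model, adding an endpoint to the v1 entry is the equivalent of deploying v1, and the failures disappear.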

Answer 5

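In my experience, the "no healthy upstream" error can have different causes. Usually, Istio has received ingress traffic that should be forwarded (the client request, or Istio downstream), but the destination is unavailable (the Istio upstream / Kubernetes service). This results in an HTTP 503 "no healthy upstream" error.

1.) Broken VirtualService definitions. If a VirtualService routes traffic to a destination, make sure that destination exists (in terms of a correct hostname and the service being available from that namespace).

2.) ImagePullBackOff / Terminating / service not available

Make sure your destination is available in general. Sometimes no pod is available, so no upstream is available either.

3.) ServiceEntry - the same destination in two lists, but with different DNS rules

Check the ServiceEntry objects in your namespace: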

kubectl -n <namespace> get serviceentry

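If the result has more than one entry (multiple lines within one ServiceEntry object), check whether a destination address (e.g. foo.com) appears in several lines. If the same destination address (e.g. foo.com) appears in several lines, make sure the "DNS" column does not show different resolution settings (e.g. one line uses DNS and another line uses NONE). If it does, you are trying to apply different DNS settings to the same destination address.

A solution is:

a) Unify the DNS setting: set all lines to NONE or all lines to DNS, but do not mix them.

b) Make sure the destination (foo.com) appears in one line only, so that different DNS rules cannot collide.

Option a) involves restarting the istio-ingressgateway pods (data plane) to take effect.

Option b) involves no restart of the Istio data plane or the Istio control plane.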
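
The check described in 3.) can be sketched in Python. This is only an illustration that works on hand-written (host, resolution) pairs; the function name conflicting_hosts and the sample data are hypothetical, and in practice the pairs would be read from the kubectl output above.

```python
from collections import defaultdict


def conflicting_hosts(entries):
    """Return hosts that appear with more than one resolution setting.

    `entries` is an iterable of (host, resolution) pairs, for example
    ("foo.com", "DNS"). The result maps each conflicting host to the
    sorted list of resolution settings found for it.
    """
    seen = defaultdict(set)
    for host, resolution in entries:
        seen[host].add(resolution)
    return {host: sorted(found) for host, found in seen.items() if len(found) > 1}


if __name__ == "__main__":
    # Hypothetical sample: foo.com is listed twice with different settings.
    sample = [("foo.com", "DNS"), ("bar.com", "NONE"), ("foo.com", "NONE")]
    print(conflicting_hosts(sample))
```

An empty result means no host is listed with mixed settings, which is the state that options a) and b) aim for.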

Basically, it helps to check the status between the control plane (istiod) and the data plane (istio-ingressgateway) with

istioctl proxy-status

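In the output of istioctl proxy-status, every column should show "SYNCED", which confirms that the control plane and the data plane are in sync. If not, you can restart the istio-ingressgateway deployment or istiod to force "fresh" processes.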
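
If there are many proxies, scanning that table by eye gets tedious. The following Python sketch flags rows that are not fully synced. It assumes the column layout shown in Answer 4 (NAME, CDS, LDS, EDS, RDS, PILOT, VERSION); other Istio versions may print different columns. The function name out_of_sync, the second proxy name and the STALE value in the sample are made up for illustration.

```python
XDS_COLUMNS = {"CDS", "LDS", "EDS", "RDS"}

# Hypothetical sample in the layout shown in Answer 4.
SAMPLE = """\
NAME                                                CDS     LDS     EDS     RDS     PILOT                    VERSION
teachstore-course-v1-74f965bd84-8lmnf.development   SYNCED  SYNCED  SYNCED  SYNCED  istiod-86798869b8-bqw7c  1.5.0
example-gateway-0000000000-aaaaa.istio-system       SYNCED  SYNCED  STALE   SYNCED  istiod-86798869b8-bqw7c  1.5.0
"""


def out_of_sync(proxy_status_text):
    """Return names of proxies with at least one xDS column not SYNCED."""
    lines = [ln for ln in proxy_status_text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    expected = sum(1 for col in header if col in XDS_COLUMNS)
    flagged = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) < len(header):
            continue  # skip truncated or placeholder rows such as "..."
        if fields.count("SYNCED") < expected:
            flagged.append(fields[0])
    return flagged


if __name__ == "__main__":
    print(out_of_sync(SAMPLE))
```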

Furthermore, it helps to run

istioctl analyze -A

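to check that the destinations referenced in your VirtualService definitions actually exist. If a VirtualService defines a route to a destination that is unavailable, istioctl analyze -A can detect it.

Reading the log files of the istiod container also helps. The istiod error messages often indicate the context of the routing error (which namespace and service, or which Istio setting). You can read them in the usual way with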

kubectl -n istio-system logs <nameOfIstioDPod>
