How to deal with breaking changes in a Service Mesh

I am building a sample microservice application with Kubernetes to work out best practices and some patterns for future projects. I am using Istio as a service mesh to handle east-west traffic, and I have a basic understanding of its concepts (VirtualServices, DestinationRules, and so on). The service mesh makes it easy for me to roll out a new version of a microservice and redirect traffic to the new instance (e.g. with a weighted distribution). With semantic versioning in mind, this works very well for Patch and Minor updates, because in theory they do not change existing contracts and can therefore serve as drop-in replacements for the existing service. Now I wonder how to properly deal with breaking changes to a service, i.e. Major version updates.

It is hard to find information on this topic, but with the limited information I have gathered, I am currently considering two approaches:

  1. Each major version of a service (e.g. user-service) gets its own VirtualService so that clients can address it correctly (via a distinct service name, e.g. user-service-v1). Istio is then used to route the traffic for a major version (e.g. 1.*) correctly to the different available services (e.g. user-service v1.3.1 and user-service v1.4.0).

  2. I use one overall VirtualService for a specific microservice (e.g. user-service). This VirtualService contains many routing definitions that use e.g. a header sent by the client (e.g. x-major-version=1) to match the request to a destination.

Overall there is not much difference between the two approaches. The client obviously needs to specify which major version it wants to talk to, either by setting a header or by resolving a different service name. Are there any limitations in the described approaches that make one superior to the other? Or are there other options I am missing entirely? Any help and pointers are greatly appreciated!

TLDR

In addition to what I mentioned in the comments, after examining the topic in more detail I would choose approach 2: one overall VirtualService for a specific microservice, combined with canary deployments and mirroring.

Approach 1

As mentioned in the documentation:

In situations where it is inconvenient to define the complete set of route rules or policies for a particular host in a single VirtualService or DestinationRule resource, it may be preferable to incrementally specify the configuration for the host in multiple resources. Pilot will merge such destination rules and merge such virtual services if they are bound to a gateway.

So in theory you could use approach 1, but I would say it requires too much configuration, and there are better ways to achieve this.

Let's say you have the old app with version v1.3.1 and the new app with version v1.4.0; the corresponding VirtualServices would then look as follows.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: vs-service1
spec:
  hosts:
  - '*'
  http:
  - name: "v1.3.1"
    route:
    - destination:
        host: service1.namespace.svc.cluster.local

---

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: vs-service2
spec:
  hosts:
  - '*'
  http:
  - name: "v1.4.0"
    route:
    - destination:
        host: service2.namespace.svc.cluster.local

Approach 2

In practice I would go with approach 2: you create, for example, 2 versions of your application, in the example below old and new, and then configure a VirtualService and a DestinationRule for them.

The question here is: why? Because it is easier to manage, at least for me, and it also makes it easy to use canary deployments and mirroring, more on that below.

Let's say you deploy the new app and you want to send only 1% of the incoming traffic to it. Additionally you can use mirroring, so every request going to the old service will be mirrored to the new service for testing purposes.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: vs-service
spec:
  hosts:
  - '*'
  http:
  - name: "old"
    route:
    - destination:
        host: service.namespace.svc.cluster.local
        subset: v1
      weight: 99
    mirror:
      host: service.namespace.svc.cluster.local
      subset: v2
    mirror_percent: 100
  - name: "new"
    route:
    - destination:
        host: service.namespace.svc.cluster.local
        subset: v2
      weight: 1

---


apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews-destination
spec:
  host: service.namespace.svc.cluster.local
  subsets:
  - name: v1
    labels:
      version: v1  # label on the old pod
  - name: v2
    labels:
      version: v2  # label on the new pod

Testing the new application

The client obviously needs to specify to which major version he wants to talk, either by setting a header or by resolving a different service name.

In practice this depends on the configuration. If you use the option above with the new and old versions, then this is a canary deployment with weighted distribution: you specify the percentage of traffic that should be sent to the new version of the application. Of course you can also specify headers or prefixes in your VirtualService so that users can reach the older or the newer version of the application.
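For completeness, a minimal sketch of the header-based variant from the question could look as follows. The x-major-version header comes from the question; the metadata name, host, and subsets are hypothetical placeholders, not part of the answer's setup:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: vs-user-service   # hypothetical name
spec:
  hosts:
  - user-service
  http:
  - name: "v2"
    match:
    - headers:
        x-major-version:
          exact: "2"       # clients opting into the new major version
    route:
    - destination:
        host: user-service.namespace.svc.cluster.local
        subset: v2
  - name: "v1-default"     # requests without the header fall through to v1
    route:
    - destination:
        host: user-service.namespace.svc.cluster.local
        subset: v1
```

Note that Istio evaluates HTTP routes in order, so the unmatched default route must come last.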

Canary deployment

As mentioned here:

One of the benefits of the Istio project is that it provides the control needed to deploy canary services. The idea behind canary deployment (or rollout) is to introduce a new version of a service by first testing it using a small percentage of user traffic, and then if all goes well, increase, possibly gradually in increments, the percentage while simultaneously phasing out the old version. If anything goes wrong along the way, we abort and rollback to the previous version. In its simplest form, the traffic sent to the canary version is a randomly selected percentage of requests, but in more sophisticated schemes it can be based on the region, user, or other properties of the request.

Depending on your level of expertise in this area, you may wonder why Istio’s support for canary deployment is even needed, given that platforms like Kubernetes already provide a way to do version rollout and canary deployment. Problem solved, right? Well, not exactly. Although doing a rollout this way works in simple cases, it’s very limited, especially in large scale cloud environments receiving lots of (and especially varying amounts of) traffic, where autoscaling is needed.


With Istio, traffic routing and replica deployment are two completely independent functions. The number of pods implementing services are free to scale up and down based on traffic load, completely orthogonal to the control of version traffic routing. This makes managing a canary version in the presence of autoscaling a much simpler problem. Autoscalers may, in fact, respond to load variations resulting from traffic routing changes, but they are nevertheless functioning independently and no differently than when loads change for other reasons.

Istio’s routing rules also provide other important advantages; you can easily control fine-grained traffic percentages (e.g., route 1% of traffic without requiring 100 pods) and you can control traffic using other criteria (e.g., route traffic for specific users to the canary version). To illustrate, let’s look at deploying the helloworld service and see how simple the problem becomes.

There is an example here.
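The quoted passage also mentions routing specific users to the canary while everyone else gets a weighted split. A sketch of that combination, assuming hypothetical subset names v1/v2 and an illustrative end-user header value, might look like this:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: vs-canary          # hypothetical name
spec:
  hosts:
  - '*'
  http:
  - match:
    - headers:
        end-user:
          exact: canary-tester   # test users always hit the canary
    route:
    - destination:
        host: service.namespace.svc.cluster.local
        subset: v2
  - route:                       # everyone else: 90/10 weighted split
    - destination:
        host: service.namespace.svc.cluster.local
        subset: v1
      weight: 90
    - destination:
        host: service.namespace.svc.cluster.local
        subset: v2
      weight: 10
```

Increasing the canary's weight step by step (10 → 50 → 100) is then just an edit to the two weight fields.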

Mirroring

The second thing often used to test a new version of an application is traffic mirroring.

As mentioned here:

Using Istio, you can use traffic mirroring to duplicate traffic to another service. You can incorporate a traffic mirroring rule as part of a canary deployment pipeline, allowing you to analyze a service's behavior before sending live traffic to it.

If you are looking for best practices, I would suggest starting with this Medium tutorial, as it is explained very well there.

How traffic mirroring works

Traffic mirroring works using the steps below:

  • You deploy a new version of the application and switch on traffic mirroring.

  • The old version responds to requests like before but also sends an asynchronous copy to the new version.

  • The new version processes the traffic but does not respond to the user.

  • The operations team monitor the new version and report any issues to the development team.

As the application processes live traffic, it helps the team uncover issues that they would typically not find in a pre-production environment. You can use monitoring tools, such as Prometheus and Grafana, for recording and monitoring your test results.
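The steps above can be sketched as a mirroring-only VirtualService. This is a hypothetical minimal example (names, host, and subsets are placeholders); it uses the newer mirrorPercentage field rather than the deprecated mirror_percent shown earlier:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: vs-mirror          # hypothetical name
spec:
  hosts:
  - service
  http:
  - route:
    - destination:
        host: service.namespace.svc.cluster.local
        subset: v1         # old version keeps serving all responses
    mirror:
      host: service.namespace.svc.cluster.local
      subset: v2           # new version receives fire-and-forget copies
    mirrorPercentage:
      value: 100.0         # mirror all live traffic
```

Mirrored requests are sent asynchronously and their responses are discarded, which is exactly why the new version can be observed under real load without affecting users.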

Additionally, there is an nginx example that shows perfectly how it should work.

It is worth mentioning that if you use write APIs, such as ordering or payment, then mirrored traffic will result in writing to such APIs multiple times. Christian Posta describes this topic in detail here.


Let me know if there is anything more you would like to discuss.