Readiness Probe does not allow access to an internal Kubernetes service while the pod is not ready

The readiness probe keeps the application in a not-ready state. While in this state, the application cannot connect to any Kubernetes service.

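I'm using Ubuntu 18 for both the master and the nodes of my Kubernetes cluster. (The problem still appeared when I used only the master in the cluster, so I don't think this is a master/node kind of problem.)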

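I set up my Kubernetes cluster with a Spring application that uses Hazelcast to manage its cache. When the readiness probe is in use, the application cannot reach the Kubernetes service I created to connect the application instances through Hazelcast using the hazelcast-kubernetes plugin.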

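When I remove the readiness probe, the application connects to the service as soon as it can, the Hazelcast cluster forms successfully, and everything works properly.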

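The readiness probe calls a REST API whose only response is a 200 status code. However, partway through startup the application starts its Hazelcast cluster, so it tries to connect to the Kubernetes Hazelcast service, which links the application's cache with the other pods. At that point the readiness probe has not yet passed, and the pod is still in a not-ready state because of the probe. This is when the application cannot connect to the Kubernetes service, and it either fails or gets stuck, depending on the configuration I add.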

service.yaml:

apiVersion: v1
kind: Service
metadata:
  name: my-app-cluster-hazelcast
spec:
  selector:
    app: my-app
  ports:
  - name: hazelcast
    port: 5701

deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-deployment
  labels:
    app: my-app-deployment
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 2
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      terminationGracePeriodSeconds: 180
      containers:
      - name: my-app
        image: my-repo:5000/my-app-container
        imagePullPolicy: Always
        ports:
        - containerPort: 5701
        - containerPort: 9080
        readinessProbe:
          httpGet:
            path: /app/api/excluded/sample
            port: 9080
          initialDelaySeconds: 120
          periodSeconds: 15
        securityContext:
          capabilities:
            add:
              - SYS_ADMIN
        env:
          - name: container
            value: docker

hazelcast.xml:

<?xml version="1.0" encoding="UTF-8"?>

<hazelcast
        xsi:schemaLocation="http://www.hazelcast.com/schema/config http://www.hazelcast.com/schema/config/hazelcast-config-3.11.xsd"
        xmlns="http://www.hazelcast.com/schema/config"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <properties>
        <property name="hazelcast.jmx">false</property>
        <property name="hazelcast.logging.type">slf4j</property>
    </properties>

    <network>
        <port auto-increment="false">5701</port>
        <outbound-ports>
            <ports>49000,49001,49002,49003</ports>
        </outbound-ports>
        <join>
            <multicast enabled="false"/>
            <kubernetes enabled="true">
                <namespace>default</namespace>
                <service-name>my-app-cluster-hazelcast</service-name>
            </kubernetes>
        </join>
    </network>
</hazelcast>

hazelcast-client.xml:

<?xml version="1.0" encoding="UTF-8"?>
<hazelcast-client
        xsi:schemaLocation="http://www.hazelcast.com/schema/client-config http://www.hazelcast.com/schema/client-config/hazelcast-client-config-3.11.xsd"
        xmlns="http://www.hazelcast.com/schema/client-config"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <properties>
        <property name="hazelcast.logging.type">slf4j</property>
    </properties>

    <connection-strategy async-start="false" reconnect-mode="ON">
        <connection-retry enabled="true">
            <initial-backoff-millis>1000</initial-backoff-millis>
            <max-backoff-millis>60000</max-backoff-millis>
        </connection-retry>
    </connection-strategy>

    <network>
        <kubernetes enabled="true">
            <namespace>default</namespace>
            <service-name>my-app-cluster-hazelcast</service-name>
        </kubernetes>
    </network>
</hazelcast-client>

Expected Result:

The service is able to connect to the pods, and endpoints appear in its description.

$ kubectl describe service my-app-cluster-hazelcast

Name:              my-app-cluster-hazelcast
Namespace:         default
Labels:            <none>
Annotations:       kubectl.kubernetes.io/last-applied-configuration:
                     {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"my-app-cluster-hazelcast","namespace":"default"},"spec":{"ports...
Selector:          app=my-app
Type:              ClusterIP
IP:                10.244.28.132
Port:              hazelcast  5701/TCP
TargetPort:        5701/TCP
Endpoints:         10.244.4.10:5701,10.244.4.9:5701
Session Affinity:  None
Events:            <none>

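The application runs properly, shows two members in its Hazelcast cluster, and the deployment is shown as ready, so the application can be fully accessed:

Logs: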

2019-08-26 23:07:36,614 TRACE [hz._hzInstance_1_dev.InvocationMonitorThread] (com.hazelcast.spi.impl.operationservice.impl.InvocationMonitor): [10.244.4.10]:5701 [dev] [3.11] Broadcasting operation control packets to: 2 members

$ kubectl get deployment

NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
my-app-deployment   2/2     2            2           2m27s

Actual Result:

The service does not get any endpoints.

$ kubectl describe service my-app-cluster-hazelcast

Name:              my-app-cluster-hazelcast
Namespace:         default
Labels:            <none>
Annotations:       kubectl.kubernetes.io/last-applied-configuration:
                     {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"my-app-cluster-hazelcast","namespace":"default"},"spec":{"ports...
Selector:          app=my-app
Type:              ClusterIP
IP:                10.244.28.132
Port:              hazelcast  5701/TCP
TargetPort:        5701/TCP
Endpoints:
Session Affinity:  None
Events:            <none>

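The application gets stuck because of the connection strategy enabled in hazelcast-client.xml, with the logs below. Its own cluster never gets any communication, and the deployment stays not ready forever:

Logs: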

22:54:11.236 [hz.client_0.cluster-] WARN com.hazelcast.client.connection.ClientConnectionManager - hz.client_0 [dev] [3.11] Unable to get alive cluster connection, try in 57686 ms later, attempt 52 , cap retrytimeout millis 60000
22:55:02.036 [hz._hzInstance_1_dev.cached.thread-4] DEBUG com.hazelcast.internal.cluster.impl.MembershipManager - [10.244.4.8]:5701 [dev] [3.11] Sending member list to the non-master nodes:

Members {size:1, ver:1} [
        Member [10.244.4.8]:5701 - 6a4c7184-8003-4d24-8023-6087d68e9709 this
]

22:55:08.968 [hz.client_0.cluster-] WARN com.hazelcast.client.connection.ClientConnectionManager - hz.client_0 [dev] [3.11] Unable to get alive cluster connection, try in 51173 ms later, attempt 53 , cap retrytimeout millis 60000
22:56:00.184 [hz.client_0.cluster-] WARN com.hazelcast.client.connection.ClientConnectionManager - hz.client_0 [dev] [3.11] Unable to get alive cluster connection, try in 55583 ms later, attempt 54 , cap retrytimeout millis 60000

$ kubectl get deployment

NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
my-app-deployment   0/2     2            0           45m

In your service YAML you have

spec:
  selector:
    app: my-app

but in the deployment YAML the label value is different

metadata:
  name: my-app-deployment
  labels:
    app: my-app-deployment

Is there a reason for that?
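
For reference, a Service selector is matched against pod labels, which come from `spec.template.metadata.labels` in the Deployment, not from the Deployment's own `metadata.labels`. The excerpt below is taken from the deployment.yaml in the question, with comments added only to mark which label is which; it is an annotated fragment, not a complete manifest:

```yaml
metadata:
  name: my-app-deployment
  labels:
    app: my-app-deployment   # label on the Deployment object itself
spec:
  selector:
    matchLabels:
      app: my-app            # must match the pod template labels below
  template:
    metadata:
      labels:
        app: my-app          # pod label; the Service selector (app: my-app) is compared against this
```

So the differing label on the Deployment object does not by itself stop the Service from selecting the pods.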

To clarify:

As stated by the OP with reference to the readiness probe:

The kubelet uses readiness probes to know when a Container is ready to start accepting traffic. A Pod is considered ready when all of its Containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
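
Given that behaviour, one option sometimes used for peer-discovery services like this one is the `publishNotReadyAddresses` field of the Service spec, which asks Kubernetes to publish endpoint addresses even for pods that are not ready. The following is a minimal sketch based on the service.yaml from the question. Whether it is appropriate here is an assumption, since anything else that uses this Service would also see not-ready pods:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-cluster-hazelcast
spec:
  # Assumption: this Service is used only for Hazelcast member discovery,
  # so it is acceptable to list pods before they pass the readiness probe.
  publishNotReadyAddresses: true
  selector:
    app: my-app
  ports:
  - name: hazelcast
    port: 5701
```

After applying it, `kubectl describe service my-app-cluster-hazelcast` can be used to check whether the Endpoints field is populated while the pods are still starting up.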