ZMQ sockets do not work as expected on Kubernetes

Short summary: when I deploy my code on Kubernetes, my ZMQ sockets do not receive (and possibly do not send) any messages.

I have an application involving several clients and servers. It is developed in Node and uses ZeroMQ as the communication layer. It works on my local machine, it works in Docker, and I am now trying to deploy it with Kubernetes.

When the application is deployed, the pods, deployments and Kubernetes services all start up. Apparently everything is fine, but the initial message sent by a client never reaches the server. The deployments are in the same namespace, and I use Flannel as the CNI. As far as I can tell, the cluster is initialized correctly, yet the messages never arrive.

I read this post about ZMQ socket binding problems on Kubernetes. I tried the ZMQ_CONNECT_TIMEOUT option, but it made no difference. Also, unlike the question referenced there, my messages never arrive at all.
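
For reference, setting that option looks roughly like this. A minimal sketch, assuming the newer zeromq.js v6 API where the option is exposed as connectTimeout (the application itself uses the legacy callback-style zmq package, where this would go through setsockopt); tcp://servidor:7777 is the in-cluster service address from the manifests further down:

    // Sketch: setting ZMQ_CONNECT_TIMEOUT through zeromq.js v6 (assumption:
    // the legacy "zmq" package does not expose this option directly).
    const zmq = require("zeromq");

    async function probe () {
        const push = new zmq.Push();
        push.connectTimeout = 2000;          // ms to wait for the TCP connect (0 = OS default)
        push.connect("tcp://servidor:7777"); // in-cluster service address, see manifests below
        await push.send("hello");            // resolves once the message is queued
    }

    probe().catch(console.error);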

I could provide some code, but there is a lot of it, and I do not think the application itself is the problem. I suspect I am missing something in the Kubernetes configuration, since this is my first time using it. Let me know if you need more information.

Edit 1. 12/01/2021

As @anemyte suggested, I will try to provide a simplified version of the code:

Client side:

    initiate () {
        return new Promise(resolve => {
            this.N_INCOMING = 0;
            this.N_OUTGOING = 0;
            this.rrCounter = 0;
            // Bind the PULL socket that receives direct messages from peers.
            this.PULL_SOCKET.bind("tcp://*:" + this.MY_PORT, error => {
                utils.handleError(error);
                // Bind the PUB socket one port above the PULL socket.
                this.PUB_SOCKET.bind("tcp://*:" + (this.MY_PORT + 1), error => {
                    utils.handleError(error);
                    // Subscribe to everything the server publishes.
                    this.SUB_SOCKET.subscribe("");
                    this.SUB_SOCKET.connect(this.SERVER + ":" + (this.SERVER_PORT + 1),
                        error => {utils.handleError(error)});
                    this.PULL_SOCKET.on("message", (m) => this.handlePullSocket(m));
                    this.SUB_SOCKET.on("message", (m) => this.handleSubSocket(m));
                    // PUSH socket pointed at the server's PULL socket.
                    this.SERVER_PUSH_SOCKET = zmq.socket("push");
                    this.SERVER_PUSH_SOCKET.connect(this.SERVER + ":" + this.SERVER_PORT,
                        error => {utils.handleError(error)});
                    this.sendHello();
                    resolve();
                });
            });
        });
    }

Server side:

    initiate () {
        return new Promise(resolve => {
            // Bind the PULL socket that receives the clients' hello messages.
            this.PULL_SOCKET.bind(this.MY_IP + ":" + this.MY_PORT, err => {
                if (err) {
                    console.log(err);
                    process.exit(0);
                }
                // Bind the PUB socket one port above the PULL socket.
                this.PUB_SOCKET.bind(this.MY_IP + ":" + (this.MY_PORT + 1), err => {
                    if (err) {
                        console.log(err);
                        process.exit(0);
                    }
                    this.PULL_SOCKET.on("message", (m) => this.handlePullSocket(m));
                    resolve();
                });
            });
        });
    }
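
The snippets above do not show how MY_IP, MY_PORT, SERVER and SERVER_PORT are populated; presumably they come from the environment variables in the manifests below (see Edit 2). A hypothetical sketch of that wiring for the server side, with the variable names taken from the YAML:

    // Hypothetical wiring of the env vars from the deployments below; the
    // real constructor is not shown in the post.
    constructor () {
        this.MY_IP = process.env.SERVER_ADDRESS;                // "tcp://*" on the server
        this.MY_PORT = parseInt(process.env.SERVER_PUERTO, 10); // 7777
        this.PULL_SOCKET = zmq.socket("pull");
        this.PUB_SOCKET = zmq.socket("pub");
        // parseInt matters here: if MY_PORT stayed a string, MY_PORT + 1
        // would concatenate ("7777" + 1 === "77771") instead of adding.
    }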

The client initiates the connection by sending a hello message. The server's listener function, handlePullSocket, is supposed to handle these messages.
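
To isolate this from the rest of the application, a stripped-down push/pull pair along these lines (a sketch using the same legacy zmq API; tcp://servidor:7777 is the in-cluster service address from the manifests below) should show whether a hello message gets through at all:

    // Server pod: bind a PULL socket and log whatever arrives.
    const zmq = require("zmq");
    const pull = zmq.socket("pull");
    pull.bind("tcp://*:7777", err => {
        if (err) throw err;
        pull.on("message", m => console.log("got:", m.toString()));
    });

    // Client pod (separate process): connect a PUSH socket to the
    // service name and send a single hello.
    const push = zmq.socket("push");
    push.connect("tcp://servidor:7777");
    push.send("hello");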

Edit 2. 12/01/2021. As requested, I am adding the deployment/service configuration.

client-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    kompose.cmd: kompose convert -f docker-compose-resolved.yml
    kompose.version: 1.19.0 (f63a961c)
  creationTimestamp: null
  labels:
    io.kompose.service: c1
  name: c1
  namespace: fishtrace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: c1
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        kompose.cmd: kompose convert -f docker-compose-resolved.yml
        kompose.version: 1.19.0 (f63a961c)
      creationTimestamp: null
      labels:
        app: c1
    spec:
      containers:
      - env:
        - name: NODO_ADDRESS
          value: 0.0.0.0
        - name: NODO_PUERTO
          value: "9999"
        - name: NODO_PUERTO_CADENA
          value: "8888"
        - name: SERVER_ADDRESS
          value: tcp://servidor
        - name: SERVER_PUERTO
          value: "7777"
        image: registrogeminis.com/docker_c1_rpi:latest
        name: c1
        ports:
          - containerPort: 8888
          - containerPort: 9999
        resources: {}
        volumeMounts:
        - mountPath: /app/vol
          name: c1-volume
        imagePullPolicy: Always
      restartPolicy: Always
      imagePullSecrets:
        - name: myregistrykey
      volumes:
        - name: c1-volume
          persistentVolumeClaim:
            claimName: c1-volume
status: {}

client-service.yaml

apiVersion: v1
kind: Service
metadata:
  annotations:
    kompose.cmd: ./kompose convert
    kompose.version: 1.22.0 (955b78124)
  creationTimestamp: null
  labels:
    io.kompose.service: c1
  name: c1
spec:
  ports:
    - name: "9999"
      port: 9999
      targetPort: 9999
    - name: "8888"
      port: 8888
      targetPort: 8888
  selector:
    app: c1
  type: ClusterIP

server-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    kompose.cmd: kompose convert -f docker-compose-resolved.yml
    kompose.version: 1.19.0 (f63a961c)
  creationTimestamp: null
  labels:
    io.kompose.service: servidor
  name: servidor
  namespace: fishtrace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: servidor
  strategy:
    type: Recreate
  template:
    metadata:
      annotations:
        kompose.cmd: kompose convert -f docker-compose-resolved.yml
        kompose.version: 1.19.0 (f63a961c)
      creationTimestamp: null
      labels:
        app: servidor
    spec:
      containers:
      - env:
        - name: SERVER_ADDRESS
          value: tcp://*
        - name: SERVER_PUERTO
          value: "7777"
        image: registrogeminis.com/docker_servidor_rpi:latest
        name: servidor
        ports:
          - containerPort: 7777
          - containerPort: 7778
        resources: {}
        volumeMounts:
        - mountPath: /app/vol
          name: servidor-volume
        imagePullPolicy: Always
      restartPolicy: Always
      imagePullSecrets:
        - name: myregistrykey
      volumes:
      - name: servidor-volume
        persistentVolumeClaim:
          claimName: servidor-volume
status: {}

server-service.yaml

apiVersion: v1
kind: Service
metadata:
  annotations:
    kompose.cmd: ./kompose convert
    kompose.version: 1.22.0 (955b78124)
  creationTimestamp: null
  labels:
    io.kompose.service: servidor
  name: servidor
spec:
  ports:
    - name: "7777"
      port: 7777
      targetPort: 7777
  selector:
    app: servidor
  type: ClusterIP
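
Since the client reaches the server through the service name (SERVER_ADDRESS is tcp://servidor), everything depends on the pod being able to resolve that name. A quick sanity check from inside the client pod is a one-off Node script (a sketch; dns is part of Node's standard library):

    // Sketch: check that the service name resolves from inside the client pod.
    const dns = require("dns");

    dns.lookup("servidor", (err, address) => {
        if (err) {
            console.error("DNS lookup failed:", err.code); // e.g. ENOTFOUND
        } else {
            console.log("servidor resolves to", address);  // should be the ClusterIP
        }
    });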

In the end, it was a DNS problem. It is always DNS. Thanks to @Matt for pointing it out.

In the official Kubernetes DNS docs, they state that there is a known issue with systems that use /etc/resolv.conf as a symlink to the real configuration file, /run/systemd/resolve/resolv.conf in my case. It is a well-known problem, and the recommended solution is to update the kubelet configuration to point to /run/systemd/resolve/resolv.conf.

To do that, I added the line resolvConf: /run/systemd/resolve/resolv.conf to /var/lib/kubelet/config.yaml. Just to be sure, I also edited /etc/kubernetes/kubelet.conf. Finally, you should reload the service with sudo systemctl daemon-reload && sudo systemctl restart kubelet to propagate the changes.
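
To see what resolver configuration a pod actually ended up with, you can print its /etc/resolv.conf from inside the container; a minimal sketch in Node:

    // Sketch: print the pod's /etc/resolv.conf, which kubelet generates
    // from its resolvConf setting. For a regular pod with the default
    // ClusterFirst policy, the nameserver should be the cluster DNS IP.
    const fs = require("fs");
    console.log(fs.readFileSync("/etc/resolv.conf", "utf8"));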

However, I had already done all of this before asking on SE, and it did not seem to work. I had to restart the whole cluster for the change to take effect. After that, DNS worked perfectly and the ZMQ sockets behaved as expected.

Update 31/04/2021: I found out that you have to force a restart of the coredns deployment for the change to actually propagate. So in the end, running kubectl rollout restart deployment coredns -n kube-system after restarting the kubelet service did the trick.