Issue upgrading calico-node in kubeadm cluster
I'm going to upgrade Calico node and cni as per this link for "Upgrading Components Individually".
The directions are clear (I'll cordon off each node and do the calico/cni and calico/node steps), but I'm not sure what
"Update the image in your process management to reference the new version"
means for upgrading the calico/node container.
Other than that, I don't see any issues with the instructions. Our environment is a k8s kubeadm cluster.
I guess the real question is: where do I tell k8s to use the newer version of the calico/node image?
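For reference, in a kubeadm cluster installed from the hosted manifests, the calico/node image tag is referenced by the calico-node DaemonSet in kube-system (names assumed from the stock calico.yaml). A quick way to see which image it currently points at:

kubectl -n kube-system get daemonset calico-node -o jsonpath='{.spec.template.spec.containers[*].image}'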
EDIT
To answer the above question:
I just did a kubectl delete -f on both calico.yaml and rbac-kdd.yaml, and then did a kubectl create -f on the latest versions of those files.
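Roughly what that looked like (a sketch, assuming the old and new manifests are downloaded locally under the same names):

kubectl delete -f calico.yaml
kubectl delete -f rbac-kdd.yaml
# download the latest rbac-kdd.yaml and calico.yaml for the new release, then:
kubectl create -f rbac-kdd.yaml
kubectl create -f calico.yaml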
Everything now appears to be on version 3.3.2, but I'm getting this error on all calico-node pods:
Warning Unhealthy 84s (x181 over 31m) kubelet, thalia4 Readiness probe failed: calico/node is not ready: BIRD is not ready: BGP not established with <node IP addresses here
Running calicoctl node status gives
Calico process is running.
IPv4 BGP status
+---------------+-------------------+-------+----------+--------------------------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+---------------+-------------------+-------+----------+--------------------------------+
| 134.x.x.163 | node-to-node mesh | start | 02:36:29 | Connect |
| 134.x.x.164 | node-to-node mesh | start | 02:36:29 | Connect |
| 134.x.x.165 | node-to-node mesh | start | 02:36:29 | Connect |
| 134.x.x.168 | node-to-node mesh | start | 02:36:29 | Active Socket: Host is |
| | | | | unreachable |
+---------------+-------------------+-------+----------+--------------------------------+
IPv6 BGP status
No IPv6 peers found.
I assume 134.x.x.168 being unreachable is why I'm getting the readiness probe warning above.
Not sure what to do about it, though. The node is available in the k8s cluster (this is node thalia4):
[gms@thalia0 calico]$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
thalia0 Ready master 87d v1.13.1
thalia1 Ready <none> 48d v1.13.1
thalia2 Ready <none> 30d v1.13.1
thalia3 Ready <none> 87d v1.13.1
thalia4 Ready <none> 48d v1.13.1
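To see which calico-node pod is running on which node (the k8s-app=calico-node label is assumed from the stock manifest):

kubectl -n kube-system get pods -o wide -l k8s-app=calico-node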
EDIT 2
calicoctl node status on thalia4 gives
[sudo] password for gms:
Calico process is running.
IPv4 BGP status
+---------------+-------------------+-------+----------+---------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+---------------+-------------------+-------+----------+---------+
| 134.xx.xx.162 | node-to-node mesh | start | 02:36:29 | Connect |
| 134.xx.xx.163 | node-to-node mesh | start | 02:36:29 | Connect |
| 134.xx.xx.164 | node-to-node mesh | start | 02:36:29 | Connect |
| 134.xx.xx.165 | node-to-node mesh | start | 02:36:29 | Connect |
+---------------+-------------------+-------+----------+---------+
and kubectl describe node thalia4 gives
Name: thalia4.domain
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
dns=dns4
kubernetes.io/hostname=thalia4
node_name=thalia4
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
projectcalico.org/IPv4Address: 134.xx.xx.168/26
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Mon, 03 Dec 2018 14:17:07 -0600
Taints: <none>
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk Unknown Fri, 21 Dec 2018 11:58:38 -0600 Sat, 12 Jan 2019 16:44:10 -0600 NodeStatusUnknown Kubelet stopped posting node status.
MemoryPressure False Mon, 21 Jan 2019 20:54:38 -0600 Sat, 12 Jan 2019 16:50:18 -0600 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 21 Jan 2019 20:54:38 -0600 Sat, 12 Jan 2019 16:50:18 -0600 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 21 Jan 2019 20:54:38 -0600 Sat, 12 Jan 2019 16:50:18 -0600 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 21 Jan 2019 20:54:38 -0600 Sun, 20 Jan 2019 20:27:10 -0600 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 134.xx.xx.168
Hostname: thalia4
Capacity:
cpu: 4
ephemeral-storage: 6878Mi
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8009268Ki
pods: 110
Allocatable:
cpu: 4
ephemeral-storage: 6490895145
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 7906868Ki
pods: 110
System Info:
Machine ID: c011569a40b740a88a672a5cc526b3ba
System UUID: 42093037-F27E-CA90-01E1-3B253813B904
Boot ID: ffa5170e-da2b-4c09-bd8a-032ce9fca2ee
Kernel Version: 3.10.0-957.1.3.el7.x86_64
OS Image: Red Hat Enterprise Linux
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.13.1
Kubelet Version: v1.13.1
Kube-Proxy Version: v1.13.1
PodCIDR: 192.168.4.0/24
Non-terminated Pods: (3 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system calico-node-8xqbs 250m (6%) 0 (0%) 0 (0%) 0 (0%) 24h
kube-system coredns-786f4c87c8-sbks2 100m (2%) 0 (0%) 70Mi (0%) 170Mi (2%) 47h
kube-system kube-proxy-zp4fk 0 (0%) 0 (0%) 0 (0%) 0 (0%) 31d
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 350m (8%) 0 (0%)
memory 70Mi (0%) 170Mi (2%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>
I thought this was a firewall issue, but I was told on the Slack channel: "If you're not using host endpoints then we don't mess with your host's connectivity. It sounds like you've got something blocking port 179 on that host."
Not sure where that would be, though. The iptables rules look the same on all nodes.
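One quick way to check from another node whether TCP 179 (BGP) is reachable at all is a plain bash /dev/tcp probe (just a sanity check, not something from the Calico docs):

timeout 3 bash -c 'cat < /dev/null > /dev/tcp/134.xx.xx.168/179' && echo "port 179 open" || echo "port 179 blocked/unreachable"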
--network-plugin=cni specifies that we use the cni network plugin, with the actual CNI plugin binaries located in --cni-bin-dir (default /opt/cni/bin) and the CNI plugin configuration located in --cni-conf-dir (default /etc/cni/net.d).
For example:
--network-plugin=cni
--cni-bin-dir=/opt/cni/bin    # there may be several CNI binaries here, e.g. calico/weave...; you can run '/opt/cni/bin/calico -v' to show the calico version
--cni-conf-dir=/etc/cni/net.d # defines the detailed CNI plugin configuration, as below (see also the quick check after the example config):
{
  "name": "calico-network",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "calico",
      "mtu": 8950,
      "policy": {
        "type": "k8s"
      },
      "ipam": {
        "type": "calico-ipam",
        "assign_ipv6": "false",
        "assign_ipv4": "true"
      },
      "etcd_endpoints": "https://172.16.1.5:2379,https://172.16.1.9:2379,https://172.16.1.15:2379",
      "etcd_key_file": "/etc/etcd/ssl/etcd-client-key.pem",
      "etcd_cert_file": "/etc/etcd/ssl/etcd-client.pem",
      "etcd_ca_cert_file": "/etc/etcd/ssl/ca.pem",
      "kubernetes": {
        "kubeconfig": "/etc/kubernetes/cluster-admin.kubeconfig"
      }
    }
  ]
}
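On a kubeadm node, a rough way to confirm which of these CNI-related flags the running kubelet actually has, and what is installed in the default directories (just a shell check):

ps -ef | grep [k]ubelet | tr ' ' '\n' | grep -E 'network-plugin|cni-bin-dir|cni-conf-dir'
ls /etc/cni/net.d/        # the active CNI config file(s)
/opt/cni/bin/calico -v    # prints the installed calico CNI plugin version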
I figured out the problem. I had to add an explicit rule to the cali-failsafe-in chain in iptables on all nodes:
sudo iptables -A cali-failsafe-in -p tcp --match multiport --dport 179 -j ACCEPT
Now everything appears to be working on all nodes:
IPv4 BGP status
+---------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+---------------+-------------------+-------+----------+-------------+
| 134.xx.xx.163 | node-to-node mesh | up | 19:33:58 | Established |
| 134.xx.xx.164 | node-to-node mesh | up | 19:33:40 | Established |
| 134.xx.xx.165 | node-to-node mesh | up | 19:35:07 | Established |
| 134.xx.xx.168 | node-to-node mesh | up | 19:35:01 | Established |
+---------------+-------------------+-------+----------+-------------+
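For completeness, a way to verify on each node that the rule actually landed in the chain (just a check):

sudo iptables -L cali-failsafe-in -n --line-numbers | grep 179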