Issue upgrading calico-node in kubeadm cluster
I'm going to upgrade Calico node and cni as per this link for "Upgrading Components Individually".
The directions are clear (I'll cordon off each node and do the calico/cni and calico/node steps), but I'm not sure what
"Update the image in your process management to reference the new version"
means for upgrading the calico/node container.
Other than that, I don't see any issues with the instructions. Our environment is a k8s kubeadm cluster.
I guess the real question is: where do I tell k8s to use the newer version of the calico/node image?
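For reference, in a kubeadm cluster installed from the hosted manifests, the calico/node image tag is referenced by the calico-node DaemonSet in kube-system (names assumed from the stock calico.yaml). A quick way to see which image it currently points at:

kubectl -n kube-system get daemonset calico-node -o jsonpath='{.spec.template.spec.containers[*].image}'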
EDIT
To answer the above question:
I just did a kubectl delete -f on both calico.yaml and rbac-kdd.yaml, and then did a kubectl create -f on the latest versions of those files.
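Roughly what that looked like (a sketch, assuming the old and new manifests are downloaded locally under the same names):

kubectl delete -f calico.yaml
kubectl delete -f rbac-kdd.yaml
# download the latest rbac-kdd.yaml and calico.yaml for the new release, then:
kubectl create -f rbac-kdd.yaml
kubectl create -f calico.yaml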
Everything now appears to be on version 3.3.2, but I'm getting this error on all calico-node pods:
Warning Unhealthy 84s (x181 over 31m) kubelet, thalia4 Readiness probe failed: calico/node is not ready: BIRD is not ready: BGP not established with <node IP addresses here
Running calicoctl node status gives
Calico process is running.
IPv4 BGP status
+---------------+-------------------+-------+----------+--------------------------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+---------------+-------------------+-------+----------+--------------------------------+
| 134.x.x.163 | node-to-node mesh | start | 02:36:29 | Connect |
| 134.x.x.164 | node-to-node mesh | start | 02:36:29 | Connect |
| 134.x.x.165 | node-to-node mesh | start | 02:36:29 | Connect |
| 134.x.x.168 | node-to-node mesh | start | 02:36:29 | Active Socket: Host is |
| | | | | unreachable |
+---------------+-------------------+-------+----------+--------------------------------+
IPv6 BGP status
No IPv6 peers found.
I assume 134.x.x.168 being unreachable is why I'm getting the readiness probe warning above.
Not sure what to do about it, though. The node is available in the k8s cluster (this is node thalia4):
[gms@thalia0 calico]$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
thalia0 Ready master 87d v1.13.1
thalia1 Ready <none> 48d v1.13.1
thalia2 Ready <none> 30d v1.13.1
thalia3 Ready <none> 87d v1.13.1
thalia4 Ready <none> 48d v1.13.1
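To see which calico-node pod is running on which node (the k8s-app=calico-node label is assumed from the stock manifest):

kubectl -n kube-system get pods -o wide -l k8s-app=calico-node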
EDIT 2
calicoctl node status on thalia4 gives
[sudo] password for gms:
Calico process is running.
IPv4 BGP status
+---------------+-------------------+-------+----------+---------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+---------------+-------------------+-------+----------+---------+
| 134.xx.xx.162 | node-to-node mesh | start | 02:36:29 | Connect |
| 134.xx.xx.163 | node-to-node mesh | start | 02:36:29 | Connect |
| 134.xx.xx.164 | node-to-node mesh | start | 02:36:29 | Connect |
| 134.xx.xx.165 | node-to-node mesh | start | 02:36:29 | Connect |
+---------------+-------------------+-------+----------+---------+
and kubectl describe node thalia4 gives
Name: thalia4.domain
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
dns=dns4
kubernetes.io/hostname=thalia4
node_name=thalia4
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
projectcalico.org/IPv4Address: 134.xx.xx.168/26
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Mon, 03 Dec 2018 14:17:07 -0600
Taints: <none>
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk Unknown Fri, 21 Dec 2018 11:58:38 -0600 Sat, 12 Jan 2019 16:44:10 -0600 NodeStatusUnknown Kubelet stopped posting node status.
MemoryPressure False Mon, 21 Jan 2019 20:54:38 -0600 Sat, 12 Jan 2019 16:50:18 -0600 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 21 Jan 2019 20:54:38 -0600 Sat, 12 Jan 2019 16:50:18 -0600 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 21 Jan 2019 20:54:38 -0600 Sat, 12 Jan 2019 16:50:18 -0600 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 21 Jan 2019 20:54:38 -0600 Sun, 20 Jan 2019 20:27:10 -0600 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 134.xx.xx.168
Hostname: thalia4
Capacity:
cpu: 4
ephemeral-storage: 6878Mi
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 8009268Ki
pods: 110
Allocatable:
cpu: 4
ephemeral-storage: 6490895145
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 7906868Ki
pods: 110
System Info:
Machine ID: c011569a40b740a88a672a5cc526b3ba
System UUID: 42093037-F27E-CA90-01E1-3B253813B904
Boot ID: ffa5170e-da2b-4c09-bd8a-032ce9fca2ee
Kernel Version: 3.10.0-957.1.3.el7.x86_64
OS Image: Red Hat Enterprise Linux
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.13.1
Kubelet Version: v1.13.1
Kube-Proxy Version: v1.13.1
PodCIDR: 192.168.4.0/24
Non-terminated Pods: (3 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system calico-node-8xqbs 250m (6%) 0 (0%) 0 (0%) 0 (0%) 24h
kube-system coredns-786f4c87c8-sbks2 100m (2%) 0 (0%) 70Mi (0%) 170Mi (2%) 47h
kube-system kube-proxy-zp4fk 0 (0%) 0 (0%) 0 (0%) 0 (0%) 31d
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 350m (8%) 0 (0%)
memory 70Mi (0%) 170Mi (2%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>
I thought this was a firewall issue, but I was told on the Slack channel: "If you're not using host endpoints then we don't mess with your host's connectivity. It sounds like you've got something blocking port 179 on that host."
Not sure where that would be, though. The iptables rules look the same on all nodes.
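One quick way to check from another node whether TCP 179 (BGP) is reachable at all is a plain bash /dev/tcp probe (just a sanity check, not something from the Calico docs):

timeout 3 bash -c 'cat < /dev/null > /dev/tcp/134.xx.xx.168/179' && echo "port 179 open" || echo "port 179 blocked/unreachable"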
--network-plugin=cni specifies that we use the cni network plugin, with the actual CNI plugin binaries located in --cni-bin-dir (default /opt/cni/bin) and the CNI plugin configuration located in --cni-conf-dir (default /etc/cni/net.d).
For example:
--network-plugin=cni
--cni-bin-dir=/opt/cni/bin    # there may be several CNI binaries here, e.g. calico/weave...; you can run '/opt/cni/bin/calico -v' to show the calico version
--cni-conf-dir=/etc/cni/net.d # defines the detailed CNI plugin configuration, as below (see also the quick check after the example config):
{
  "name": "calico-network",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "calico",
      "mtu": 8950,
      "policy": {
        "type": "k8s"
      },
      "ipam": {
        "type": "calico-ipam",
        "assign_ipv6": "false",
        "assign_ipv4": "true"
      },
      "etcd_endpoints": "https://172.16.1.5:2379,https://172.16.1.9:2379,https://172.16.1.15:2379",
      "etcd_key_file": "/etc/etcd/ssl/etcd-client-key.pem",
      "etcd_cert_file": "/etc/etcd/ssl/etcd-client.pem",
      "etcd_ca_cert_file": "/etc/etcd/ssl/ca.pem",
      "kubernetes": {
        "kubeconfig": "/etc/kubernetes/cluster-admin.kubeconfig"
      }
    }
  ]
}
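On a kubeadm node, a rough way to confirm which of these CNI-related flags the running kubelet actually has, and what is installed in the default directories (just a shell check):

ps -ef | grep [k]ubelet | tr ' ' '\n' | grep -E 'network-plugin|cni-bin-dir|cni-conf-dir'
ls /etc/cni/net.d/        # the active CNI config file(s)
/opt/cni/bin/calico -v    # prints the installed calico CNI plugin version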
I figured out the problem. I had to add an explicit rule to the cali-failsafe-in chain in iptables on all nodes:
sudo iptables -A cali-failsafe-in -p tcp --match multiport --dport 179 -j ACCEPT
Now everything appears to be working on all nodes:
IPv4 BGP status
+---------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+---------------+-------------------+-------+----------+-------------+
| 134.xx.xx.163 | node-to-node mesh | up | 19:33:58 | Established |
| 134.xx.xx.164 | node-to-node mesh | up | 19:33:40 | Established |
| 134.xx.xx.165 | node-to-node mesh | up | 19:35:07 | Established |
| 134.xx.xx.168 | node-to-node mesh | up | 19:35:01 | Established |
+---------------+-------------------+-------+----------+-------------+
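For completeness, a way to verify on each node that the rule actually landed in the chain (just a check):

sudo iptables -L cali-failsafe-in -n --line-numbers | grep 179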