Temporary failure in name resolution for pod running on kubeadm worker node
I run Kafka in a Kubernetes cluster on VMware with one control plane node and one worker node. From the control plane node my client can communicate with Kafka, but from the worker node it fails with this error:
%3|1638529687.405|FAIL|apollo-prototype-765f4d8bcf-bjpf4#producer-2| [thrd:sasl_plaintext://my-cluster-kafka-bootstrap:9092/bootstrap]: sasl_plaintext://my-cluster-kafka-bootstrap:9092/bootstrap: Failed to resolve 'my-cluster-kafka-bootstrap:9092': Temporary failure in name resolution (after 20016ms in state CONNECT, 2 identical error(s) suppressed)
%3|1638529687.406|ERROR|apollo-prototype-765f4d8bcf-bjpf4#producer-2| [thrd:app]: apollo-prototype-765f4d8bcf-bjpf4#producer-2: sasl_plaintext://my-cluster-kafka-bootstrap:9092/bootstrap: Failed to resolve 'my-cluster-kafka-bootstrap:9092': Temporary failure in name resolution (after 20016ms in state CONNECT, 2 identical error(s) suppressed)
This is my Kafka cluster manifest (using Strimzi):
listeners:
  - name: plain
    port: 9092
    type: internal
    tls: false
    authentication:
      type: scram-sha-512
  - name: external
    port: 9094
    type: ingress
    tls: true
    authentication:
      type: scram-sha-512
    configuration:
      class: nginx
      bootstrap:
        host: localb.kafka.xxx.com
      brokers:
        - broker: 0
          host: local.kafka.xxx.com
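It is worth mentioning that the exact same configuration works flawlessly when I run it in the cloud.
Telnet and nslookup (from both nodes) throw errors.
The CoreDNS logs do not even mention this error.
The firewall is also disabled on both nodes.
Can you help me? Thanks!
Update: solution
The Calico pod (on the worker node) was complaining bird: Netlink: Network is down, even though it was not crashing: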
2021-12-03 09:39:58.051 [INFO][90] felix/int_dataplane.go 1539: Received interface addresses update msg=&intdataplane.ifaceAddrsUpdate{Name:"tunl0", Addrs:set.mapSet{}}
2021-12-03 09:39:58.051 [INFO][90] felix/hostip_mgr.go 85: Interface addrs changed. update=&intdataplane.ifaceAddrsUpdate{Name:"tunl0", Addrs:set.mapSet{}}
2021-12-03 09:39:58.052 [INFO][90] felix/ipsets.go 130: Queueing IP set for creation family="inet" setID="this-host" setType="hash:ip"
2021-12-03 09:39:58.057 [INFO][90] felix/ipsets.go 785: Doing full IP set rewrite family="inet" numMembersInPendingReplace=3 setID="this-host"
2021-12-03 09:39:58.059 [INFO][90] felix/int_dataplane.go 1036: Linux interface state changed. ifIndex=13 ifaceName="tunl0" state="down"
2021-12-03 09:39:58.082 [INFO][90] felix/int_dataplane.go 1521: Received interface update msg=&intdataplane.ifaceUpdate{Name:"tunl0", State:"down", Index:13}
bird: Netlink: Network is down
Here is what I did, and it worked perfectly!
The fault is caused by the different ipvs modules loaded by the node.
I configured the ipip module for the new node, but the old node did
not load the ipip module, which caused the calico exception. Delete
the ipip module to return to normal.
[root@k8s-node236-232 ~]# lsmod | grep ipip
ipip 16384 0
tunnel4 16384 1 ipip
ip_tunnel 24576 1 ipip
[root@k8s-node236-232 ~]# modprobe -r ipip
[root@k8s-node236-232 ~]# lsmod | grep ipip
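Since the quoted fix hinges on the ipip module being loaded on one node but not the other, it may help to compare the module state on each node before unloading anything. The following is a minimal sketch, not part of the original answer: the function name module_state is hypothetical, and it only parses lsmod-style text from standard input.

```shell
#!/bin/sh
# Hypothetical helper: report whether a kernel module appears in
# lsmod-style output read from stdin.
# lsmod columns are: module name, size, use count, [users].
module_state() {
    mod="$1"
    # Match the first column exactly so that "tunnel4 ... ipip" lines
    # (where ipip is only listed as a user) are not counted.
    awk -v m="$mod" '
        $1 == m { found = 1; print m " loaded (use count " $3 ")" }
        END     { if (!found) print m " not loaded" }
    '
}
```

Running `lsmod | module_state ipip` on the control plane node and on the worker node should show whether the two nodes differ, which is the condition the quoted explanation describes.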