Running a hybrid/heterogeneous Kubernetes cluster with nodes running in different networks using a VPN

My goal is to model a hybrid/heterogeneous Kubernetes cluster with the following setup:

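Running a Kubernetes cluster locally on my laptop with three VMs works without problems, and it works fine with Weave Net. However, when I model my Kubernetes cluster as described above, there are some communication problems (I guess).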

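Since Kubernetes is designed to run on nodes that are all in the same network, I set up an OpenVPN server on AWS and connected my laptop and the Raspberry Pi to it. I hoped this would be enough to run Kubernetes in a heterogeneous setup where the slave nodes sit in different networks. Of course, that was a wrong assumption.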

If I run the Kubernetes Dashboard on a slave node and try to access it, I get a timeout. If I run it on the master node, everything works as expected.

I set up the cluster on AWS with kubeadm init --apiserver-advertise-address= and joined the nodes with kubeadm join.

$ kubectl get pods --all-namespaces -o wide:

NAMESPACE     NAME                                     READY     STATUS              RESTARTS   AGE       IP              NODE
kube-system   etcd-ip-172-31-28-6                      1/1       Running             0          5m        172.31.28.6     ip-172-31-28-6
kube-system   kube-apiserver-ip-172-31-28-6            1/1       Running             0          5m        172.31.28.6     ip-172-31-28-6
kube-system   kube-controller-manager-ip-172-31-28-6   1/1       Running             0          5m        172.31.28.6     ip-172-31-28-6
kube-system   kube-dns-6f4fd4bdf-w6ctf                 0/3       ContainerCreating   0          15h       <none>          osboxes
kube-system   kube-proxy-2pl2f                         1/1       Running             0          15h       172.31.28.6     ip-172-31-28-6
kube-system   kube-proxy-7b89c                         0/1       CrashLoopBackOff    15         15h       192.168.2.106   edge-1
kube-system   kube-proxy-qg69g                         1/1       Running             1          15h       10.0.2.15       osboxes
kube-system   kube-scheduler-ip-172-31-28-6            1/1       Running             0          5m        172.31.28.6     ip-172-31-28-6
kube-system   weave-net-pqxfp                          1/2       CrashLoopBackOff    189        15h       172.31.28.6     ip-172-31-28-6
kube-system   weave-net-thhzr                          1/2       CrashLoopBackOff    12         36m       192.168.2.106   edge-1
kube-system   weave-net-v69hj                          2/2       Running             7          15h       10.0.2.15       osboxes

$ kubectl -n kube-system logs --v=7 kube-dns-6f4fd4bdf-w6ctf -c kubedns:

...
I0321 09:04:25.620580   23936 round_trippers.go:414] GET https://<PUBLIC_IP>:6443/api/v1/namespaces/kube-system/pods/kube-dns-6f4fd4bdf-w6ctf/log?container=kubedns
I0321 09:04:25.620605   23936 round_trippers.go:421] Request Headers:
I0321 09:04:25.620611   23936 round_trippers.go:424]     Accept: application/json, */*
I0321 09:04:25.620616   23936 round_trippers.go:424]     User-Agent: kubectl/v1.9.4 (linux/amd64) kubernetes/bee2d15
I0321 09:04:25.713821   23936 round_trippers.go:439] Response Status: 400 Bad Request in 93 milliseconds
I0321 09:04:25.714106   23936 helpers.go:201] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "container \"kubedns\" in pod \"kube-dns-6f4fd4bdf-w6ctf\" is waiting to start: ContainerCreating",
  "reason": "BadRequest",
  "code": 400
}]
F0321 09:04:25.714134   23936 helpers.go:119] Error from server (BadRequest): container "kubedns" in pod "kube-dns-6f4fd4bdf-w6ctf" is waiting to start: ContainerCreating

kubectl -n kube-system logs --v=7 kube-proxy-7b89c:

...
I0321 09:06:51.803852   24289 round_trippers.go:414] GET https://<PUBLIC_IP>:6443/api/v1/namespaces/kube-system/pods/kube-proxy-7b89c/log
I0321 09:06:51.803879   24289 round_trippers.go:421] Request Headers:
I0321 09:06:51.803891   24289 round_trippers.go:424]     User-Agent: kubectl/v1.9.4 (linux/amd64) kubernetes/bee2d15
I0321 09:06:51.803900   24289 round_trippers.go:424]     Accept: application/json, */*
I0321 09:08:59.110869   24289 round_trippers.go:439] Response Status: 500 Internal Server Error in 127306 milliseconds
I0321 09:08:59.111129   24289 helpers.go:201] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "Get https://192.168.2.106:10250/containerLogs/kube-system/kube-proxy-7b89c/kube-proxy: dial tcp 192.168.2.106:10250: getsockopt: connection timed out",
  "code": 500
}]
F0321 09:08:59.111156   24289 helpers.go:119] Error from server: Get https://192.168.2.106:10250/containerLogs/kube-system/kube-proxy-7b89c/kube-proxy: dial tcp 192.168.2.106:10250: getsockopt: connection timed out

kubectl -n kube-system logs --v=7 weave-net-pqxfp -c weave:

...
I0321 09:12:08.047206   24847 round_trippers.go:414] GET https://<PUBLIC_IP>:6443/api/v1/namespaces/kube-system/pods/weave-net-pqxfp/log?container=weave
I0321 09:12:08.047233   24847 round_trippers.go:421] Request Headers:
I0321 09:12:08.047335   24847 round_trippers.go:424]     Accept: application/json, */*
I0321 09:12:08.047347   24847 round_trippers.go:424]     User-Agent: kubectl/v1.9.4 (linux/amd64) kubernetes/bee2d15
I0321 09:12:08.062494   24847 round_trippers.go:439] Response Status: 200 OK in 15 milliseconds
DEBU: 2018/03/21 09:11:26.847013 [kube-peers] Checking peer "fa:10:a4:97:7e:7b" against list &{[{6e:fd:f4:ef:1e:f5 osboxes}]}
Peer not in list; removing persisted data
INFO: 2018/03/21 09:11:26.880946 Command line options: map[expect-npc:true ipalloc-init:consensus=3 db-prefix:/weavedb/weave-net http-addr:127.0.0.1:6784 ipalloc-range:10.32.0.0/12 nickname:ip-172-31-28-6 host-root:/host name:fa:10:a4:97:7e:7b no-dns:true status-addr:0.0.0.0:6782 datapath:datapath docker-api: port:6783 conn-limit:30]
INFO: 2018/03/21 09:11:26.880995 weave  2.2.1
FATA: 2018/03/21 09:11:26.881117 Inconsistent bridge state detected. Please do 'weave reset' and try again

kubectl -n kube-system logs --v=7 weave-net-thhzr -c weave:

...
I0321 09:15:13.787905   25113 round_trippers.go:414] GET https://<PUBLIC_IP>:6443/api/v1/namespaces/kube-system/pods/weave-net-thhzr/log?container=weave
I0321 09:15:13.787932   25113 round_trippers.go:421] Request Headers:
I0321 09:15:13.787938   25113 round_trippers.go:424]     Accept: application/json, */*
I0321 09:15:13.787946   25113 round_trippers.go:424]     User-Agent: kubectl/v1.9.4 (linux/amd64) kubernetes/bee2d15
I0321 09:17:21.126863   25113 round_trippers.go:439] Response Status: 500 Internal Server Error in 127338 milliseconds
I0321 09:17:21.127140   25113 helpers.go:201] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "Get https://192.168.2.106:10250/containerLogs/kube-system/weave-net-thhzr/weave: dial tcp 192.168.2.106:10250: getsockopt: connection timed out",
  "code": 500
}]
F0321 09:17:21.127167   25113 helpers.go:119] Error from server: Get https://192.168.2.106:10250/containerLogs/kube-system/weave-net-thhzr/weave: dial tcp 192.168.2.106:10250: getsockopt: connection timed out

$ ifconfig (Kubernetes master on AWS):

datapath  Link encap:Ethernet  HWaddr ae:90:9a:b2:7e:d9
          inet6 addr: fe80::ac90:9aff:feb2:7ed9/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1376  Metric:1
          RX packets:29 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:1904 (1.9 KB)  TX bytes:1188 (1.1 KB)

docker0   Link encap:Ethernet  HWaddr 02:42:50:39:1f:c7
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

eth0      Link encap:Ethernet  HWaddr 06:a3:d0:8e:19:72
          inet addr:172.31.28.6  Bcast:172.31.31.255  Mask:255.255.240.0
          inet6 addr: fe80::4a3:d0ff:fe8e:1972/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:10323322 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9418208 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3652314289 (3.6 GB)  TX bytes:3117288442 (3.1 GB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:11388236 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11388236 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:2687297929 (2.6 GB)  TX bytes:2687297929 (2.6 GB)

tun0      Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          inet addr:10.8.0.1  P-t-P:10.8.0.2  Mask:255.255.255.255
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1500  Metric:1
          RX packets:97222 errors:0 dropped:0 overruns:0 frame:0
          TX packets:164607 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:13381022 (13.3 MB)  TX bytes:209129403 (209.1 MB)

vethwe-bridge Link encap:Ethernet  HWaddr 12:59:54:73:0f:91
          inet6 addr: fe80::1059:54ff:fe73:f91/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1376  Metric:1
          RX packets:18 errors:0 dropped:0 overruns:0 frame:0
          TX packets:36 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1476 (1.4 KB)  TX bytes:2940 (2.9 KB)

vethwe-datapath Link encap:Ethernet  HWaddr 8e:75:1c:92:93:0d
          inet6 addr: fe80::8c75:1cff:fe92:930d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1376  Metric:1
          RX packets:36 errors:0 dropped:0 overruns:0 frame:0
          TX packets:18 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2940 (2.9 KB)  TX bytes:1476 (1.4 KB)

vxlan-6784 Link encap:Ethernet  HWaddr a6:02:da:5e:d5:2a
          inet6 addr: fe80::a402:daff:fe5e:d52a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65485  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:8 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

$ sudo systemctl status kubelet.service (on AWS):

Mar 21 09:34:59 ip-172-31-28-6 kubelet[19676]: W0321 09:34:59.202058   19676 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Mar 21 09:34:59 ip-172-31-28-6 kubelet[19676]: E0321 09:34:59.202452   19676 kubelet.go:2109] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
Mar 21 09:35:01 ip-172-31-28-6 kubelet[19676]: I0321 09:35:01.535541   19676 kuberuntime_manager.go:514] Container {Name:weave Image:weaveworks/weave-kube:2.2.1 Command:[/home/weave/launch.sh] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[{Name:HOSTNAME Value: ValueFrom:&EnvVarSource{FieldRef:&ObjectFieldSelector{APIVersion:v1,FieldPath:spec.nodeName,},ResourceFieldRef:nil,ConfigMapKeyRef:nil,SecretKeyRef:nil,}}] Resources:{Limits:map[] Requests:map[cpu:{i:{value:10 scale:-3} d:{Dec:<nil>} s:10m Format:DecimalSI}]} VolumeMounts:[{Name:weavedb ReadOnly:false MountPath:/weavedb SubPath: MountPropagation:<nil>} {Name:cni-bin ReadOnly:false MountPath:/host/opt SubPath: MountPropagation:<nil>} {Name:cni-bin2 ReadOnly:false MountPath:/host/home SubPath: MountPropagation:<nil>} {Name:cni-conf ReadOnly:false MountPath:/host/etc SubPath: MountPropagation:<nil>} {Name:dbus ReadOnly:false MountPath:/host/var/lib/dbus SubPath: MountPropagation:<nil>} {Name:lib-modules ReadOnly:false MountPath:/lib/modules SubPath: MountPropagation:<nil>} {Name:weave-net-token-vn8rh ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:&Probe{Handler:Handler{Exec:nil,HTTPGet:&HTTPGetAction{Path:/status,Port:6784,Host:127.0.0.1,Scheme:HTTP,HTTPHeaders:[],},TCPSocket:nil,},InitialDelaySeconds:30,TimeoutSeconds:1,PeriodSeconds:10,SuccessThreshold:1,FailureThreshold:3,} ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Mar 21 09:35:01 ip-172-31-28-6 kubelet[19676]: I0321 09:35:01.536504   19676 kuberuntime_manager.go:758] checking backoff for container "weave" in pod "weave-net-pqxfp_kube-system(c6450070-2c61-11e8-a50d-06a3d08e1972)"
Mar 21 09:35:01 ip-172-31-28-6 kubelet[19676]: I0321 09:35:01.536636   19676 kuberuntime_manager.go:768] Back-off 5m0s restarting failed container=weave pod=weave-net-pqxfp_kube-system(c6450070-2c61-11e8-a50d-06a3d08e1972)
Mar 21 09:35:01 ip-172-31-28-6 kubelet[19676]: E0321 09:35:01.536664   19676 pod_workers.go:186] Error syncing pod c6450070-2c61-11e8-a50d-06a3d08e1972 ("weave-net-pqxfp_kube-system(c6450070-2c61-11e8-a50d-06a3d08e1972)"), skipping: failed to "StartContainer" for "weave" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=weave pod=weave-net-pqxfp_kube-system(c6450070-2c61-11e8-a50d-06a3d08e1972)"

$ sudo systemctl status kubelet.service (on the laptop):

Mar 21 05:47:18 osboxes kubelet[715]: E0321 05:47:18.662670     715 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Mar 21 05:47:18 osboxes kubelet[715]: E0321 05:47:18.663412     715 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "kube-dns-6f4fd4bdf-w6ctf_kube-system(11886465-2c61-11e8-a50d-06a3d08e1972)" failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Mar 21 05:47:18 osboxes kubelet[715]: E0321 05:47:18.663869     715 kuberuntime_manager.go:647] createPodSandbox for pod "kube-dns-6f4fd4bdf-w6ctf_kube-system(11886465-2c61-11e8-a50d-06a3d08e1972)" failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Mar 21 05:47:18 osboxes kubelet[715]: E0321 05:47:18.664295     715 pod_workers.go:186] Error syncing pod 11886465-2c61-11e8-a50d-06a3d08e1972 ("kube-dns-6f4fd4bdf-w6ctf_kube-system(11886465-2c61-11e8-a50d-06a3d08e1972)"), skipping: failed to "CreatePodSandbox" for "kube-dns-6f4fd4bdf-w6ctf_kube-system(11886465-2c61-11e8-a50d-06a3d08e1972)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-dns-6f4fd4bdf-w6ctf_kube-system(11886465-2c61-11e8-a50d-06a3d08e1972)\" failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Mar 21 05:47:20 osboxes kubelet[715]: W0321 05:47:20.536161     715 pod_container_deletor.go:77] Container "bbf490835face43b70c24dbcb67c3f75872e7831b5e2605dc8bb71210910e273" not found in pod's containers

$ sudo systemctl status kubelet.service (on the Raspberry Pi):

Mar 21 09:29:01 edge-1 kubelet[339]: I0321 09:29:01.188199     339 kuberuntime_manager.go:514] Container {Name:kube-proxy Image:gcr.io/google_containers/kube-proxy-amd64:v1.9.5 Command:[/usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[{Name:kube-proxy ReadOnly:false MountPath:/var/lib/kube-proxy SubPath: MountPropagation:<nil>} {Name:xtables-lock ReadOnly:false MountPath:/run/xtables.lock SubPath: MountPropagation:<nil>} {Name:lib-modules ReadOnly:true MountPath:/lib/modules SubPath: MountPropagation:<nil>} {Name:kube-proxy-token-px7dt ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:nil ReadinessProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,} Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Mar 21 09:29:01 edge-1 kubelet[339]: I0321 09:29:01.189023     339 kuberuntime_manager.go:758] checking backoff for container "kube-proxy" in pod "kube-proxy-7b89c_kube-system(5bebafa1-2c61-11e8-a50d-06a3d08e1972)"
Mar 21 09:29:01 edge-1 kubelet[339]: I0321 09:29:01.190174     339 kuberuntime_manager.go:768] Back-off 5m0s restarting failed container=kube-proxy pod=kube-proxy-7b89c_kube-system(5bebafa1-2c61-11e8-a50d-06a3d08e1972)
Mar 21 09:29:01 edge-1 kubelet[339]: E0321 09:29:01.190518     339 pod_workers.go:186] Error syncing pod 5bebafa1-2c61-11e8-a50d-06a3d08e1972 ("kube-proxy-7b89c_kube-system(5bebafa1-2c61-11e8-a50d-06a3d08e1972)"), skipping: failed to "StartContainer" for "kube-proxy" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kube-proxy pod=kube-proxy-7b89c_kube-system(5bebafa1-2c61-11e8-a50d-06a3d08e1972)"
Mar 21 09:29:02 edge-1 kubelet[339]: W0321 09:29:02.278342     339 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Mar 21 09:29:02 edge-1 kubelet[339]: E0321 09:29:02.282534     339 kubelet.go:2120] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

There is definitely a problem with the network between your Kubernetes master and the nodes.

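First of all, though, creating this kind of hybrid installation is not a good idea. You need a stable network between the master(s) and the nodes, otherwise you will run into a lot of problems, and a stable network is hard to achieve over an Internet connection.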

If you want a hybrid installation, you can use Federation between a Kubernetes cluster in AWS and one on your local hardware.

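As for your actual problem, I can see that the Weave network is failing on the master and on the edge-1 node. The logs do not make clear what kind of problem it is; try running the weave containers with the WEAVE_DEBUG=1 environment variable. Without working networking, other pods such as kube-dns will not work either.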
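One way to set that variable is to patch the Weave DaemonSet. The sketch below only builds the patch and prints a kubectl command; it does not touch the cluster. The DaemonSet name weave-net and the container name weave are inferred from the pod names and the -c weave flag in the question, so treat them as assumptions and check them against your own cluster first.

```python
import json
import shlex


def weave_debug_patch(container="weave"):
    """Build a strategic merge patch that adds WEAVE_DEBUG=1 to one container."""
    return {
        "spec": {"template": {"spec": {"containers": [
            {"name": container,
             "env": [{"name": "WEAVE_DEBUG", "value": "1"}]}
        ]}}}
    }


def kubectl_patch_command(daemonset="weave-net", namespace="kube-system"):
    """Return a shell-quoted kubectl command that applies the patch."""
    # daemonset and namespace defaults are assumptions taken from the question.
    patch = json.dumps(weave_debug_patch(), separators=(",", ":"))
    args = ["kubectl", "-n", namespace, "patch", "daemonset", daemonset,
            "--patch", patch]
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    print(kubectl_patch_command())
```

After the DaemonSet rolls out new pods, read the weave container logs again with kubectl -n kube-system logs as in the question.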

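Also, how did you set up OpenVPN? You need routing between the AWS subnet and the clients, as well as client-to-client routing. In other words, every address you used to set up the cluster on any node must be routable from every other node. Check once more which addresses the Kubernetes components and Weave are bound to, and whether those addresses are routable.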
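As a rough illustration of that check, the sketch below compares the node addresses from the kubectl get pods -o wide output (the IP column of the kube-proxy pods, which should match the node address if they run on the host network) with the VPN subnet. The subnet 10.8.0.0/24 is an assumption based on tun0 showing 10.8.0.1; adjust it to your OpenVPN server configuration.

```python
import ipaddress

# Node addresses copied from the `kubectl get pods -o wide` output in the question.
NODE_IPS = {
    "ip-172-31-28-6": "172.31.28.6",
    "edge-1": "192.168.2.106",
    "osboxes": "10.0.2.15",
}

# Assumption: the tunnel uses 10.8.0.0/24 (tun0 on the master shows 10.8.0.1).
VPN_SUBNET = ipaddress.ip_network("10.8.0.0/24")


def outside_vpn(node_ips, vpn_subnet):
    """Return the nodes whose registered address is not inside the VPN subnet."""
    return {
        name: ip
        for name, ip in node_ips.items()
        if ipaddress.ip_address(ip) not in vpn_subnet
    }


if __name__ == "__main__":
    for name, ip in outside_vpn(NODE_IPS, VPN_SUBNET).items():
        print(f"{name}: {ip} is not in {VPN_SUBNET}; "
              "it needs an explicit route or a different node address")
```

With the addresses above, all three nodes are reported, which fits the timeouts: each node registered with an address from its own private network, and none of those addresses is part of the tunnel.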

  1. This message explains one of the crashes:

FATA: 2018/03/21 09:11:26.881117 Inconsistent bridge state detected. Please do 'weave reset' and try again

Because running the weave command on a Kubernetes node is a bit complicated, simply reboot the node; the bridge should then be re-created from scratch.

  2. This message means the node could not be reached to fetch its logs:

F0321 09:08:59.111156 24289 helpers.go:119] Error from server: Get https://192.168.2.106:10250/containerLogs/kube-system/kube-proxy-7b89c/kube-proxy: dial tcp 192.168.2.106:10250: getsockopt: connection timed out

Consider whether these hosts can reach each other over their regular network.
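A quick way to test this from the master is to attempt a plain TCP connection to the address and port that appear in the error. This is a minimal sketch using only the Python standard library; the helper name can_connect is made up for illustration, and the address and port are the ones from the message above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


if __name__ == "__main__":
    # Address and port copied from the error message (kubelet API on edge-1).
    print(can_connect("192.168.2.106", 10250))
```

If this prints False when run on the master, the API server cannot reach the kubelet on that node either, so the problem lies in routing or firewalling rather than in Kubernetes itself.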