Kube-Proxy-Windows CrashLoopBackOff


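Installation process

I am new to Kubernetes and am currently setting up a Kubernetes cluster on Azure VMs. I want to deploy Windows containers, and to do that I need to add Windows worker nodes. I have already deployed a kubeadm cluster with 3 master nodes and one Linux worker node, and those nodes work fine.

As soon as I added the Windows node, everything went downhill. I use Flannel as my CNI plugin, and I prepared the DaemonSet and the control plane according to the Kubernetes documentation: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/adding-windows-nodes/

Then, after installing the Flannel DaemonSet, I installed kube-proxy and Docker EE as described there.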

Software used

Master nodes

OS: Ubuntu 18.04 LTS
Container runtime: Docker 20.10.5
Kubernetes version: 1.21.0
Flannel image version: 0.14.0
Kube-proxy version: 1.21.0

Windows worker node

OS: Windows Server 2019 Datacenter Core
Container runtime: Docker 20.10.4
Kubernetes version: 1.21.0
Flannel image version: 0.13.0-nanoserver
Kube-proxy version: 1.21.0-nanoserver

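Desired result:

A complete cluster that is ready to use, with every required component in the Running state.

Current result:

After the installation, I checked whether everything came up: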

azureuser@Kube-M-001:~$ kubectl get pods -o wide -n kube-system
NAME                                  READY   STATUS             RESTARTS   AGE    IP           NODE         NOMINATED NODE   READINESS GATES
coredns-558bd4d5db-8mshg              1/1     Running            0          178m   10.244.0.3   kube-m-001   <none>           <none>
coredns-558bd4d5db-xhsmn              1/1     Running            0          178m   10.244.0.2   kube-m-001   <none>           <none>
etcd-kube-m-001                       1/1     Running            0          178m   10.0.10.4    kube-m-001   <none>           <none>
etcd-kube-m-002                       1/1     Running            0          164m   10.0.10.5    kube-m-002   <none>           <none>
etcd-kube-m-003                       1/1     Running            0          162m   10.0.10.6    kube-m-003   <none>           <none>
kube-apiserver-kube-m-001             1/1     Running            0          178m   10.0.10.4    kube-m-001   <none>           <none>
kube-apiserver-kube-m-002             1/1     Running            1          165m   10.0.10.5    kube-m-002   <none>           <none>
kube-apiserver-kube-m-003             1/1     Running            0          162m   10.0.10.6    kube-m-003   <none>           <none>
kube-controller-manager-kube-m-001    1/1     Running            1          178m   10.0.10.4    kube-m-001   <none>           <none>
kube-controller-manager-kube-m-002    1/1     Running            0          165m   10.0.10.5    kube-m-002   <none>           <none>
kube-controller-manager-kube-m-003    1/1     Running            0          163m   10.0.10.6    kube-m-003   <none>           <none>
kube-flannel-ds-5lwzf                 1/1     Running            0          165m   10.0.10.5    kube-m-002   <none>           <none>
kube-flannel-ds-6lvgp                 1/1     Running            0          129m   10.0.10.7    kube-w-001   <none>           <none>
kube-flannel-ds-dlmkt                 1/1     Running            0          163m   10.0.10.6    kube-m-003   <none>           <none>
kube-flannel-ds-h27r7                 1/1     Running            0          169m   10.0.10.4    kube-m-001   <none>           <none>
kube-flannel-ds-windows-amd64-hwbjc   1/1     Running            0          121m   10.0.64.4    kube-w-002   <none>           <none>
kube-proxy-4rkgk                      1/1     Running            0          178m   10.0.10.4    kube-m-001   <none>           <none>
kube-proxy-6g4sb                      1/1     Running            0          129m   10.0.10.7    kube-w-001   <none>           <none>
kube-proxy-tvm9g                      1/1     Running            0          165m   10.0.10.5    kube-m-002   <none>           <none>
kube-proxy-windows-j7c27              0/1     CrashLoopBackOff   26         121m   10.244.4.2   kube-w-002   <none>           <none>
kube-proxy-wzjm7                      1/1     Running            0          163m   10.0.10.6    kube-m-003   <none>           <none>
kube-scheduler-kube-m-001             1/1     Running            1          178m   10.0.10.4    kube-m-001   <none>           <none>
kube-scheduler-kube-m-002             1/1     Running            0          165m   10.0.10.5    kube-m-002   <none>           <none>
kube-scheduler-kube-m-003             1/1     Running            0          162m   10.0.10.6    kube-m-003   <none>           <none>
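
In a listing this long the one broken pod is easy to miss. A minimal sketch for filtering it out, assuming the default `kubectl get pods` column order shown above (NAME, READY, STATUS, ...); the function name `not_ready` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print pods whose containers are not all ready
# or whose STATUS is not "Running". Reads `kubectl get pods` output on stdin
# and assumes the default column order: NAME READY STATUS RESTARTS ...
not_ready() {
  awk 'NR > 1 {                      # skip the header row
    split($2, r, "/")                # READY column, e.g. "0/1"
    if (r[1] != r[2] || $3 != "Running") print $1, $2, $3
  }'
}

# Usage against the live cluster:
#   kubectl get pods -o wide -n kube-system | not_ready
```

On the output above this would leave only the `kube-proxy-windows-j7c27` row.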

I checked the logs of that kube-proxy pod and got the following:

azureuser@Kube-M-001:~$ kubectl logs -n kube-system kube-proxy-windows-j7c27 -p

    Directory: C:\host\var\lib\kube-proxy\var\run\secrets\kubernetes.io

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----            5/3/2021 12:08 PM                serviceaccount

    Directory: C:\host\k

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----            5/3/2021 12:24 PM                kube-proxy
Using CNI conf file: 10-flannel.conf
I0503 12:30:23.146002    2448 flags.go:59] FLAG: --add-dir-header="false"
I0503 12:30:23.194891    2448 flags.go:59] FLAG: --alsologtostderr="false"
I0503 12:30:23.194891    2448 flags.go:59] FLAG: --bind-address="0.0.0.0"
I0503 12:30:23.194891    2448 flags.go:59] FLAG: --bind-address-hard-fail="false"
I0503 12:30:23.194891    2448 flags.go:59] FLAG: --cleanup="false"
I0503 12:30:23.194891    2448 flags.go:59] FLAG: --cluster-cidr=""
I0503 12:30:23.194891    2448 flags.go:59] FLAG: --config="/var/lib/kube-proxy/config.conf"
I0503 12:30:23.194891    2448 flags.go:59] FLAG: --config-sync-period="15m0s"
I0503 12:30:23.194891    2448 flags.go:59] FLAG: --conntrack-max-per-core="32768"
I0503 12:30:23.194891    2448 flags.go:59] FLAG: --conntrack-min="131072"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --conntrack-tcp-timeout-close-wait="1h0m0s"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --conntrack-tcp-timeout-established="24h0m0s"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --detect-local-mode=""
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --enable-dsr="false"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --feature-gates="WinOverlay=true"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --healthz-bind-address="0.0.0.0:10256"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --healthz-port="10256"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --help="false"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --hostname-override="kube-w-002"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --iptables-masquerade-bit="14"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --iptables-min-sync-period="1s"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --iptables-sync-period="30s"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --ipvs-exclude-cidrs="[]"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --ipvs-min-sync-period="0s"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --ipvs-scheduler=""
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --ipvs-strict-arp="false"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --ipvs-sync-period="30s"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --ipvs-tcp-timeout="0s"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --ipvs-tcpfin-timeout="0s"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --ipvs-udp-timeout="0s"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --kube-api-burst="10"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --kube-api-content-type="application/vnd.kubernetes.protobuf"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --kube-api-qps="5"
I0503 12:30:23.195318    2448 flags.go:59] FLAG: --kubeconfig=""
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --log-backtrace-at=":0"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --log-dir=""
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --log-file=""
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --log-file-max-size="1800"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --log-flush-frequency="5s"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --logtostderr="true"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --masquerade-all="false"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --master=""
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --metrics-bind-address="127.0.0.1:10249"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --metrics-port="10249"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --network-name=""
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --nodeport-addresses="[]"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --one-output="false"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --oom-score-adj="-999"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --profiling="false"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --proxy-mode=""
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --proxy-port-range=""
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --show-hidden-metrics-for-version=""
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --skip-headers="false"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --skip-log-headers="false"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --source-vip=""
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --stderrthreshold="2"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --udp-timeout="250ms"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --v="6"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --version="false"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --vmodule=""
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --windows-service="false"
I0503 12:30:23.195454    2448 flags.go:59] FLAG: --write-config-to=""
I0503 12:30:23.197789    2448 feature_gate.go:243] feature gates: &{map[WinOverlay:true]}
I0503 12:30:23.197789    2448 feature_gate.go:243] feature gates: &{map[WinOverlay:true]}
I0503 12:30:23.200622    2448 loader.go:372] Config loaded from file:  /var/lib/kube-proxy/kubeconfig.conf
I0503 12:30:23.221725    2448 server_windows.go:107] Using Kernelspace Proxier.
I0503 12:30:23.221725    2448 server_windows.go:110] creating dualStackProxier for Windows kernel.
time="2021-05-03T12:30:23Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 12"
time="2021-05-03T12:30:23Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 10"
time="2021-05-03T12:30:23Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 12"
time="2021-05-03T12:30:23Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 13"
time="2021-05-03T12:30:23Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 10"
I0503 12:30:23.224600    2448 proxier.go:562] "Cleaning up old HNS policy lists"
I0503 12:30:33.229568    2448 proxier.go:583] "Hns Network loaded" hnsNetworkInfo=&{name:flannel.4096 id:ae948621-bb34-486d-b31d-cf397757b7c1 networkType:Overlay remoteSubnets:[0xc0000b77c0 0xc0000b7840 0xc0000b78c0 0xc0000b7940]}
time="2021-05-03T12:30:33Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 12"
time="2021-05-03T12:30:33Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 10"
time="2021-05-03T12:30:33Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 12"
time="2021-05-03T12:30:33Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 13"
time="2021-05-03T12:30:33Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 10"
time="2021-05-03T12:30:33Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 12"
time="2021-05-03T12:30:33Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 10"
time="2021-05-03T12:30:33Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 12"
time="2021-05-03T12:30:33Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 13"
time="2021-05-03T12:30:33Z" level=info msg="currentVersion.Major < versionRange.MinVersion.Major: 9, 10"
F0503 12:30:33.256757    2448 server.go:489] unable to create proxier: unable to create ipv4 proxier: Could not find host mac address for 0.0.0.0, hostname: kube-w-002, clusterCIDR : 10.244.0.0/16, nodeIP:0.0.0.0
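
One observation from this log: the `--bind-address` flag and the `nodeIP` in the final fatal message are both `0.0.0.0`. The sketch below pulls both values out of a saved copy of the log so they can be compared side by side. It only surfaces the two values and does not prove that one causes the other; the function name `extract_ips` and the file name `kube-proxy.log` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the --bind-address flag value and the nodeIP
# reported in the fatal "unable to create proxier" line of a kube-proxy log.
extract_ips() {
  log="$1"
  # FLAG: --bind-address="0.0.0.0"  ->  bind-address=0.0.0.0
  # (the trailing =" keeps --bind-address-hard-fail from matching)
  sed -n 's/.*FLAG: --bind-address="\([^"]*\)".*/bind-address=\1/p' "$log"
  # ... nodeIP:0.0.0.0 at the end of the fatal line  ->  nodeIP=0.0.0.0
  sed -n 's/.*unable to create proxier.*nodeIP:\([0-9.]*\).*/nodeIP=\1/p' "$log"
}

# Usage:
#   kubectl logs -n kube-system kube-proxy-windows-j7c27 -p > kube-proxy.log
#   extract_ips kube-proxy.log
```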

However, I think something had already gone wrong during the Flannel installation, because the logs of the Flannel pod show the following:

PS C:\Users\azureuser> docker ps
CONTAINER ID   IMAGE                                          COMMAND                  CREATED       STATUS       PORTS     NAMES
0cfa1c0c7b6d   mcr.microsoft.com/oss/kubernetes/pause:1.4.1   "cmd /S /C pauseloop…"   2 hours ago   Up 2 hours             k8s_POD_kube-proxy-windows-j7c27_kube-system_df8fda84-cf94-4ca7-863a-9c9694f2b3ba_8
fb3ccc5e0cf7   sigwindowstools/flannel                        "pwsh -file /etc/kub…"   2 hours ago   Up 2 hours             k8s_kube-flannel_kube-flannel-ds-windows-amd64-hwbjc_kube-system_9f0aa635-200b-4902-93cc-1d1da7f49a5d_0
bc8e97427613   mcr.microsoft.com/oss/kubernetes/pause:1.4.1   "cmd /S /C pauseloop…"   2 hours ago   Up 2 hours             k8s_POD_kube-flannel-ds-windows-amd64-hwbjc_kube-system_9f0aa635-200b-4902-93cc-1d1da7f49a5d_0
PS C:\Users\azureuser> docker logs fb3ccc5e0cf7

    Directory: C:\host\etc\cni

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----            5/3/2021 10:28 AM                net.d

    Directory: C:\host\etc

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----            5/3/2021 10:28 AM                kube-flannel

    Directory: C:\host\opt\cni

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----            5/3/2021 10:28 AM                bin

    Directory: C:\host\k

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----            5/3/2021 10:28 AM                flannel

    Directory: C:\host\k\flannel\var\run\secrets\kubernetes.io

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----            5/3/2021 10:28 AM                serviceaccount
Configuring CNI for docker
WARNING: The names of some imported commands from the module 'hns' include unapproved verbs that might make them less
discoverable. To find the commands with unapproved verbs, run the Import-Module command again with the Verbose
parameter. For a list of approved verbs, type Get-Verb.
Invoke-HnsRequest : @{Error=An adapter was not found. ; ErrorCode=2151350278; Success=False}
At C:\k\flannel\hns.psm1:233 char:16
+ ...      return Invoke-HnsRequest -Method POST -Type networks -Data $Json ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Write-Error], WriteErrorException
    + FullyQualifiedErrorId : Microsoft.PowerShell.Commands.WriteErrorException,Invoke-HNSRequest

FATA[2021-05-03T10:28:44Z] rpc error: code = Internal desc = could not create IP forward entry: The object already exists.
I0503 10:28:45.340006    5512 main.go:518] Determining IP address of default interface
I0503 10:28:47.695146    5512 main.go:531] Using interface with name Ethernet 2 and address 10.0.64.4
I0503 10:28:47.695146    5512 main.go:548] Defaulting external address to interface address (10.0.64.4)
I0503 10:28:47.767526    5512 kube.go:119] Waiting 10m0s for node controller to sync
I0503 10:28:47.769102    5512 kube.go:306] Starting kube subnet manager
I0503 10:28:48.769283    5512 kube.go:126] Node controller sync successful
I0503 10:28:48.769283    5512 main.go:246] Created subnet manager: Kubernetes Subnet Manager - kube-w-002
I0503 10:28:48.769283    5512 main.go:249] Installing signal handlers
I0503 10:28:48.769283    5512 main.go:390] Found network config - Backend type: vxlan
I0503 10:28:48.769283    5512 vxlan_windows.go:127] VXLAN config: Name=flannel.4096 MacPrefix=0E-2A VNI=4096 Port=4789 GBP=false DirectRouting=false
I0503 10:28:48.838521    5512 device_windows.go:115] Attempting to create HostComputeNetwork &{ flannel.4096 Overlay [] {[]} { [] [] []} [{Static [{10.244.4.0/24 [[123 34 84 121 112 101 34 58 34 86 83 73 68 34 44 34 83 101 116 116 105 110 103 115 34 58 123 34 73 115 111 108 97 116 105 111 110 73 100 34 58 52 48 57 54 125 125]] [{10.244.4.1 0.0.0.0/0 0}]}]}] 8 {2 0}}
E0503 10:28:49.279614    5512 streamwatcher.go:109] Unable to decode an event from the watch stream: read tcp 10.0.64.4:50315-><PUBLIC-IP>:6443: wsarecv: An established connection was aborted by the software in your host machine.
E0503 10:28:49.323566    5512 reflector.go:304] github.com/coreos/flannel/subnet/kube/kube.go:307: Failed to watch *v1.Node: Get "https://kube-lb.eastus.cloudapp.azure.com:6443/api/v1/nodes?resourceVersion=6092&timeoutSeconds=582&watch=true": dial tcp: lookup kube-lb.eastus.cloudapp.azure.com: no such host
I0503 10:28:53.739453    5512 device_windows.go:123] Waiting to get ManagementIP from HostComputeNetwork flannel.4096
I0503 10:28:54.248878    5512 device_windows.go:134] Waiting to get net interface for HostComputeNetwork flannel.4096 (10.0.64.4)
I0503 10:28:54.758966    5512 device_windows.go:148] Created HostComputeNetwork flannel.4096
I0503 10:28:54.804770    5512 main.go:313] Changing default FORWARD chain policy to ACCEPT
I0503 10:28:54.816024    5512 main.go:321] Wrote subnet file to /run/flannel/subnet.env
I0503 10:28:54.816024    5512 main.go:325] Running backend.
I0503 10:28:54.816024    5512 main.go:343] Waiting for all goroutines to exit
I0503 10:28:54.816024    5512 vxlan_network_windows.go:63] Watching for new subnet leases
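
Most of this output is informational. The lines that point at actual failures are the klog lines whose first character is `E` (error) or `F` (fatal), the `FATA[...]` line, and the `Invoke-HnsRequest` error. A minimal sketch for keeping only those lines, assuming the log has been saved or is piped in from `kubectl logs`; the function name `errors_only` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: keep only error/fatal lines from a Flannel or kube-proxy log.
#   ^[EF][0-9]{4}        klog lines at ERROR (E) or FATAL (F) severity, e.g. "E0503 ..."
#   ^FATA\[              fatal lines such as "FATA[2021-05-03T10:28:44Z] ..."
#   ^Invoke-HnsRequest   PowerShell error raised through the hns.psm1 module
errors_only() {
  grep -E '^[EF][0-9]{4} |^FATA\[|^Invoke-HnsRequest :'
}

# Usage from a Linux master:
#   kubectl logs -n kube-system kube-flannel-ds-windows-amd64-hwbjc | errors_only
```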

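Can anyone help me get my Windows worker node working in the Kubernetes cluster?

Edit 1:

I solved the Flannel FATA error. It was caused by Flannel not finding the network adapter, so before starting Flannel I created the required network manually: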

# First, download and import the HNS PowerShell module
PS C:\Users\azureuser> curl.exe -LO https://raw.githubusercontent.com/microsoft/SDN/master/Kubernetes/windows/hns.psm1
PS C:\Users\azureuser> Import-Module ./hns.psm1

# Create the network
PS C:\Users\azureuser> New-HNSNetwork -Type Overlay -AddressPrefix "192.168.255.0/30" -Gateway "192.168.255.1" -Name "External" -AdapterName "Ethernet 2" -SubnetPolicies @(@{Type = "VSID"; VSID = 9999; });

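After that, you can join the Windows node to the cluster and Flannel will start normally, but the kube-proxy problem remains.

Are you still getting this error? I managed to fix it by downgrading the Windows kube-proxy from 1.21.0 to 1.20.0. Version 1.21.0 must be missing some configuration or have a bug.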

curl -L https://github.com/kubernetes-sigs/sig-windows-tools/releases/latest/download/kube-proxy.yml | sed 's/VERSION/v1.20.0/g' | kubectl apply -f -
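
The `sed` step in that command replaces every literal `VERSION` placeholder in the downloaded manifest before it reaches `kubectl apply`. To check what the substitution produces before touching the cluster, the same step can be run on its own. The sample line below is a hypothetical stand-in, not a quote from `kube-proxy.yml`, and the function name `pin_version` is also hypothetical; the manifest currently served at the "latest" URL may no longer use the placeholder at all.

```shell
#!/bin/sh
# Hypothetical helper: substitute the VERSION placeholder, as in the command above.
pin_version() {
  version="$1"
  sed "s/VERSION/${version}/g"
}

# Hypothetical manifest line; the real kube-proxy.yml may differ.
echo 'image: sigwindowstools/kube-proxy:VERSION-nanoserver' | pin_version v1.20.0
# prints: image: sigwindowstools/kube-proxy:v1.20.0-nanoserver

# Preview the real manifest with a client-side dry run instead of applying it:
#   curl -L https://github.com/kubernetes-sigs/sig-windows-tools/releases/latest/download/kube-proxy.yml \
#     | pin_version v1.20.0 | kubectl apply --dry-run=client -f -
```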