Kubernetes upgrade failed to keep Pod Subnet

I upgraded my home bare-metal cluster from 1.10.2 to 1.11.0 using kubeadm. The upgrade process failed because of a cgroup driver mismatch (I need the systemd cgroup driver, but cgroupfs was configured there :( ).

My cluster reports that it is on 1.11.0. Pods are running and communicate fine on the same node; however, pods cannot communicate with pods on other nodes.

I have flannel as my CNI, but the pod CIDR has somehow been changed to 172.17.0.0/16 when it should be 10.244.0.0/16. As long as I use a single host, the network, ingress, and services all work, but as soon as a second host is involved the traffic never arrives. My theory is that inter-node IP routing is failing because pods on the worker nodes are being attached to the docker0 bridge (172.17.0.0/16) instead of the flannel-managed cni0 bridge (10.244.0.0/16).
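A quick way to confirm this (a sketch, run from the master) is to compare the pod CIDR the control plane assigned to each node with the routes actually present on the host:

# Pod CIDR assigned to each node; these should be 10.244.x.0/24 per node
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'

# Host routes: cross-node pod traffic should leave via flannel.1, not docker0
ip route | grep -E 'flannel|cni0|docker0'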

Q: How can I fix my CNI setup so that pods on my single-NIC nodes use the flannel interface instead of the docker interface?

kubeadm says that I am on the latest version:

[preflight] Running pre-flight checks.
[upgrade] Making sure the cluster is healthy:
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
I0704 23:08:54.560787    4588 feature_gate.go:230] feature gates: &{map[]}
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.11.0
[upgrade/versions] kubeadm version: v1.11.0
[upgrade/versions] Latest stable version: v1.11.0
[upgrade/versions] Latest version in the v1.11 series: v1.11.0

Awesome, you're up-to-date! Enjoy!

Master NICs

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s20u3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 68:1d:ef:06:2c:3a brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.103/24 brd 192.168.0.255 scope global noprefixroute enp0s20u3
       valid_lft forever preferred_lft forever
    inet6 fe80::6a1d:efff:fe06:2c3a/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:3a:a6:ec:db brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default 
    link/ether 76:35:47:c7:1f:38 brd ff:ff:ff:ff:ff:ff
    inet 10.244.0.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::7435:47ff:fec7:1f38/64 scope link 
       valid_lft forever preferred_lft forever
5: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether 0a:58:0a:f4:00:01 brd ff:ff:ff:ff:ff:ff
    inet 10.244.0.1/24 scope global cni0
       valid_lft forever preferred_lft forever
    inet6 fe80::8820:b3ff:fed0:bff9/64 scope link 
       valid_lft forever preferred_lft forever
11: veth6494e2fc@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP group default 
    link/ether fa:84:37:5f:ac:ac brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::f884:37ff:fe5f:acac/64 scope link 
       valid_lft forever preferred_lft forever

Node 1

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s20u3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 68:1d:ef:06:2f:d3 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.104/24 brd 192.168.0.255 scope global noprefixroute enp0s20u3
       valid_lft forever preferred_lft forever
    inet6 fe80::6a1d:efff:fe06:2fd3/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:6c:7d:0e:a5 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:6cff:fe7d:ea5/64 scope link 
       valid_lft forever preferred_lft forever
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default 
    link/ether 9a:88:74:72:0a:45 brd ff:ff:ff:ff:ff:ff
    inet 10.244.1.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::9888:74ff:fe72:a45/64 scope link 
       valid_lft forever preferred_lft forever
5: cni0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 0a:58:0a:f4:01:01 brd ff:ff:ff:ff:ff:ff
    inet 10.244.1.1/24 scope global cni0
       valid_lft forever preferred_lft forever
    inet6 fe80::50c0:eaff:fede:e09e/64 scope link 
       valid_lft forever preferred_lft forever
34: veth69dcb3f@if33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether be:eb:08:71:dc:8a brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::bceb:8ff:fe71:dc8a/64 scope link 
       valid_lft forever preferred_lft forever
36: veth89519e8@if35: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether 8e:c8:77:5c:b3:85 brd ff:ff:ff:ff:ff:ff link-netnsid 5
    inet6 fe80::8cc8:77ff:fe5c:b385/64 scope link 
       valid_lft forever preferred_lft forever
44: veth7e0c05a@if43: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether 4a:36:e1:60:78:cc brd ff:ff:ff:ff:ff:ff link-netnsid 3
    inet6 fe80::4836:e1ff:fe60:78cc/64 scope link 
       valid_lft forever preferred_lft forever
46: veth944bd64@if45: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether f6:36:31:d7:df:11 brd ff:ff:ff:ff:ff:ff link-netnsid 6
    inet6 fe80::f436:31ff:fed7:df11/64 scope link 
       valid_lft forever preferred_lft forever
48: vethe018c5f@if47: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether ca:6b:b1:7d:15:63 brd ff:ff:ff:ff:ff:ff link-netnsid 7
    inet6 fe80::c86b:b1ff:fe7d:1563/64 scope link 
       valid_lft forever preferred_lft forever
50: vethbd59e85@if49: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether 3e:71:2e:ae:97:02 brd ff:ff:ff:ff:ff:ff link-netnsid 8
    inet6 fe80::3c71:2eff:feae:9702/64 scope link 
       valid_lft forever preferred_lft forever
52: veth6ba9feb@if51: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether e2:57:42:a0:ec:3b brd ff:ff:ff:ff:ff:ff link-netnsid 9
    inet6 fe80::e057:42ff:fea0:ec3b/64 scope link 
       valid_lft forever preferred_lft forever
58: veth33e51b9@if57: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether 86:c6:09:5e:d0:b9 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::84c6:9ff:fe5e:d0b9/64 scope link 
       valid_lft forever preferred_lft forever

Node 2

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s20u3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 68:1d:ef:06:2d:9b brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.105/24 brd 192.168.0.255 scope global noprefixroute enp0s20u3
       valid_lft forever preferred_lft forever
    inet6 fe80::6a1d:efff:fe06:2d9b/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:2e:c6:f8:ae brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:2eff:fec6:f8ae/64 scope link 
       valid_lft forever preferred_lft forever
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default 
    link/ether 96:b0:30:d1:9f:ca brd ff:ff:ff:ff:ff:ff
    inet 10.244.2.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::94b0:30ff:fed1:9fca/64 scope link 
       valid_lft forever preferred_lft forever
14: veth2d26912@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether b6:f1:1c:ea:80:bb brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::b4f1:1cff:feea:80bb/64 scope link 
       valid_lft forever preferred_lft forever
16: veth600a995@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether e2:cb:50:d1:c8:3d brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::e0cb:50ff:fed1:c83d/64 scope link 
       valid_lft forever preferred_lft forever
18: vethba1dfe3@if17: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether 7e:a8:37:60:0a:11 brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::7ca8:37ff:fe60:a11/64 scope link 
       valid_lft forever preferred_lft forever
20: vethc330c28@if19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether c6:19:7a:ec:f3:05 brd ff:ff:ff:ff:ff:ff link-netnsid 3
    inet6 fe80::c419:7aff:feec:f305/64 scope link 
       valid_lft forever preferred_lft forever
22: vethed97c29@if21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether a6:55:f4:ce:31:48 brd ff:ff:ff:ff:ff:ff link-netnsid 4
    inet6 fe80::a455:f4ff:fece:3148/64 scope link 
       valid_lft forever preferred_lft forever
24: vethd8a7c40@if23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default 
    link/ether fe:8e:10:2a:b7:c3 brd ff:ff:ff:ff:ff:ff link-netnsid 5
    inet6 fe80::fc8e:10ff:fe2a:b7c3/64 scope link 
       valid_lft forever preferred_lft forever

/etc/cni/net.d/10-flannel.conf (similar on all hosts)

{
  "name": "cbr0",
  "type": "flannel",
  "delegate": {
    "isDefaultGateway": true
  }
}
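(For reference, the flannel manifests targeting Kubernetes 1.11 ship this configuration as a .conflist, 10-flannel.conflist, with the portmap plugin chained in. A sketch of that newer format, in case the difference matters for comparison:)

{
  "name": "cbr0",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}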

kubeadm config

api:
  advertiseAddress: 192.168.0.103
  bindPort: 6443
  controlPlaneEndpoint: ""
apiServerExtraArgs:
  authorization-mode: Node,RBAC
apiVersion: kubeadm.k8s.io/v1alpha2
auditPolicy:
  logDir: /var/log/kubernetes/audit
  logMaxAge: 2
  path: ""
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
etcd:
  local:
    dataDir: /var/lib/etcd
    image: ""
imageRepository: k8s.gcr.io
kind: MasterConfiguration
kubeProxy:
  config:
    bindAddress: 0.0.0.0
    clientConnection:
      acceptContentTypes: ""
      burst: 10
      contentType: application/vnd.kubernetes.protobuf
      kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
      qps: 5
    clusterCIDR: 10.244.0.0/16
    configSyncPeriod: 15m0s
    conntrack:
      max: null
      maxPerCore: 32768
      min: 131072
      tcpCloseWaitTimeout: 1h0m0s
      tcpEstablishedTimeout: 24h0m0s
    enableProfiling: false
    healthzBindAddress: 0.0.0.0:10256
    hostnameOverride: ""
    iptables:
      masqueradeAll: false
      masqueradeBit: 14
      minSyncPeriod: 0s
      syncPeriod: 30s
    ipvs:
      excludeCIDRs: null
      minSyncPeriod: 0s
      scheduler: ""
      syncPeriod: 30s
    metricsBindAddress: 127.0.0.1:10249
    mode: ""
    nodePortAddresses: null
    oomScoreAdj: -999
    portRange: ""
    resourceContainer: /kube-proxy
    udpIdleTimeout: 250ms
kubeletConfiguration:
  baseConfig:
    address: 0.0.0.0
    authentication:
      anonymous:
        enabled: false
      webhook:
        cacheTTL: 2m0s
        enabled: true
      x509:
        clientCAFile: /etc/kubernetes/pki/ca.crt
    authorization:
      mode: Webhook
      webhook:
        cacheAuthorizedTTL: 5m0s
        cacheUnauthorizedTTL: 30s
    cgroupDriver: systemd
    cgroupsPerQOS: true
    clusterDNS:
    - 10.96.0.10
    clusterDomain: cluster.local
    containerLogMaxFiles: 5
    containerLogMaxSize: 10Mi
    contentType: application/vnd.kubernetes.protobuf
    cpuCFSQuota: true
    cpuManagerPolicy: none
    cpuManagerReconcilePeriod: 10s
    enableControllerAttachDetach: true
    enableDebuggingHandlers: true
    enforceNodeAllocatable:
    - pods
    eventBurst: 10
    eventRecordQPS: 5
    evictionHard:
      imagefs.available: 15%
      memory.available: 100Mi
      nodefs.available: 10%
      nodefs.inodesFree: 5%
    evictionPressureTransitionPeriod: 5m0s
    failSwapOn: true
    fileCheckFrequency: 20s
    hairpinMode: promiscuous-bridge
    healthzBindAddress: 127.0.0.1
    healthzPort: 10248
    httpCheckFrequency: 20s
    imageGCHighThresholdPercent: 85
    imageGCLowThresholdPercent: 80
    imageMinimumGCAge: 2m0s
    iptablesDropBit: 15
    iptablesMasqueradeBit: 14
    kubeAPIBurst: 10
    kubeAPIQPS: 5
    makeIPTablesUtilChains: true
    maxOpenFiles: 1000000
    maxPods: 110
    nodeStatusUpdateFrequency: 10s
    oomScoreAdj: -999
    podPidsLimit: -1
    port: 10250
    registryBurst: 10
    registryPullQPS: 5
    resolvConf: /etc/resolv.conf
    rotateCertificates: true
    runtimeRequestTimeout: 2m0s
    serializeImagePulls: true
    staticPodPath: /etc/kubernetes/manifests
    streamingConnectionIdleTimeout: 4h0m0s
    syncFrequency: 1m0s
    volumeStatsAggPeriod: 1m0s
kubernetesVersion: v1.11.0
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
nodeRegistration: {}
unifiedControlPlaneImage: ""
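The podSubnet above is still correct, so here is a sketch of how to double-check what the running components actually believe (both config maps exist in a kubeadm cluster):

# Subnets kubeadm stored in the cluster
kubectl -n kube-system get cm kubeadm-config -o yaml | grep -i subnet

# CIDR kube-proxy is actually using
kubectl -n kube-system get cm kube-proxy -o yaml | grep clusterCIDR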

pods (truncated)

NAMESPACE         NAME                                              READY     STATUS             RESTARTS   AGE       IP              NODE
cert-manager      cert-manager-65b7d47f7d-77nqj                     2/2       Running            4          9h        172.17.0.4      k8s-node-1.home
docker-registry   reg-server-76695985b6-bc49x                       0/1       CrashLoopBackOff   22         1h        172.17.0.2      k8s-node-2.home
docker-registry   registry-5df69cb5f7-2n2lv                         1/1       Running            0          1h        172.17.0.3      k8s-node-2.home
ingress-nginx     nginx-ingress-controller-699cdf846-w2dmj          1/1       Running            0          2d        172.17.0.8      k8s-node-1.home
kube-system       coredns-78fcdf6894-v2hrg                          0/1       CrashLoopBackOff   18         1h        172.17.0.6      k8s-node-2.home
kube-system       coredns-df995dbb4-j9pzw                           1/1       Running            0          2d        10.244.0.53     k8s-master.home
kube-system       kube-apiserver-k8s-master.home                   1/1       Running            0          1h        192.168.0.103   k8s-master.home
kube-system       kube-controller-manager-k8s-master.home          1/1       Running            0          56m       192.168.0.103   k8s-master.home
kube-system       kube-flannel-ds-6flvr                             1/1       Running            15         96d       192.168.0.103   k8s-master.home
kube-system       kube-proxy-mjxn9                                  1/1       Running            0          35m       192.168.0.103   k8s-master.home
kube-system       kube-scheduler-k8s-master.home                   1/1       Running            27         2d        192.168.0.103   k8s-master.home

Note: reg-server-76695985b6-bc49x is trying to reach registry-5df69cb5f7-2n2lv through the ingress. It fails to do so. If both happen to be on the same node as the nginx ingress, their requests succeed.
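A minimal cross-node check (a sketch; the registry pod IP is taken from the listing above, and port 5000 is only an assumption based on the registry default):

# Throwaway busybox pod; from another node this request times out,
# because 172.17.0.x addresses are local to each node's docker0 bridge
kubectl run nettest --rm -it --image=busybox --restart=Never \
  -- wget -qO- -T 5 http://172.17.0.3:5000/v2/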

Apparently the new Kubernetes 1.11.x releases introduce new config maps in kube-system that specify which cgroup driver is in use. Once I edited them to use the correct systemd cgroup driver (I'm running CentOS nodes), waited a few minutes, and restarted the pods on the nodes, the flannel subnet was used again.
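Concretely, that meant something like this on the master (a sketch; kubeadm versions the config map name per minor release):

# kubelet settings kubeadm 1.11 stores in the cluster
kubectl -n kube-system edit cm kubelet-config-1.11
# ...and make sure the embedded kubelet config contains:
#     cgroupDriver: systemd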

Edit:
It turns out I also had to edit /var/lib/kubelet/kubeadm-flags.env to include:

KUBELET_KUBEADM_ARGS=--cgroup-driver=systemd --cni-bin-dir=/opt/cni/bin --cni-conf-dir=/etc/cni/net.d --network-plugin=cni
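and then reload and restart the kubelet on each node so the new flags take effect (CentOS uses systemd units):

systemctl daemon-reload
systemctl restart kubelet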