calico-node pods don't start after GKE cluster upgrade from 1.10.x to 1.11.x
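We upgraded a GKE cluster to 1.11.x. Although the upgrade itself finished successfully, the cluster is not working. Multiple pods crash or stay Pending, and everything points to Calico networking not working: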
calico-node-2hhfz 1/2 CrashLoopBackOff 5 6m
Its logs show this:
kubectl -n kube-system logs -f calico-node-2hhfz calico-node
Note the errors at the end (could not find the requested resource (post BGPConfigurations.crd.projectcalico.org)):
2018-12-04 11:22:39.617 [INFO][10] startup.go 252: Early log level set to info
2018-12-04 11:22:39.618 [INFO][10] startup.go 268: Using NODENAME environment for node name
2018-12-04 11:22:39.618 [INFO][10] startup.go 280: Determined node name: gke-apps-internas-apps-internas-4c-6r-ecf8b140-9p8x
2018-12-04 11:22:39.619 [INFO][10] startup.go 303: Checking datastore connection
2018-12-04 11:22:39.626 [INFO][10] startup.go 327: Datastore connection verified
2018-12-04 11:22:39.626 [INFO][10] startup.go 100: Datastore is ready
2018-12-04 11:22:39.632 [INFO][10] startup.go 1052: Running migration
2018-12-04 11:22:39.632 [INFO][10] migrate.go 866: Querying current v1 snapshot and converting to v3
2018-12-04 11:22:39.632 [INFO][10] migrate.go 875: handling FelixConfiguration (global) resource
2018-12-04 11:22:39.637 [INFO][10] migrate.go 875: handling ClusterInformation (global) resource
2018-12-04 11:22:39.637 [INFO][10] migrate.go 875: skipping FelixConfiguration (per-node) resources - not supported
2018-12-04 11:22:39.637 [INFO][10] migrate.go 875: handling BGPConfiguration (global) resource
2018-12-04 11:22:39.637 [INFO][10] migrate.go 600: Converting BGP config -> BGPConfiguration(default)
2018-12-04 11:22:39.644 [INFO][10] migrate.go 875: skipping Node resources - these do not need migrating
2018-12-04 11:22:39.644 [INFO][10] migrate.go 875: skipping BGPPeer (global) resources - these do not need migrating
2018-12-04 11:22:39.644 [INFO][10] migrate.go 875: handling BGPPeer (node) resources
2018-12-04 11:22:39.651 [INFO][10] migrate.go 875: skipping HostEndpoint resources - not supported
2018-12-04 11:22:39.651 [INFO][10] migrate.go 875: skipping IPPool resources - these do not need migrating
2018-12-04 11:22:39.651 [INFO][10] migrate.go 875: skipping GlobalNetworkPolicy resources - these do not need migrating
2018-12-04 11:22:39.651 [INFO][10] migrate.go 875: skipping Profile resources - these do not need migrating
2018-12-04 11:22:39.652 [INFO][10] migrate.go 875: skipping WorkloadEndpoint resources - these do not need migrating
2018-12-04 11:22:39.652 [INFO][10] migrate.go 875: data converted successfully
2018-12-04 11:22:39.652 [INFO][10] migrate.go 866: Storing v3 data
2018-12-04 11:22:39.652 [INFO][10] migrate.go 875: Storing resources in v3 format
2018-12-04 11:22:39.673 [INFO][10] migrate.go 1151: Failed to create resource Key=BGPConfiguration(default) error=resource does not exist: BGPConfiguration(default) with error: the server could not find the requested resource (post BGPConfigurations.crd.projectcalico.org)
2018-12-04 11:22:39.673 [ERROR][10] migrate.go 884: Unable to store the v3 resources
2018-12-04 11:22:39.673 [INFO][10] migrate.go 875: cause: resource does not exist: BGPConfiguration(default) with error: the server could not find the requested resource (post BGPConfigurations.crd.projectcalico.org)
2018-12-04 11:22:39.673 [ERROR][10] startup.go 107: Unable to ensure datastore is migrated. error=Migration failed: error storing converted data: resource does not exist: BGPConfiguration(default) with error: the server could not find the requested resource (post BGPConfigurations.crd.projectcalico.org)
2018-12-04 11:22:39.673 [WARNING][10] startup.go 1066: Terminating
Calico node failed to start
Any idea how we can fix the cluster?
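The GKE upgrade left the cluster without the custom resource definition (CRD) for BGPConfiguration, which is why the calico pods could not start.
Applying the corresponding CRD to the cluster solved the problem: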
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: bgpconfigurations.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: BGPConfiguration
    plural: bgpconfigurations
    singular: bgpconfiguration
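For anyone hitting the same error, here is a rough sketch of how the manifest above might be applied and checked. The file name bgpconfiguration-crd.yaml is my own choice, not something from the original fix, and the pod name is the one from the question, so substitute your own. The kubectl steps are skipped if kubectl is not on the PATH.

```shell
# Save the CRD manifest to a file (the file name is an arbitrary choice).
cat > bgpconfiguration-crd.yaml <<'EOF'
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: bgpconfigurations.crd.projectcalico.org
spec:
  scope: Cluster
  group: crd.projectcalico.org
  version: v1
  names:
    kind: BGPConfiguration
    plural: bgpconfigurations
    singular: bgpconfiguration
EOF

if command -v kubectl >/dev/null 2>&1; then
  # Create the missing CRD in the cluster.
  kubectl apply -f bgpconfiguration-crd.yaml

  # Confirm the API server now knows the resource type.
  kubectl get crd bgpconfigurations.crd.projectcalico.org

  # Check the crashing pod from the question; replace the name with your own.
  kubectl -n kube-system get pod calico-node-2hhfz
fi
```

Once the CRD exists, the migration step in calico-node should be able to store BGPConfiguration(default) on the pod's next restart, so the CrashLoopBackOff should clear without further action.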