Trying to create NSQ PetSet, pods keep terminating shortly after container launches
Here's the full YAML file (not embedded in the question because it's fairly long, and most of the important parts are captured in the describe output below):
https://gist.github.com/sporkmonger/46a820f9a1ed8a73d89a319dffb24608
It uses a public container image I built: sporkmonger/nsq-k8s:0.3.8
The container is identical to the official NSQ image, but built on Debian Jessie instead of Alpine/musl to avoid the DNS problems that tend to plague Alpine on Kubernetes.
Here's what happens when I describe one of the pods:
❯ kubectl describe pod nsqd-0
Name: nsqd-0
Namespace: default
Node: minikube/192.168.99.100
Start Time: Sun, 04 Dec 2016 20:58:06 -0800
Labels: app=nsq
Status: Terminating (expires Sun, 04 Dec 2016 21:02:31 -0800)
Termination Grace Period: 60s
IP: 172.17.0.8
Controllers: PetSet/nsqd
Containers:
nsqd:
Container ID: docker://381e4a1313e4e13a63b8a17004d79a6e828a8bc1c9e20419b319f8a9757f266b
Image: sporkmonger/nsq-k8s:0.3.8
Image ID: docker://sha256:01691a91cee3e1a6992b33a10e99baa57c5b8ce7b765849540a830f0b554e707
Ports: 4150/TCP, 4151/TCP
Command:
/bin/sh
-c
Args:
/usr/local/bin/nsqd
-data-path
/data
-broadcast-address
$(hostname -f)
-lookupd-tcp-address
nsqlookupd-0.nsqlookupd.default.svc.cluster.local:4160
-lookupd-tcp-address
nsqlookupd-1.nsqlookupd.default.svc.cluster.local:4160
-lookupd-tcp-address
nsqlookupd-2.nsqlookupd.default.svc.cluster.local:4160
State: Running
Started: Sun, 04 Dec 2016 20:58:11 -0800
Ready: True
Restart Count: 0
Liveness: http-get http://:http/ping delay=5s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:http/ping delay=1s timeout=1s period=10s #success=1 #failure=3
Volume Mounts:
/data from datadir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-k6ufj (ro)
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
datadir:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: datadir-nsqd-0
ReadOnly: false
default-token-k6ufj:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-k6ufj
QoS Class: BestEffort
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
4m 4m 1 {default-scheduler } Normal Scheduled Successfully assigned nsqd-0 to minikube
4m 4m 1 {kubelet minikube} spec.containers{nsqd} Normal Pulling pulling image "sporkmonger/nsq-k8s:0.3.8"
4m 4m 1 {kubelet minikube} spec.containers{nsqd} Normal Pulled Successfully pulled image "sporkmonger/nsq-k8s:0.3.8"
4m 4m 1 {kubelet minikube} spec.containers{nsqd} Normal Created Created container with docker id 381e4a1313e4; Security:[seccomp=unconfined]
4m 4m 1 {kubelet minikube} spec.containers{nsqd} Normal Started Started container with docker id 381e4a1313e4
0s 0s 1 {kubelet minikube} spec.containers{nsqd} Normal Killing Killing container with docker id 381e4a1313e4: Need to kill pod.
About 30 seconds of fairly representative watch activity on the cluster:
❯ kubectl get pods -w
NAME READY STATUS RESTARTS AGE
nsqadmin-0 1/1 Running 3 33m
nsqadmin-1 1/1 Running 0 32m
nsqd-0 1/1 Running 0 6m
nsqd-1 1/1 Running 0 4m
nsqd-2 1/1 Terminating 0 1m
nsqd-3 1/1 Running 0 30s
nsqlookupd-0 1/1 Running 0 30s
NAME READY STATUS RESTARTS AGE
nsqlookupd-1 0/1 Pending 0 0s
nsqlookupd-1 0/1 Pending 0 0s
nsqlookupd-1 0/1 ContainerCreating 0 0s
nsqlookupd-1 0/1 Running 0 4s
nsqlookupd-1 1/1 Running 0 8s
nsqlookupd-2 0/1 Pending 0 0s
nsqlookupd-2 0/1 Pending 0 0s
nsqlookupd-2 0/1 ContainerCreating 0 0s
nsqlookupd-2 0/1 Terminating 0 0s
nsqd-2 0/1 Terminating 0 2m
nsqd-2 0/1 Terminating 0 2m
nsqd-2 0/1 Terminating 0 2m
nsqlookupd-2 0/1 Terminating 0 4s
nsqlookupd-2 0/1 Terminating 0 5s
nsqlookupd-2 0/1 Terminating 0 5s
nsqlookupd-2 0/1 Terminating 0 5s
nsqlookupd-1 1/1 Terminating 0 29s
nsqlookupd-1 0/1 Terminating 0 30s
nsqlookupd-1 0/1 Terminating 0 30s
nsqlookupd-1 0/1 Terminating 0 30s
nsqlookupd-0 1/1 Terminating 0 1m
nsqd-2 0/1 Pending 0 0s
nsqd-2 0/1 Pending 0 0s
nsqd-2 0/1 ContainerCreating 0 0s
nsqlookupd-0 0/1 Terminating 0 1m
nsqlookupd-0 0/1 Terminating 0 1m
nsqlookupd-0 0/1 Terminating 0 1m
nsqlookupd-0 0/1 Pending 0 0s
nsqlookupd-0 0/1 Pending 0 0s
nsqlookupd-0 0/1 ContainerCreating 0 0s
nsqd-2 0/1 Running 0 4s
nsqlookupd-0 0/1 Running 0 4s
nsqd-2 1/1 Running 0 6s
nsqlookupd-0 1/1 Running 0 10s
nsqlookupd-0 1/1 Terminating 0 10s
nsqlookupd-0 0/1 Terminating 0 11s
nsqlookupd-0 0/1 Terminating 0 11s
nsqlookupd-0 0/1 Terminating 0 11s
nsqd-2 1/1 Terminating 0 12s
nsqlookupd-0 0/1 Pending 0 0s
nsqlookupd-0 0/1 Pending 0 0s
nsqlookupd-0 0/1 ContainerCreating 0 0s
nsqlookupd-0 0/1 Running 0 3s
nsqlookupd-0 1/1 Running 0 10s
Typical container logs:
❯ kubectl logs nsqd-0
[nsqd] 2016/12/05 05:21:34.666963 nsqd v0.3.8 (built w/go1.6.2)
[nsqd] 2016/12/05 05:21:34.667170 ID: 794
[nsqd] 2016/12/05 05:21:34.667200 NSQ: persisting topic/channel metadata to nsqd.794.dat
[nsqd] 2016/12/05 05:21:34.669232 TCP: listening on [::]:4150
[nsqd] 2016/12/05 05:21:34.669284 HTTP: listening on [::]:4151
[nsqd] 2016/12/05 05:21:35.896901 200 GET /ping (172.17.0.1:51322) 1.511µs
[nsqd] 2016/12/05 05:21:40.290550 200 GET /ping (172.17.0.1:51392) 2.167µs
[nsqd] 2016/12/05 05:21:40.304599 200 GET /ping (172.17.0.1:51394) 1.856µs
[nsqd] 2016/12/05 05:21:50.289018 200 GET /ping (172.17.0.1:51452) 1.865µs
[nsqd] 2016/12/05 05:21:50.299567 200 GET /ping (172.17.0.1:51454) 1.951µs
[nsqd] 2016/12/05 05:22:00.296685 200 GET /ping (172.17.0.1:51548) 2.029µs
[nsqd] 2016/12/05 05:22:00.300842 200 GET /ping (172.17.0.1:51550) 1.464µs
[nsqd] 2016/12/05 05:22:10.295596 200 GET /ping (172.17.0.1:51698) 2.206µs
I'm completely stumped as to why Kubernetes keeps killing these pods. The containers themselves don't appear to be misbehaving, and it looks like Kubernetes itself is doing the terminating here...
Figured it out.
My services all had the same selector. Every service was matching all of the pods, which led Kubernetes to conclude that each one matched too many pods every time one came up, so it randomly killed off the "extras".