Google managed VM module stuck in reboot loop

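I have been trying to add a new App Engine module that uses managed VMs instead of the default GAE sandbox. The goal is a module where I can run newer versions of SciPy and NumPy, and which my user-facing modules can call. I have successfully built and run my Docker images/containers locally, but I run into a lot of problems when I try to deploy to a custom version on the servers.

Below is the serial console output from an instance of the managed VM module, which keeps rebooting because of problems that seem to be outside my control.

Has anyone else run into this? Am I missing something in the configuration/deployment process?

FWIW: I have used GAE for several years, and even contributed to it while I was at Google. I also have experience with modules and with Docker. The documentation and tooling for managed VMs seem quite immature at the moment, and I have run out of steam fighting them. I need help.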

Dec 09 00:52:41 vm_runtime_init: start 'pull_app'.
[   24.288054] docker0: port 1(veth8d67b7c) entered forwarding state
Dec  9 00:52:56 gae-mvm-vmv7-tsia kernel: [   24.288054] docker0: port 1(veth8d67b7c) entered forwarding state
Dec 09 00:52:57 Pulling GAE_FULL_APP_CONTAINER: appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
gcm-Heartbeat:1449622390000
Dec 09 00:53:13 ERROR: Timed out while trying to pull appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7 from registry!
===== Unexpected error during VM startup =====
=== Dump of VM runtime system logs follows ===
WARNING: HTTP 404 error while fetching metadata key gae_cloud_sql_instances. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_cloud_sql_proxy_image_name. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_extra_nginx_confs. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_redirect_appengine_googleapis_com. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_http_loadbalancer_enabled. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_loadbalancer. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_loadbalancer_ip. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_memcache_proxy. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_monitoring_image_name. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_use_cloud_monitoring. Will treat it as an empty string.
vm_runtime_init: Dec 09 00:52:40 Invoking all VM runtime components. /dev/fd/63
vm_runtime_init: Dec 09 00:52:40 vm_runtime_init: start 'allow_ssh'.
vm_runtime_init: Dec 09 00:52:40 vm_runtime_init: Done start 'allow_ssh'.
vm_runtime_init: Dec 09 00:52:40 vm_runtime_init: start 'unlocker'.
vm_runtime_init: Dec 09 00:52:40 vm_runtime_init: Done start 'unlocker'.
vm_runtime_init: Dec 09 00:52:40 vm_runtime_init: start 'fluentd_logger'.
vm_runtime_init: Dec 09 00:52:41 vm_runtime_init: Done start 'fluentd_logger'.
vm_runtime_init: Dec 09 00:52:41 vm_runtime_init: start 'pull_app'.
vm_runtime_init: Dec 09 00:52:57 Pulling GAE_FULL_APP_CONTAINER: appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Dec 09 00:53:13 ERROR: Timed out while trying to pull appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7 from registry!
Dec 09 00:52:40 Invoking all VM runtime components. /dev/fd/63
Dec 09 00:52:40 vm_runtime_init: start 'allow_ssh'.
Dec 09 00:52:40 vm_runtime_init: Done start 'allow_ssh'.
Dec 09 00:52:40 vm_runtime_init: start 'unlocker'.
Dec 09 00:52:40 vm_runtime_init: Done start 'unlocker'.
Dec 09 00:52:40 vm_runtime_init: start 'fluentd_logger'.
8aa8a33b8daa451d5595b951aeecad772a23d65b6592ac07cae6265cc74b6312
Dec 09 00:52:41 vm_runtime_init: Done start 'fluentd_logger'.
Dec 09 00:52:41 vm_runtime_init: start 'pull_app'.
Dec 09 00:52:57 Pulling GAE_FULL_APP_CONTAINER: appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
Using default tag: latest
Pulling repository appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/
b626d012b369: Pulling dependent layers
643a001c5ee0: Download complete
559718b5f880: Download complete
8f8068a6a6b4: Download complete
16d49c9e1091: Pulling metadata
16d49c9e1091: Pulling fs layer
16d49c9e1091: Download complete
54f405e77b26: Pulling metadata
54f405e77b26: Pulling fs layer
54f405e77b26: Download complete
36e2f6c710be: Pulling metadata
36e2f6c710be: Pulling fs layer
36e2f6c710be: Download complete
e8aed8091139: Pulling metadata
e8aed8091139: Pulling fs layer
e8aed8091139: Error downloading dependent layers
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/, Untar re-exec error: exit status 1: output: unexpected EOF
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Retrying docker pull.
Using default tag: latest
Pulling repository appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/
b626d012b369: Pulling dependent layers
643a001c5ee0: Download complete
559718b5f880: Download complete
8f8068a6a6b4: Download complete
16d49c9e1091: Download complete
54f405e77b26: Download complete
36e2f6c710be: Download complete
e8aed8091139: Pulling metadata
e8aed8091139: Pulling fs layer
e8aed8091139: Error downloading dependent layers
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/, Untar re-exec error: exit status 1: output: unexpected EOF
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Retrying docker pull.
Using default tag: latest
Pulling repository appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/
b626d012b369: Pulling dependent layers
643a001c5ee0: Download complete
559718b5f880: Download complete
8f8068a6a6b4: Download complete
16d49c9e1091: Download complete
54f405e77b26: Download complete
36e2f6c710be: Download complete
e8aed8091139: Pulling metadata
e8aed8091139: Pulling fs layer
e8aed8091139: Error downloading dependent layers
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/, Untar re-exec error: exit status 1: output: unexpected EOF
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Retrying docker pull.
Using default tag: latest
Pulling repository appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/
b626d012b369: Pulling dependent layers
643a001c5ee0: Download complete
559718b5f880: Download complete
8f8068a6a6b4: Download complete
16d49c9e1091: Download complete
54f405e77b26: Download complete
36e2f6c710be: Download complete
e8aed8091139: Pulling metadata
e8aed8091139: Pulling fs layer
e8aed8091139: Error downloading dependent layers
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/, Untar re-exec error: exit status 1: output: unexpected EOF
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Retrying docker pull.
Using default tag: latest
Pulling repository appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7
b626d012b369: Pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/
b626d012b369: Pulling dependent layers
643a001c5ee0: Download complete
559718b5f880: Download complete
8f8068a6a6b4: Download complete
16d49c9e1091: Download complete
54f405e77b26: Download complete
36e2f6c710be: Download complete
e8aed8091139: Pulling metadata
e8aed8091139: Pulling fs layer
e8aed8091139: Error downloading dependent layers
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, endpoint: https://appengine.gcr.io/v1/, Untar re-exec error: exit status 1: output: unexpected EOF
b626d012b369: Error pulling image (latest) from appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7, Untar re-exec error: exit status 1: output: unexpected EOF
Retrying docker pull.
CONTAINER ID        IMAGE                                    COMMAND                  CREATED             STATUS              PORTS               NAMES
8aa8a33b8daa        gcr.io/google_appengine/fluentd-logger   "/opt/google-fluentd/"   33 seconds ago      Up 32 seconds                           insane_panini
Container: 8aa8a33b8daa
========= rebooting. ========================

INIT: 
INIT: Sending processes the TERM signal


INIT: Sending processes the KILL signal

Dec  9 00:53:14 gae-mvm-vmv7-tsia init: Switching to runlevel: 1
gcm-StatusUpdate:TIME=1449622394000;STATUS=COMMAND_FAILED;INVOCATION_ID=0
[info] Using makefile-style concurrent boot in runlevel 1.
Dec  9 00:53:15 gae-mvm-vmv7-tsia rpc.statd[1758]: Caught signal 15, un-registering and exiting
Dec  9 00:53:15 gae-mvm-vmv7-tsia google: shutdown script found in metadata.
[ ok ] Stopping NFS common utilities: idmapd statd.
Dec  9 00:53:15 gae-mvm-vmv7-tsia shutdownscript: Running shutdown script /var/run/google.shutdown.script
Dec  9 00:53:15 gae-mvm-vmv7-tsia rpcbind: rpcbind terminating on signal. Restart with "rpcbind -w"
[ ok ] Stopping rpcbind daemon....
Stopping supervisor: supervisord.
udhcpd: Disabled. Edit /etc/default/udhcpd to enable it.
[ ok ] Unmounting iscsi-backed filesystems: Unmounting all devices marked _netdev.
Dec  9 00:53:16 gae-mvm-vmv7-tsia iscsid: iscsid shutting down.
[ ok ] Unmounting iscsi-backed filesystems: Unmounting all devices marked _netdev.
[ ok ] Disconnecting iSCSI targets:iscsiadm: No matching sessions found.
[ ok ] Stopping iSCSI initiator service:.
Dec  9 00:53:16 gae-mvm-vmv7-tsia shutdownscript: Finished running shutdown script /var/run/google.shutdown.script
[ ok ] Stopping Docker: docker.
[ ok ] Stopping The Kubernetes container manager: kubelet.
[ ok ] Stopping enhanced syslogd: rsyslogd.
[   54.500923] docker0: port 1(veth8d67b7c) entered disabled state
[   54.512521] docker0: port 1(veth8d67b7c) entered disabled state
[   54.522249] device veth8d67b7c left promiscuous mode
[   54.527554] docker0: port 1(veth8d67b7c) entered disabled state
Terminating on signal number 15
Traceback (most recent call last):
  File "/usr/share/google/google_daemon/manage_accounts.py", line 94, in <module>
    options.daemon, options.force, options.debug)
  File "/usr/share/google/google_daemon/manage_accounts.py", line 65, in Main
    manager_daemon.StartDaemon()
  File "/usr/share/google/google_daemon/accounts_manager_daemon.py", line 73, in StartDaemon
    self.accounts_manager.Main()
  File "/usr/share/google/google_daemon/accounts_manager.py", line 87, in Main
    writer.close()
IOError: [Errno 32] Broken pipe
[....] Asking all remaining processes to terminate...acpid: exiting

[ ok ] done.
[ ok ] All processes ended within 1 seconds....done.
[[36minfo[3
INIT: Sending processes the TERM signal


INIT: Sending processes the KILL signal

sulogin: root account is locked, starting shell
root@gae-mvm-vmv7-tsia:~# 

Edit: Additional information from shutdown.log is below. The docker logs command is not run anywhere in my code or in my Dockerfile; I assume there is a bug in the way Google uses it on their end.

2015-12-08 17:08:22.194 Sending SIGUSR1 to fluentd to trigger a log flush.
2015-12-08 17:08:22.194 605e9f1ad747e63560fdc28a8c7f3c276d77255edd0a65381f7ad2f9f8eafd2a
2015-12-08 17:08:22.194 ---------------------------------------------------------------------
2015-12-08 17:08:22.194 ---------------App was unhealthy, grabbing debug logs----------------
2015-12-08 17:08:22.194 --------------------------App stdout/stderr--------------------------
2015-12-08 17:08:22.194 /usr/share/vm_runtime/vm_shutdown.sh: line 22: /var/run/app.cid: No such file or directory
2015-12-08 17:08:22.194 docker: "logs" requires 1 argument.
2015-12-08 17:08:22.194 See 'docker logs --help'.
2015-12-08 17:08:22.194 
2015-12-08 17:08:22.194 Usage:  docker logs [OPTIONS] CONTAINER
2015-12-08 17:08:22.194 
2015-12-08 17:08:22.194 Fetch the logs of a container
2015-12-08 17:08:22.194 ---------------------------------------------------------------------
2015-12-08 17:08:22.194 --------------------------Tail of app logs---------------------------
2015-12-08 17:08:22.194 tail: cannot open `/var/log/app_engine/app/app.0.log.json' for reading: No such file or directory
2015-12-08 17:08:22.194 ---------------------------------------------------------------------

I don't know the answer, but I have a few ideas.

Missing image?

Dec 09 00:53:13 ERROR: Timed out while trying to pull appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7 from registry!

indicates a problem fetching the Docker image from the registry. It does not sound like a 404 Not Found error, though. Still, I would try the following:

Make sure you can pull the image from your development machine using gcloud docker pull. If not, push it to the registry.
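As a rough sketch of that check, the pull can be wrapped in a small Python helper that reports whether it succeeded. The helper names and the timeout value are made up for illustration, and the code assumes the installed gcloud still accepts the `gcloud docker pull` form mentioned above (other releases may expect a different invocation).

```python
import subprocess


def build_pull_command(image):
    """Return the argument list for the pull suggested in the answer."""
    # Assumption: the installed gcloud accepts `gcloud docker pull <image>`.
    return ["gcloud", "docker", "pull", image]


def try_pull(image, timeout_seconds=600):
    """Run the pull and return (succeeded, combined stdout/stderr text)."""
    try:
        result = subprocess.run(
            build_pull_command(image),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout_seconds,
        )
    except FileNotFoundError:
        # gcloud is not installed or not on PATH.
        return False, "gcloud was not found on PATH"
    except subprocess.TimeoutExpired:
        return False, "pull did not finish within %d seconds" % timeout_seconds
    return result.returncode == 0, result.stdout
```

Calling `try_pull("appengine.gcr.io/389129677035831115/jt-calc.mvm.vmv7")` from the development machine would show whether the same image that the VM fails to pull can be fetched at all, and the captured output can be compared with the serial console log.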

Use the App Engine base image?

WARNING: HTTP 404 error while fetching metadata key gae_cloud_sql_instances. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_cloud_sql_proxy_image_name. Will treat it as an empty string.
WARNING: HTTP 404 error while fetching metadata key gae_extra_nginx_confs. Will treat it as an empty string.

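This sounds like the local metadata server is not running or is not configured correctly. My guess is that your custom Docker image is not using one of the standard base images, specifically the Python base image. Try updating your Dockerfile to use the standard Python base image.

Edit

This answer is mostly correct, but inferring from the curl output that the image does not exist is flawed. python-compat should work for you, but python is also a valid image, as can be seen from running docker pull gcr.io/google_appengine/python:
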
$ docker pull gcr.io/google_appengine/python
Pulling repository gcr.io/google_appengine/python
ac7db0912786: Download complete 
643a001c5ee0: Download complete 
559718b5f880: Download complete 
8f8068a6a6b4: Download complete 
16d49c9e1091: Download complete 
54f405e77b26: Download complete 
36e2f6c710be: Download complete 
e8aed8091139: Download complete 
8f0415d8e4e9: Download complete 
15ed20635873: Download complete 
6d70c8850a43: Download complete 
93ae290c32a1: Download complete 
7f766358fa71: Download complete 
7f4a74c30dc4: Download complete 
b51802c69e61: Download complete 
Status: Downloaded newer image for gcr.io/google_appengine/python:latest

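In a discussion with contributor jonparrott over at the GitHub repo for the gcr.io/google_appengine/python Docker image, the relation between the various Python Managed VM / Custom Runtime Docker images was clarified in a comment.

So I think the problem you are seeing here, given that the VM cannot become healthy, could be related to the image you are pulling from, your Dockerfile, your application code, or the infrastructure. The first three are more likely than the last, but the last is not inconceivable. The "Untar re-exec error: exit status 1: output: unexpected EOF" error seems to be the first outward sign of the problem in the serial console output.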
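To narrow down where the pull breaks, the serial console output can be scanned for the layer that fails on each retry. The following is only a sketch: the function name is made up, and the pattern assumes the 12-character layer IDs and the exact "Error downloading dependent layers" wording seen in the log above.

```python
import re

# Matches lines such as "e8aed8091139: Error downloading dependent layers".
LAYER_ERROR = re.compile(r"^([0-9a-f]{12}): Error downloading dependent layers$")


def failing_layers(console_output):
    """Count, per layer ID, how often its download failed in the log text."""
    counts = {}
    for line in console_output.splitlines():
        match = LAYER_ERROR.match(line.strip())
        if match:
            layer = match.group(1)
            counts[layer] = counts.get(layer, 0) + 1
    return counts
```

Run over the console output in the question, this should report layer e8aed8091139 as the one that fails on every attempt, while the layers before it download completely. The same layer ID also shows up as "Download complete" in the local pull of gcr.io/google_appengine/python above, which hints that the failing layer may belong to the base image rather than to the application code.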

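It may be worth filing an issue report in the Google Cloud Platform Public Issue Tracker and including information such as your Dockerfile, your application code (if needed), the time frame (if it only happens intermittently), and so on.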


Original answer

If you check the container registry, the image you are trying to pull FROM does not exist:

curl -v -X HEAD https://gcr.io/google_appengine/python
* Hostname was NOT found in DNS cache
*   Trying 74.125.193.82...
* Connected to gcr.io (74.125.193.82) port 443 (#0)
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
* SSLv3, TLS handshake, Server hello (2):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Server key exchange (12):
* SSLv3, TLS handshake, Server finished (14):
* SSLv3, TLS handshake, Client key exchange (16):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSL connection using ECDHE-RSA-AES128-GCM-SHA256
* Server certificate:
*        subject: C=US; ST=California; L=Mountain View; O=Google Inc; CN=*.googlecode.com
*        start date: 2015-12-02 14:40:36 GMT
*        expire date: 2016-03-01 00:00:00 GMT
*        subjectAltName: gcr.io matched
*        issuer: C=US; O=Google Inc; CN=Google Internet Authority G2
*        SSL certificate verify ok.
> HEAD /google_appengine/python HTTP/1.1
> User-Agent: curl/7.35.0
> Host: gcr.io
> Accept: */*
> 
< HTTP/1.1 404 Not Found
< Date: Thu, 10 Dec 2015 19:14:52 GMT
< Content-Type: text/html; charset=UTF-8
* Server Docker Registry is not blacklisted
< Server: Docker Registry
< Content-Length: 1584
< X-XSS-Protection: 1; mode=block
< X-Frame-Options: SAMEORIGIN
< Alternate-Protocol: 443:quic,p=0
< Alt-Svc: clear
< 

The container registry image that actually exists for the Python runtime is:

gcr.io/google_appengine/python-compat

This should be the key to fixing the deployment failure.