OpenShift: command terminated with non-zero exit code: Error executing in Docker Container: 137

Question

I am running an opencpu-based image on OpenShift. Every time the pod starts, it crashes after just a few seconds with the error:

command terminated with non-zero exit code: Error executing in Docker Container: 137

The Events tab shows only the following three events, and the terminal logs show nothing either.

Back-off restarting the failed container
Pod sandbox changed, it will be killed and re-created.
Killing container with id docker://opencpu-test-temp:Need to kill Pod

I really have no idea why the container restarts every few seconds. The image runs fine locally.

Can anyone tell me how to debug this problem?

Answer 1

Exit code 137 is often memory-related in a Docker context.

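The actual error comes from a process isolated inside the Docker container. It means the process was killed with SIGKILL: exit codes above 128 encode 128 plus the signal number, and 137 = 128 + 9, where 9 is SIGKILL. Source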
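To make the exit-code arithmetic concrete, here is a minimal Python sketch that decodes a code such as 137 into a signal name. The helper name `describe_exit_code` is made up for illustration, and the signal lookup assumes a Unix-like system where SIGKILL is defined.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain an exit code using the common 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            # signal.Signals maps a signal number to its symbolic name.
            name = signal.Signals(signum).name
        except ValueError:
            return f"exit code {code}: above 128, but {signum} is not a known signal"
        return f"exit code {code}: killed by signal {signum} ({name})"
    return f"exit code {code}: process exited on its own"


print(describe_exit_code(137))
```

A SIGKILL on its own does not prove an out-of-memory kill, so it is still worth confirming the reason reported for the container.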

From bobcares.com:

Error 137 in Docker denotes that the container was ‘KILL’ed by ‘oom-killer’ (Out of Memory). This happens when there isn’t enough memory in the container for running the process.

‘OOM killer’ is a proactive process that jumps in to save the system when its memory level goes too low, by killing the resource-abusive processes to free up memory for the system.

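Try checking the memory configuration of your container, and the memory available on the host that launches the pod. Is there really nothing in the opencpu container logs?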
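One way to check this, sketched under assumptions: dump the pod with `oc get pod <pod-name> -o json` and look at each container's last terminated state together with its configured memory limit. The helper `find_killed_containers` and the sample pod below are hypothetical; the limit value is invented for illustration, and only the field names follow the standard pod status layout.

```python
def find_killed_containers(pod: dict) -> list:
    """Return name, exit code, reason, and memory limit for terminated containers."""
    # Memory limits live in the pod spec, keyed here by container name.
    limits = {
        c["name"]: c.get("resources", {}).get("limits", {}).get("memory")
        for c in pod.get("spec", {}).get("containers", [])
    }
    report = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # A crash-looping container usually keeps its previous run in lastState.
        terminated = status.get("lastState", {}).get("terminated")
        if terminated:
            report.append({
                "name": status["name"],
                "exit_code": terminated.get("exitCode"),
                "reason": terminated.get("reason"),
                "memory_limit": limits.get(status["name"]),
            })
    return report


# Hypothetical pod document; in practice, parse the `oc get pod -o json` output.
sample_pod = {
    "spec": {"containers": [
        {"name": "opencpu-test-temp",
         "resources": {"limits": {"memory": "256Mi"}}},
    ]},
    "status": {"containerStatuses": [
        {"name": "opencpu-test-temp",
         "state": {"waiting": {"reason": "CrashLoopBackOff"}},
         "lastState": {"terminated": {"exitCode": 137, "reason": "OOMKilled"}}},
    ]},
}

for entry in find_killed_containers(sample_pod):
    print(entry)
```

A reason of `OOMKilled` next to a low memory limit would point at the limit rather than at the image itself.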

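Also check the rlimit.as setting in the configuration file /etc/opencpu/server.conf inside the image. This is the "per request" memory limit of your opencpu instance (I know your problem occurs at startup, so this is unlikely to be the cause).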
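A small sketch for reading that setting, assuming server.conf is a JSON document with a top-level `rlimit.as` key given in bytes. The helper `read_rlimit_as` and the sample values are hypothetical; inside the image you would pass in the real contents of /etc/opencpu/server.conf.

```python
import json


def read_rlimit_as(conf_text: str):
    """Return the rlimit.as value from a JSON config string, or None if absent."""
    conf = json.loads(conf_text)
    return conf.get("rlimit.as")


# Hypothetical config content, not copied from a real installation.
sample_conf = '{"rlimit.as": 4e9, "timelimit.post": 90}'

limit = read_rlimit_as(sample_conf)
if limit is None:
    print("rlimit.as is not set")
else:
    # Assumes the value is expressed in bytes.
    print(f"rlimit.as = {limit / 2**30:.2f} GiB")
```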

