Where to find more explicit errors given container error status codes?
I am running tasks through a Mesos stack that uses Docker containers.
Sometimes, some of the tasks fail.
Here are some of the related TaskStatus messages and reasons:
message: Container exited with status 1 - reason: REASON_COMMAND_EXECUTOR_FAILED
message: Container exited with status 42 - reason: REASON_COMMAND_EXECUTOR_FAILED
message: Container exited with status 137 - reason: REASON_COMMAND_EXECUTOR_FAILED
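Is there a table of correspondence that links the container error status codes from TaskStatus messages to more explicit errors?
I guess you want to review enum Reason in mesos.proto (copied below):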
enum Reason {
// TODO(jieyu): The default value when a caller doesn't check for
// presence is 0 and so ideally the 0 reason is not a valid one.
// Since this is not used anywhere, consider removing this reason.
REASON_COMMAND_EXECUTOR_FAILED = 0;
REASON_CONTAINER_LAUNCH_FAILED = 21;
REASON_CONTAINER_LIMITATION = 19;
REASON_CONTAINER_LIMITATION_DISK = 20;
REASON_CONTAINER_LIMITATION_MEMORY = 8;
REASON_CONTAINER_PREEMPTED = 17;
REASON_CONTAINER_UPDATE_FAILED = 22;
REASON_EXECUTOR_REGISTRATION_TIMEOUT = 23;
REASON_EXECUTOR_REREGISTRATION_TIMEOUT = 24;
REASON_EXECUTOR_TERMINATED = 1;
REASON_EXECUTOR_UNREGISTERED = 2;
REASON_FRAMEWORK_REMOVED = 3;
REASON_GC_ERROR = 4;
REASON_INVALID_FRAMEWORKID = 5;
REASON_INVALID_OFFERS = 6;
REASON_IO_SWITCHBOARD_EXITED = 27;
REASON_MASTER_DISCONNECTED = 7;
REASON_RECONCILIATION = 9;
REASON_RESOURCES_UNKNOWN = 18;
REASON_SLAVE_DISCONNECTED = 10;
REASON_SLAVE_REMOVED = 11;
REASON_SLAVE_RESTARTED = 12;
REASON_SLAVE_UNKNOWN = 13;
REASON_TASK_CHECK_STATUS_UPDATED = 28;
REASON_TASK_GROUP_INVALID = 25;
REASON_TASK_GROUP_UNAUTHORIZED = 26;
REASON_TASK_INVALID = 14;
REASON_TASK_UNAUTHORIZED = 15;
REASON_TASK_UNKNOWN = 16;
}
A command task can fail for many reasons and set an exit code accordingly. For example, Docker 1.10 sets exit status codes like this (from documentation and this answer):
The exit code from docker run gives information about why
the container failed to run or why it exited. When docker run exits
with a non-zero code, the exit codes follow the chroot standard, see
below:
125 if the error is with Docker daemon itself:
$ docker run --foo busybox; echo $?
# flag provided but not defined: --foo See 'docker run --help'.
126 if the contained command cannot be invoked:
$ docker run busybox /etc; echo $?
# docker: Error response from daemon: Container command '/etc' could not be invoked.
127 if the contained command cannot be found
$ docker run busybox foo; echo $?
# docker: Error response from daemon: Container command 'foo' not found or does not exist.
# 127
Exit code of contained command otherwise
$ docker run busybox /bin/sh -c 'exit 3'; echo $?
# 3
Another set of exit code rules can be found here:
| Code | Meaning | Example | Comments |
|-------|--------------------------------|-------------------------|--------------------------------------------------------------------------------------------------------------|
| 1 | Catchall for general errors | let "var1 = 1/0" | Miscellaneous errors, such as "divide by zero" and other impermissible operations |
| 2 | Misuse of shell builtins | empty_function() {} | Missing keyword or command, or permission problem (and diff return code on a failed binary file comparison). |
| 126 | Command invoked cannot execute | /dev/null | Permission problem or command is not an executable |
| 127 | "command not found" | illegal_command | Possible problem with $PATH or a typo |
| 128 | Invalid argument to exit | exit 3.14159 | exit takes only integer args in the range 0 - 255 (see first footnote) |
| 128+n | Fatal error signal "n" | kill -9 $PPID of script | $? returns 137 (128 + 9) |
| 130 | Script terminated by Control-C | Ctl-C | Control-C is fatal error signal 2, (130 = 128 + 2, see above) |
| 255* | Exit status out of range | exit -1 | exit takes only integer args in the range 0 - 255 |
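Based on your examples:
- 137 – Out Of Memory; 128 + 9 = 137 (9 coming from SIGKILL). The process was killed with SIGKILL, which most likely means it ran out of memory and was killed, although any other SIGKILL produces the same status.
- 1 – The command exited with 1. Possibly due to invalid configuration, an internal application error, or invalid input.
- 42 – Answer to the Ultimate Question of Life, the Universe, and Everything (an application-defined code, so its meaning depends on the command itself).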
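The rules above (Docker's 125/126/127 codes and the shell's 128+n signal convention) can be folded into a small lookup. This is only a sketch in Python: `describe_exit_status` and `DOCKER_CODES` are hypothetical names made up for illustration, not part of Mesos or Docker, and the result is a best-effort guess rather than a diagnosis.

```python
import signal

# Docker-specific codes, as quoted from the Docker 1.10 documentation above.
DOCKER_CODES = {
    125: "docker run itself failed (daemon or flag error)",
    126: "contained command cannot be invoked",
    127: "contained command cannot be found",
}


def describe_exit_status(status: int) -> str:
    """Return a best-effort, human-readable guess for a container exit status."""
    if status == 0:
        return "success"
    if status in DOCKER_CODES:
        return DOCKER_CODES[status]
    if 128 < status < 255:
        # Shell convention: 128 + n means "killed by signal n".
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name} (128 + {signum})"
    return "application-defined exit code; check the command's own logs"


for code in (1, 42, 137):
    print(code, "->", describe_exit_status(code))
```

Codes such as 1 and 42 fall through to the last branch on purpose: only the application that produced them knows what they mean.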
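If you need more information to interpret a status code, look at the Message field of the Mesos TaskStatus update; for example, Mesos puts information about OOM there. The same information can also be found in the Mesos logs. To debug why a command returned a non-zero code, inspect the files stored in the executor sandbox, especially stderr/stdout or command-specific logs.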
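When many tasks fail, it can help to pull the status and reason out of the messages automatically before opening each sandbox. The sketch below, in Python, assumes the exact `message: ... - reason: ...` line format shown in the question; that format is the asker's own log output, not a Mesos API, and `parse_status_line` and `TaskFailure` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Matches the line format shown in the question. Other Mesos messages
# (for example ones carrying OOM details) will not match and yield None.
STATUS_LINE = re.compile(
    r"message: Container exited with status (?P<status>\d+)"
    r" - reason: (?P<reason>REASON_[A-Z_]+)"
)


class TaskFailure(NamedTuple):
    status: int
    reason: str
    signal_number: Optional[int]  # set when status follows the 128+n convention


def parse_status_line(line: str) -> Optional[TaskFailure]:
    """Extract exit status and reason from one log line, or None if it does not match."""
    match = STATUS_LINE.search(line)
    if match is None:
        return None
    status = int(match.group("status"))
    signum = status - 128 if 128 < status < 255 else None
    return TaskFailure(status, match.group("reason"), signum)


lines = [
    "message: Container exited with status 1 - reason: REASON_COMMAND_EXECUTOR_FAILED",
    "message: Container exited with status 42 - reason: REASON_COMMAND_EXECUTOR_FAILED",
    "message: Container exited with status 137 - reason: REASON_COMMAND_EXECUTOR_FAILED",
]
for line in lines:
    print(parse_status_line(line))
```

A non-empty `signal_number` (9 for status 137) is a hint to look for kill or OOM information in the TaskStatus message and Mesos logs first; for the other codes the sandbox stderr/stdout is usually the better starting point.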