Why did Spark standalone Worker node-1 terminate after RECEIVED SIGNAL 15: SIGTERM?
Note: this error was thrown before Spark executed the components.
Logs
Worker node 1:
17/05/18 23:12:52 INFO Worker: Successfully registered with master spark://spark-master-1.com:7077
17/05/18 23:58:41 ERROR Worker: RECEIVED SIGNAL 15: SIGTERM
Master node:
17/05/18 23:12:52 INFO Master: Registering worker spark-worker-1com:56056 with 2 cores, 14.5 GB RAM
17/05/18 23:14:20 INFO Master: Registering worker spark-worker-2.com:53986 with 2 cores, 14.5 GB RAM
17/05/18 23:59:42 WARN Master: Removing spark-worker-1com-56056 because we got no heartbeat in 60 seconds
17/05/18 23:59:42 INFO Master: Removing spark-worker-2.com:56056
17/05/19 00:00:03 ERROR Master: RECEIVED SIGNAL 15: SIGTERM
Worker node 2:
17/05/18 23:14:20 INFO Worker: Successfully registered with master spark://spark-master-node-2.com:7077
17/05/18 23:59:40 ERROR Worker: RECEIVED SIGNAL 15: SIGTERM
TL;DR I think someone explicitly ran the kill command or sbin/stop-worker.sh.
The "RECEIVED SIGNAL 15: SIGTERM" message is reported by a signal handler that Spark registers to log TERM, HUP, and INT signals on UNIX-like systems:
    /** Register a signal handler to log signals on UNIX-like systems. */
    def registerLogger(log: Logger): Unit = synchronized {
      if (!loggerRegistered) {
        Seq("TERM", "HUP", "INT").foreach { sig =>
          SignalUtils.register(sig) {
            log.error("RECEIVED SIGNAL " + sig)
            false
          }
        }
        loggerRegistered = true
      }
    }
In your case, it means the process received a SIGTERM asking it to stop:
The SIGTERM signal is a generic signal used to cause program termination. Unlike SIGKILL, this signal can be blocked, handled, and ignored. It is the normal way to politely ask a program to terminate.
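To see the "handled, then terminate" behaviour in isolation, here is a minimal shell sketch rather than anything taken from Spark. A stand-in child process traps TERM, writes a log line that imitates the Worker's message, and exits. The names `log_file`, `ready_file`, and `child_pid` are made up for the demo.

```shell
#!/usr/bin/env bash
# Hypothetical demo: a child process that logs SIGTERM before exiting,
# loosely imitating the log line seen in the Worker log. Not Spark code.
log_file="$(mktemp)"
ready_file="$(mktemp)"

(
  # On TERM: append a log line, then exit with 128 + 15 = 143.
  trap 'echo "RECEIVED SIGNAL 15: SIGTERM" >> "$log_file"; exit 143' TERM
  echo ready > "$ready_file"
  # Sleep in the background and wait on it so the trap can run promptly.
  while true; do sleep 1 & wait $!; done
) &
child_pid=$!

# Wait until the child has installed its trap.
until [ -s "$ready_file" ]; do sleep 0.1; done

kill "$child_pid"      # kill without options sends SIGTERM
wait "$child_pid"
child_status=$?

cat "$log_file"
echo "child exit status: $child_status"
```

The point of the sketch is that SIGTERM does not end the process behind its back: the process gets a chance to log the signal first, which is why the message shows up in the Worker and Master logs at all.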
SIGTERM is what gets sent when you run kill or use the ./sbin/stop-master.sh or ./sbin/stop-worker.sh shell scripts. Those scripts in turn call sbin/spark-daemon.sh with the stop command, which kills the JVM process of a master or a worker:
kill "$TARGET_ID" && rm -f "$pid"
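For context, a pid-file based stop action can be sketched roughly as follows. This is a simplified, hypothetical version written for illustration and not the actual contents of sbin/spark-daemon.sh. The function name `stop_daemon`, the temporary pid file, and the `sleep` stand-in for the JVM are all assumptions.

```shell
#!/usr/bin/env bash
# Simplified, hypothetical sketch of a pid-file based "stop" action.
# stop_daemon and the pid file location are illustrative assumptions.
stop_daemon() {
  local pid_file="$1"
  if [ ! -f "$pid_file" ]; then
    echo "no pid file at $pid_file, nothing to stop"
    return 1
  fi
  local target_id
  target_id="$(cat "$pid_file")"
  # kill -0 only checks that the process exists; no signal is delivered.
  if kill -0 "$target_id" 2>/dev/null; then
    echo "stopping process $target_id"
    # Plain kill sends SIGTERM, which the daemon can log before exiting.
    kill "$target_id" && rm -f "$pid_file"
  else
    echo "process $target_id is not running"
    return 1
  fi
}

# Usage: start a stand-in daemon, record its pid, then stop it.
pid_file="$(mktemp)"
sleep 300 &
daemon_pid=$!
echo "$daemon_pid" > "$pid_file"

stop_daemon "$pid_file"
stop_status=$?

wait "$daemon_pid" 2>/dev/null
daemon_exit=$?
echo "stop status: $stop_status, daemon exit status: $daemon_exit"
```

Because the signal is a plain SIGTERM, the log line alone cannot tell you whether it came from one of the stop scripts, a manual kill, or some other tool on the host, so it is worth checking what was running on the machines around 23:58.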