Cassandra Process Killed by OS

I am running an Apache Cassandra server. After a random amount of time, my Cassandra service stops. When I check its status on CentOS 7 with 'service cassandra status', it shows the following log:

[centos@ip-172-31-24-101 routes]$ service cassandra status

cassandra.service - LSB: distributed storage system for structured data
   Loaded: loaded (/etc/rc.d/init.d/cassandra; bad; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2018-12-31 10:26:13 UTC; 34min ago
     Docs: man:systemd-sysv-generator(8)
Main PID: 2078 (code=killed, signal=KILL)

Dec 31 05:12:46 ip-172-31-24-101.ap-south-1.compute.internal su[781]: (to cassandra) root on none

Dec 31 05:12:49 ip-172-31-24-101.ap-south-1.compute.internal cassandra[761]: Starting Cassandra: OK

Dec 31 05:12:49 ip-172-31-24-101.ap-south-1.compute.internal systemd[1]: Started LSB: distributed storage system for structured data.

Dec 31 10:25:46 ip-172-31-24-101.ap-south-1.compute.internal systemd[1]: cassandra.service: main process exited, code=killed, s...KILL

Dec 31 10:25:47 ip-172-31-24-101.ap-south-1.compute.internal su[15760]: (to cassandra) root on none

Dec 31 10:25:47 ip-172-31-24-101.ap-south-1.compute.internal cassandra[15746]: Shutdown Cassandra: bash: line 0: kill: (2078) - ...ess

Dec 31 10:26:13 ip-172-31-24-101.ap-south-1.compute.internal cassandra[15746]: ERROR: could not stop cassandra

Dec 31 10:26:13 ip-172-31-24-101.ap-south-1.compute.internal systemd[1]: cassandra.service: control process exited, code=exited...us=1

Dec 31 10:26:13 ip-172-31-24-101.ap-south-1.compute.internal systemd[1]: Unit cassandra.service entered failed state.

Dec 31 10:26:13 ip-172-31-24-101.ap-south-1.compute.internal systemd[1]: cassandra.service failed.

How can I find out what is wrong with Cassandra? Why did it crash?

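I suggest looking at the Cassandra process's system.log, since it should point to the problem. Where that file lives depends on whether you installed C* from a package or from a tarball. The default for a package install is /var/log/cassandra; for a tarball, I believe it is installation_directory/log/cassandra (I am not certain about this).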
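As a rough way to skim that log, here is a minimal Python sketch that prints the last few lines that look like problems. The path is the package-install default mentioned above; the keyword list and the function name `last_suspicious_lines` are my own illustrative choices, not anything Cassandra ships, so adjust them to your setup.

```python
from collections import deque
from pathlib import Path

# Assumed location for a package install, taken from the answer above.
DEFAULT_LOG = Path("/var/log/cassandra/system.log")

# Substrings treated as interesting here; this list is an illustrative choice.
KEYWORDS = ("ERROR", "WARN", "OutOfMemoryError")


def last_suspicious_lines(lines, limit=20, keywords=KEYWORDS):
    """Return the last `limit` lines that contain any of `keywords`."""
    matches = deque(maxlen=limit)  # keeps only the newest `limit` matches
    for line in lines:
        if any(k in line for k in keywords):
            matches.append(line.rstrip("\n"))
    return list(matches)


if __name__ == "__main__":
    if DEFAULT_LOG.exists():
        # errors="replace" avoids failing on stray non-UTF-8 bytes in the log.
        with DEFAULT_LOG.open(errors="replace") as fh:
            for line in last_suspicious_lines(fh):
                print(line)
    else:
        print(f"{DEFAULT_LOG} not found; adjust the path for your install")
```

Keep in mind that a process killed with SIGKILL gets no chance to log anything, so the last lines before the kill may only show indirect hints such as long GC pauses.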

A process killed by SIGKILL is often the result of Linux's "OOM Killer", which kills processes when the system is running out of memory (see, for example, https://unix.stackexchange.com/questions/136291/will-linux-start-killing-my-processes-without-asking-me-if-memory-gets-short/136294 for more details on the OOM killer).

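This may indicate that you have given Cassandra too much memory (heap plus off-heap), that you do not have enough swap space, or both. If the OOM killer did kill your Cassandra, you should be able to find log messages in the usual places (dmesg, /var/log/messages, or journalctl, depending on your distribution). Look for messages similar to the following: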

[   54.125380] Out of memory: Kill process 8320 (cassandra) score 324 or sacrifice child
[   54.125382] Killed process 8320 (cassandra) total-vm:1309660kB, anon-rss:1287796kB, file-rss:76kB
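If the kernel log is long, a small script can pull these entries out for you. The sketch below is only an illustration: the regular expression is written against the sample message above, and the exact wording of OOM killer messages can differ between kernel versions, so treat the pattern and the name `find_oom_kills` as assumptions to adapt.

```python
import re

# Pattern modelled on the "Killed process ..." sample line above.
# Other kernel versions may format this message differently.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r".*?total-vm:(?P<total_vm>\d+)kB"
    r".*?anon-rss:(?P<anon_rss>\d+)kB"
)


def find_oom_kills(text):
    """Return one dict per 'Killed process' line found in `text`."""
    kills = []
    for line in text.splitlines():
        m = KILLED_RE.search(line)
        if m:
            kills.append({
                "pid": int(m["pid"]),
                "name": m["name"],
                "total_vm_kb": int(m["total_vm"]),
                "anon_rss_kb": int(m["anon_rss"]),
            })
    return kills


if __name__ == "__main__":
    # Sample text copied from the messages above; in practice, pass in
    # the output of dmesg or the contents of /var/log/messages.
    sample = (
        "[   54.125380] Out of memory: Kill process 8320 (cassandra) "
        "score 324 or sacrifice child\n"
        "[   54.125382] Killed process 8320 (cassandra) "
        "total-vm:1309660kB, anon-rss:1287796kB, file-rss:76kB\n"
    )
    for kill in find_oom_kills(sample):
        print(kill)
```

Compare the reported PID with the "Main PID" that systemd shows for the service (2078 in the question). Since Cassandra runs on the JVM, the process name in the kernel message may show up as java rather than cassandra.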