Does one WildFly kill another WildFly?
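We have run into a (at least to us) strange problem:
We have two WildFly 8.1 installations on the same Linux machine (CentOS 6.6). They run different versions of the same application and listen on different ports.
We suddenly noticed that when we started one of them, the other one was killed. We then found that the amount of free memory was low because of other leaking processes. Once we killed those processes, both WildFly instances ran normally again.
Since I don't think Linux itself decides to kill some other random process, I assume that either JBoss has some mechanism that frees memory by killing things it considers no longer needed, or both instances use the same (possibly misconfigured) resource, so that one of them gets killed when it cannot obtain that resource.
Has anyone experienced something similar, or does anyone know of such a mechanism?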
It is most likely the Linux OOM Killer.
You can verify whether one of the servers was killed by it by checking the log files:
grep -i kill /var/log/messages*
If so, you should see something like this:
host kernel: Out of Memory: Killed process 2592
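If the grep returns a lot of lines, it may help to pull out just the PIDs that were killed. The following is a minimal sketch; the function name `oom_killed_pids` is made up for this example, and it assumes the log lines contain the phrase "Killed process" followed by the PID, as in the sample line above. The exact message format can differ between kernel versions.

```shell
# Hypothetical helper: read log lines on stdin and print the PID from
# every line that contains "Killed process <pid>" (as in the sample above).
oom_killed_pids() {
  grep -i 'killed process' |
    sed -n 's/.*[Kk]illed process \([0-9][0-9]*\).*/\1/p'
}
```

You would feed it the same files as above, for example `cat /var/log/messages* | oom_killed_pids`, and then compare the PIDs with the ones your WildFly instances were running under.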
The OOM killer uses the following algorithm to determine which process to kill:
The function select_bad_process() is responsible for choosing a process to kill. It decides by stepping through each running task and calculating how suitable it is for killing with the function badness(). The badness is calculated as follows, note that the square roots are integer approximations calculated with int_sqrt();
badness_for_task = total_vm_for_task / (sqrt(cpu_time_in_seconds) *
sqrt(sqrt(cpu_time_in_minutes)))
This has been chosen to select a process that is using a large amount of memory but is not that long lived. Processes which have been running a long time are unlikely to be the cause of memory shortage so this calculation is likely to select a process that uses a lot of memory but has not been running long.
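To get a feeling for the formula, here is a small worked sketch of it in shell. It is only an illustration of the quoted formula, not the kernel code: the function name `badness` and its two arguments are invented for this example, the integer square roots are approximated by truncating awk's `sqrt()`, and treating a zero divisor as 1 is my own assumption to avoid dividing by zero for very young processes.

```shell
# Illustrative only: badness = total_vm / (sqrt(cpu_seconds) * sqrt(sqrt(cpu_minutes)))
# $1 = total_vm for the task, $2 = CPU time in seconds (both hypothetical inputs).
badness() {
  awk -v vm="$1" -v secs="$2" 'BEGIN {
    mins = int(secs / 60)           # CPU time in whole minutes
    s = int(sqrt(secs))             # integer approximation of sqrt(seconds)
    m = int(sqrt(int(sqrt(mins))))  # integer approximation of sqrt(sqrt(minutes))
    if (s < 1) s = 1                # assumption: avoid division by zero
    if (m < 1) m = 1
    print int(vm / (s * m))
  }'
}
```

With the same `total_vm` of 1000000, `badness 1000000 3600` prints 8333 while `badness 1000000 60` prints 142857, so the process with less accumulated CPU time scores much higher. That matches what you observed: a freshly started WildFly with a large heap is an attractive candidate, but so is any other large process that has used little CPU time.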
You can manually check the badness of each process by reading the oom_score file in the process's directory under /proc:
cat /proc/10292/oom_score
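Checking one PID at a time gets tedious, so here is a rough sketch that lists the score of every process, highest first. The function name `list_oom_scores` and its optional directory argument are made up for this example. It also reads the process name from `/proc/<pid>/comm`, which may not exist on older kernels, so it falls back to `?` in that case.

```shell
# Hypothetical helper: print "score<TAB>pid<TAB>name" for every process,
# sorted by oom_score in descending order. $1 defaults to /proc.
list_oom_scores() {
  proc_root="${1:-/proc}"
  for dir in "$proc_root"/[0-9]*; do
    [ -r "$dir/oom_score" ] || continue   # process may have exited already
    pid="${dir##*/}"
    score="$(cat "$dir/oom_score" 2>/dev/null)" || continue
    name="$(cat "$dir/comm" 2>/dev/null)" || name="?"
    printf '%s\t%s\t%s\n' "$score" "$pid" "$name"
  done | sort -rn
}
```

Running `list_oom_scores | head` shows the processes the OOM killer is most likely to pick next, which should tell you whether your WildFly instances or the leaking processes are at the top of the list.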