mpirun hangs after program completion

When I run the following command, I get the expected output, but the program does not terminate immediately.

$ mpirun -np 2 echo 1
1
1

The program also does not respond to interrupts. After about a minute, I am returned to the shell.

Or in other words: the program mpirun -np 2 echo 1; echo 'done' runs successfully but takes a long time.
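To put a number on "a long time", the delay can be measured directly. The following is a minimal sketch, not part of the original report: elapsed_seconds is a hypothetical helper name, it discards the command's own output so that only the elapsed time is printed, and it assumes date +%s is available (whole-second resolution only).

```shell
# Hypothetical helper: run a command and print the wall-clock seconds
# that pass before control returns to the shell.
elapsed_seconds() {
    es_start=$(date +%s)
    # Discard the command's output so only the elapsed time is printed.
    "$@" >/dev/null 2>&1
    es_end=$(date +%s)
    echo $(( es_end - es_start ))
}

# Usage for the case described above (assumes mpirun is on PATH):
#   elapsed_seconds mpirun -np 2 echo 1
```

A result of roughly 60 would match the "about a minute" observed above, while a healthy setup should report a value close to 0.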

Update: I ran strace mpirun -np 2 echo 1.

The program hangs here:

sysinfo({uptime=5064793, loads=[153856, 184128, 229600], totalram=67362279424, freeram=26006364160, sharedram=8040448, bufferram=1739857920, totalswap=34359734272, freeswap=34358018048, procs=309, totalhigh=0, freehigh=0, mem_unit=1}) = 0
uname({sysname="Linux", nodename="euler", ...}) = 0
ioctl(13, _IOC(0, 0, 0x25, 0)

And then here:

openat(AT_FDCWD, "/tmp/openmpi-sessions-216211@euler_0/42701", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = -1 ENOENT (No such file or directory)
munmap(0x7f61ed88c000, 2127408)         = 0
munmap(0x7f61ee0a1000, 2101720)         = 0
close(9)                                = 0
munmap(0x7f61ede9e000, 2105664)         = 0
munmap(0x7f61ed685000, 2122480)         = 0
munmap(0x7f61eda95000, 2109856)         = 0
munmap(0x7f61ed47c000, 2130304)         = 0
munmap(0x7f61ed05b000, 2109896)         = 0
munmap(0x7f61ecc9a000, 3934648)         = 0
munmap(0x7f61ed25f000, 2212016)         = 0
munmap(0x7f61ec8e3000, 3894144)         = 0
munmap(0x7f61ec6bd000, 2248968)         = 0
munmap(0x7f61ea776000, 28999696)        = 0
munmap(0x7f61edc99000, 2110072)         = 0
exit_group(0)                           = ?
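One way to narrow this down is to record the trace with timestamps and per-call durations, then sort the calls by how long they took. The sketch below assumes the strace options -f (follow child processes), -tt (timestamps), -T (time spent in each call) and -o (write to a file). The name slowest_syscalls and the file name mpirun.trace are hypothetical, chosen only for illustration.

```shell
# Record a trace first (not run here):
#   strace -f -tt -T -o mpirun.trace mpirun -np 2 echo 1

# Hypothetical helper: list the slowest completed syscalls in a log
# written by strace -T, which appends the duration as <seconds> to each line.
slowest_syscalls() {
    # $1 = trace file, $2 = number of lines to show (default 10)
    awk 'match($0, /<[0-9]+\.[0-9]+>$/) {
             # Strip the angle brackets and put the duration first for sorting.
             d = substr($0, RSTART + 1, RLENGTH - 2)
             print d "\t" $0
         }' "$1" | sort -rn | head -n "${2:-10}"
}

# Usage:
#   slowest_syscalls mpirun.trace 5
```

Calls that never return, and lines without a duration such as exit_group, are skipped by this filter, so a call that is still blocked when the trace ends would have to be found by reading the tail of the log instead.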

Can you help me debug this further?

Apparently, the NVIDIA driver was broken. Updating the driver to 440.64.00 resolved the issue.
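For anyone hitting the same symptom, it may help to check which driver version is installed before and after updating. This is a rough sketch under two assumptions: that either nvidia-smi is installed or the kernel module exposes /proc/driver/nvidia/version, and that the function name nvidia_driver_version is only an illustrative choice.

```shell
# Hypothetical helper: print the installed NVIDIA driver version, if any.
nvidia_driver_version() {
    if command -v nvidia-smi >/dev/null 2>&1 &&
       nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null; then
        # nvidia-smi printed one version line per GPU.
        return 0
    elif [ -r /proc/driver/nvidia/version ]; then
        # Fall back to the version string exposed by the kernel module.
        cat /proc/driver/nvidia/version
    else
        echo "no NVIDIA driver detected"
    fi
}

# Usage:
#   nvidia_driver_version
```

If nvidia-smi itself stalls or fails, that alone would hint at a driver problem of the kind described above.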