mpirun hangs after program completion
When I run the following command, I get the expected output, but the program does not terminate immediately.
$ mpirun -np 2 echo 1
1
1
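The program does not respond to interrupts either. After about a minute, I am returned to the shell.
In other words, the command mpirun -np 2 echo 1; echo 'done' runs successfully but takes a long time to finish.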
Update:
I ran strace mpirun -np 2 echo 1.
The program hangs here:
sysinfo({uptime=5064793, loads=[153856, 184128, 229600], totalram=67362279424, freeram=26006364160, sharedram=8040448, bufferram=1739857920, totalswap=34359734272, freeswap=34358018048, procs=309, totalhigh=0, freehigh=0, mem_unit=1}) = 0
uname({sysname="Linux", nodename="euler", ...}) = 0
ioctl(13, _IOC(0, 0, 0x25, 0)
and then here:
openat(AT_FDCWD, "/tmp/openmpi-sessions-216211@euler_0/42701", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = -1 ENOENT (No such file or directory)
munmap(0x7f61ed88c000, 2127408) = 0
munmap(0x7f61ee0a1000, 2101720) = 0
close(9) = 0
munmap(0x7f61ede9e000, 2105664) = 0
munmap(0x7f61ed685000, 2122480) = 0
munmap(0x7f61eda95000, 2109856) = 0
munmap(0x7f61ed47c000, 2130304) = 0
munmap(0x7f61ed05b000, 2109896) = 0
munmap(0x7f61ecc9a000, 3934648) = 0
munmap(0x7f61ed25f000, 2212016) = 0
munmap(0x7f61ec8e3000, 3894144) = 0
munmap(0x7f61ec6bd000, 2248968) = 0
munmap(0x7f61ea776000, 28999696) = 0
munmap(0x7f61edc99000, 2110072) = 0
exit_group(0) = ?
Can you help me debug this further?
Apparently, the NVIDIA driver was broken. Updating the driver to 440.64.00 solved the problem.
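For anyone hitting a similar hang, here is a rough sketch of how the slow call could be narrowed down from an strace log. It assumes strace is run with -T (which appends the time spent in each call as <seconds>) and -y (which prints the path behind each file descriptor, so the device behind a stalled ioctl becomes visible). The helper name slow_syscalls, the log file name trace.log and the 1-second threshold are placeholders of mine, not anything from the question.

```shell
# Print the lines of an `strace -T` log whose syscall took longer than a
# threshold given in seconds (default: 1.0).
# Usage: slow_syscalls LOGFILE [THRESHOLD]
# Note: slow_syscalls is a hypothetical helper name, not part of strace or Open MPI.
slow_syscalls() {
    log=$1
    threshold=${2:-1.0}
    # With -T, strace ends each completed call with "<seconds>".
    awk -v t="$threshold" '
        match($0, /<[0-9]+\.[0-9]+>$/) {
            # Strip the surrounding angle brackets to get the duration.
            secs = substr($0, RSTART + 1, RLENGTH - 2)
            if (secs + 0 > t + 0) print
        }
    ' "$log"
}

# Example invocation, left commented out so the snippet can be sourced safely:
#   strace -f -y -T -o trace.log mpirun -np 2 echo 1
#   slow_syscalls trace.log 1.0
```

If the slow call turns out to be an ioctl on a device node, the path shown by -y is a reasonable hint as to which driver to inspect or update.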