Intermittent connection failure to remote machine when remote (gdb) debugging with VS2019

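I'm compiling and debugging native C++ code on a Linux VM that is hosted (using Hyper-V) on the same machine where I run Visual Studio 2019 (Enterprise, version 16.11.1). The remote connection only works most of the time. Roughly 15%-20% of the time, when I try to launch a build or a debug session, it fails with: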

"Could not connect to the remote system. Please verify your connection settings, and that your machine is on the network and reachable."

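There is no predictable interval at which this happens (or doesn't). I can remote-compile successfully, try to start debugging two (2) seconds later, and have it fail. Once it has failed, I go to Tools > Options > Cross Platform > Connection Manager > [highlight already-selected connection] > Verify and get a dialog saying the connection was verified, which shows that it is in fact able to connect.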

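It can work many times in a row and then suddenly fail. Once it fails, I have to close Visual Studio and reopen it to get it working again. Through experimentation I found that I can also switch the connection to a different remote host and then back to the original one to get it working again, but that takes longer than just bouncing VS2019. Having to restart VS2019 every few minutes during development has become a real PITA.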

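Has anyone else run into this intermittent failure, and/or have any ideas about what causes it or how to fix it (or even how to work around it faster than my current approach)?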

The tail of the remote connection log after a failed attempt to launch a debug session is:

07:13:06.4516823 [Info, Thread 82] liblinux.RemoteSystemBase: Connecting over SSH to 10.10.10.10:22
07:13:06.6101023 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "g++ -v" finished with exit code 0 after 46.0657ms
07:13:06.6127453 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "clang++ -v" finished with exit code 127 after 2.1315ms
07:13:06.6151614 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "gdbserver --version" finished with exit code 0 after 2.3438ms
07:13:06.6181017 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "gcc -v" finished with exit code 0 after 2.8722ms
07:13:06.6634892 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "gdb -v" finished with exit code 0 after 45.5621ms
07:13:06.7094904 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "/usr/bin/gdb -v" finished with exit code 0 after 45.7139ms
07:13:06.7114880 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "/usr/local/bin/gdb -v" finished with exit code 127 after 2.5014ms
07:13:06.7184905 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "rsync -v" finished with exit code 1 after 6.6996ms
07:13:06.7209159 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "lldb -v" finished with exit code 127 after 2.1041ms
07:13:06.7235831 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "ninja --version" finished with exit code 0 after 2.6598ms
07:13:06.7265292 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "cmake --version" finished with exit code 0 after 2.6648ms
07:13:06.7284878 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "make -v" finished with exit code 0 after 2.3541ms
07:13:06.7324881 [Info, Thread 82] liblinux.IO.RemoteFileSystemImpl: Connecting over SFTP to 10.10.10.10:22
07:13:06.8813322 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "cat /etc/os-release" finished with exit code 0 after 3.1399ms
07:13:06.8842647 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "uname -m" finished with exit code 0 after 2.6496ms
07:13:06.8867628 [Info, Thread 82] liblinux.Shell.CommonCommandBase: Command "uname -r" finished with exit code 0 after 2.3968ms
07:13:06.8872544 [Info, Thread 82] liblinux.RemoteSystemBase: Disconnecting over SSH from "10.10.10.10:22"
07:13:06.8872544 [Info, Thread 82] liblinux.IO.RemoteFileSystemImpl: Disconnecting over SFTP from 10.10.10.10:22

To watch SSH logins on the remote machine in real time, SSH into it and run:

$ sudo journalctl -f -u ssh

(Optional: you can set the log level, e.g. DEBUG or INFO, in the SSH daemon configuration file.) On Debian, the SSH daemon configuration file is located at:

/etc/ssh/sshd_config

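You'll find that Visual Studio opens several SSH sessions to the remote machine and closes most of them almost immediately after opening them. Every time you remote-compile or remote-debug, you'll see several sessions open and quickly close. However, one or more of them seem to persist. They eventually time out and are closed by the remote SSH daemon, as shown by one or more logged messages like: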

sshd[{*nix_process_id}]: Timeout, client not responding from user {user} {ip_address} port {random_port#}

(where the placeholders in braces are replaced by the obvious values).
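To see how often these timeouts actually occur, the journal output can be filtered for that message. The following is a minimal sketch; the function names are hypothetical, and it assumes the message wording quoted above (other OpenSSH versions may phrase it differently).

```shell
# Print only the sshd timeout lines read from stdin.
# Assumes the "Timeout, client not responding" wording quoted above.
sshd_timeouts() {
    grep -F 'Timeout, client not responding'
}

# Count those lines, e.g. to compare before and after a config change.
# grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
count_sshd_timeouts() {
    grep -cF 'Timeout, client not responding' || true
}

# Example on the Linux host (not run here; `ssh` is the Debian unit name):
#   sudo journalctl -u ssh --since "1 hour ago" | count_sshd_timeouts
```

Correlating the timestamps of these lines with the moment Visual Studio starts refusing to connect is one way to check whether the two events really coincide.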

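Immediately after this session (or these sessions) times out, Visual Studio decides it can no longer connect (even though it can). This looks like a Visual Studio bug, but I couldn't find anything about it online.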

My workaround is to set the following in the SSH daemon configuration file:

MaxSessions 100
TCPKeepAlive yes
ClientAliveCountMax 3
ClientAliveInterval 180 

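MaxSessions defaults to 10, which seems marginal, because Visual Studio appears to use half a dozen or more sessions at a time. Setting ClientAliveInterval to 180 makes the ssh daemon send a null ssh packet to the client whenever it has received nothing from that client for 180 seconds, and ClientAliveCountMax sets how many times the ssh daemon will tolerate the ssh client failing to acknowledge that null packet before it times out the session.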
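As a quick sanity check on those two numbers: according to the sshd_config man page, an unresponsive client is disconnected after roughly ClientAliveInterval multiplied by ClientAliveCountMax seconds. A small sketch of that arithmetic (the function name is hypothetical):

```shell
# Approximate number of seconds before sshd drops an unresponsive client:
# ClientAliveInterval * ClientAliveCountMax.
client_alive_timeout() {
    interval=$1
    count_max=$2
    echo $(( interval * count_max ))
}

client_alive_timeout 180 3   # prints 540
```

So with the values above, a session whose client stops answering should survive for about nine minutes before the daemon closes it.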

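TCPKeepAlive does more or less the same thing at the TCP level and is probably redundant: it sends an unencrypted TCP keepalive packet just to make sure the client firewall doesn't decide the conversation is over and close the port. Note that OpenSSH already defaults TCPKeepAlive to yes, so setting it explicitly mostly just documents the intent.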
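After editing the file, it is worth confirming that the daemon actually picked up the values. A minimal sketch, assuming a Debian-style systemd unit named `ssh` (other distributions often call it `sshd`); the helper name is hypothetical, while `sshd -t` (syntax check) and `sshd -T` (dump effective configuration, with lowercase keywords) are standard OpenSSH options:

```shell
# Keep only the four directives discussed above from `sshd -T` output on stdin.
keepalive_settings() {
    grep -Ei '^(maxsessions|tcpkeepalive|clientaliveinterval|clientalivecountmax) '
}

# On the Linux host (not run here):
#   sudo sshd -t                        # syntax-check sshd_config
#   sudo systemctl reload ssh           # make the daemon re-read its config
#   sudo sshd -T | keepalive_settings   # show the effective values
```

Sessions that were already open before the reload may keep the old settings, so it is probably safest to restart Visual Studio once after changing the configuration.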

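I'm not sure which of these mitigations is most responsible for the improvement, and the problem is still not 100% solved. Visual Studio still randomly fails to acknowledge the null packet, especially while debugging, which causes the ssh daemon on the Linux host to close sessions that Visual Studio apparently needs to keep open. Still, it is an order-of-magnitude improvement: the failure rate is now under 2%.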