The hard way to debug the mysterious git+ssh+proxy failure "bash: No such file or directory"

I'm trying to clone a GitHub repository through a SOCKS5 proxy. In ~/.ssh/config I have:

Host github.com *.github.com
    ProxyCommand /usr/bin/nc -X 5 -x 127.0.0.1:7070 %h %p

git clone fails with the error bash: No such file or directory:

$ git clone git@github.com:aureliojargas/sedsed.git
Cloning into 'sedsed'...
bash: No such file or directory
kex_exchange_identification: Connection closed by remote host
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

I tried the ssh command manually and it failed too:

$ ssh -v git@github.com
OpenSSH_8.1p1, LibreSSL 2.7.3
debug1: Reading configuration data /Users/pynexj/.ssh/config
debug1: /Users/pynexj/.ssh/config line 16: Applying options for github.com
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 47: Applying options for *
debug1: Executing proxy command: exec /usr/bin/nc -X 5 -x 127.0.0.1:7070 github.com 22
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
debug1: identity file /Users/pynexj/.ssh/id_rsa type 0
debug1: identity file /Users/pynexj/.ssh/id_rsa-cert type -1
debug1: identity file /Users/pynexj/.ssh/id_dsa type -1
debug1: identity file /Users/pynexj/.ssh/id_dsa-cert type -1
debug1: identity file /Users/pynexj/.ssh/id_ecdsa type -1
debug1: identity file /Users/pynexj/.ssh/id_ecdsa-cert type -1
bash: No such file or directory
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
debug1: identity file /Users/pynexj/.ssh/id_ed25519 type -1
debug1: identity file /Users/pynexj/.ssh/id_ed25519-cert type -1
debug1: identity file /Users/pynexj/.ssh/id_xmss type -1
debug1: identity file /Users/pynexj/.ssh/id_xmss-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_8.1
kex_exchange_identification: Connection closed by remote host

Then I ran the nc command manually, and it worked:

$ /usr/bin/nc -X 5 -x 127.0.0.1:7070 github.com 22
SSH-2.0-babeld-8cd15329
^C

The SOCKS5 proxy itself works fine too:

$ curl -x socks5://127.0.0.1:7070/ https://github.com/ > foo.html
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  214k    0  214k    0     0  86775      0 --:--:--  0:00:02 --:--:-- 86775

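I'm curious who produces the error bash: No such file or directory, and why.

For me, this problem was specific to macOS. I googled a lot and found many reports of broken SSH on macOS 10.15 (Catalina), but none of the workarounds worked for me. Eventually I had to read the OpenSSH source code, and that is where I found the problem.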


In the source file sshconnect.c:

 194 static int
 195 ssh_proxy_connect(struct ssh *ssh, const char *host, const char *host_arg,
 196     u_short port, const char *proxy_command)
 197 {
 ...
 ...
 201     char *shell;
 202
 203     if ((shell = getenv("SHELL")) == NULL || *shell == '\0')
 204         shell = _PATH_BSHELL;
 ...
 ...
 211     command_string = expand_proxy_command(proxy_command, options.user,
 212         host, host_arg, port);
 213     debug("Executing proxy command: %.500s", command_string);
 214
 215     /* Fork and execute the proxy command. */
 216     if ((pid = fork()) == 0) {
 217         char *argv[10];
 ...
 ...
 240         argv[0] = shell;
 241         argv[1] = "-c";
 242         argv[2] = command_string;
 243         argv[3] = NULL;
 244
 245         /* Execute the proxy command.  Note that we gave up any
 246            extra privileges above. */
 247         ssh_signal(SIGPIPE, SIG_DFL);
 248         execv(argv[0], argv);
 249         perror(argv[0]);
 250         exit(1);
 251     }

See lines 203, 240 and 248: ssh runs the ProxyCommand through $SHELL (I did not find documentation for this), and it uses execv(), which does not search $PATH. Then I checked my $SHELL:

$ echo $SHELL
bash

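That is the problem. $SHELL is not the full pathname of an executable, so execv() cannot execute it, and the error bash: No such file or directory comes from the perror() call on line 249. (The error confused me a lot. The bash: prefix made me think the message came from Bash itself.)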
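
The difference between the two lookups can be approximated from the shell. This is only a rough sketch: the function names execv_can_run and path_can_run are made up for illustration, and the tests merely mimic how execv() and a $PATH search resolve a name; they do not call either function.

```shell
# Hypothetical diagnostic: would execv() be able to run this name?
# Like execv(), the file tests below resolve a bare name against the
# current directory only; they never consult $PATH.
execv_can_run() {
    [ -f "$1" ] && [ -x "$1" ]
}

# Contrast: a $PATH lookup, similar in spirit to execvp().
path_can_run() {
    command -v "$1" >/dev/null 2>&1
}

# Report both lookups for the current value of SHELL.
name=${SHELL:-sh}
if execv_can_run "$name"; then
    echo "execv-style lookup: '$name' can be executed"
else
    echo "execv-style lookup: '$name' cannot be executed"
fi
if path_can_run "$name"; then
    echo "PATH-style lookup: '$name' can be found"
else
    echo "PATH-style lookup: '$name' cannot be found"
fi
```

With SHELL=bash and no file named bash in the current directory, the first check fails while the second succeeds, which matches the failure described above.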

Solution: manually set SHELL to the full pathname of the shell, e.g. /bin/bash. (I did not write shell /bin/bash in .screenrc because I also have /usr/local/bin/bash.)
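
One way to do this without hard-coding a path is to resolve the bare name through $PATH, for example from a shell startup file that runs inside screen. The following is a minimal sketch, not a tested recipe: resolve_shell is a hypothetical helper name, and it assumes no alias or function shadows the shell's name (command -v would report those instead of a path).

```shell
# Hypothetical helper: print a full pathname for a shell name.
# A name that already contains a slash is printed unchanged;
# a bare name such as "bash" is looked up in $PATH.
resolve_shell() {
    case "$1" in
        */*) printf '%s\n' "$1" ;;
        *)   command -v "$1" ;;
    esac
}

# Only touch SHELL when it is a bare name and the lookup succeeds.
case "${SHELL:-}" in
    ''|*/*) ;;
    *)
        if full=$(resolve_shell "$SHELL"); then
            SHELL=$full
            export SHELL
        fi
        ;;
esac
```

For a one-off, prefixing a single command with the variable (SHELL=/bin/bash git clone ...) has the same effect for that command only.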


Who set SHELL=bash? Why was it not SHELL=/bin/bash?

In my ~/.screenrc I have:

shell bash

According to the screen manual:

  • shell command

    Set the command to be used to create a new shell. This overrides the value of the environment variable $SHELL.

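SHELL was originally /bin/bash in my interactive shell before I started screen, so it is screen that set SHELL=bash. I think screen should resolve the shell's full pathname and set SHELL to that, because according to POSIX: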

This variable shall represent a pathname of the user's preferred command language interpreter.


Then why does it work on my Linux system (Debian), where I also have SHELL=bash (also inside screen)?

I ran strace and got this:

$ SHELL=xxx strace -f ssh git@github.com
[...]
[pid  5767] rt_sigaction(SIGPIPE, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
[pid  5767] execve("/root/bin/xxx", ["xxx", "-c", "exec nc -X 5 -x 127.0.0.1:7070 g"...], 0x561e33a599a0 /* 33 vars */) = -1 ENOENT (No such file or directory)
[pid  5767] execve("/usr/local/bin/xxx", ["xxx", "-c", "exec nc -X 5 -x 127.0.0.1:7070 g"...], 0x561e33a599a0 /* 33 vars */) = -1 ENOENT (No such file or directory)
[pid  5767] execve("/usr/local/sbin/xxx", ["xxx", "-c", "exec nc -X 5 -x 127.0.0.1:7070 g"...], 0x561e33a599a0 /* 33 vars */) = -1 ENOENT (No such file or directory)
[pid  5767] execve("/usr/sbin/xxx", ["xxx", "-c", "exec nc -X 5 -x 127.0.0.1:7070 g"...], 0x561e33a599a0 /* 33 vars */) = -1 ENOENT (No such file or directory)
[pid  5767] execve("/usr/bin/xxx", ["xxx", "-c", "exec nc -X 5 -x 127.0.0.1:7070 g"...], 0x561e33a599a0 /* 33 vars */) = -1 ENOENT (No such file or directory)
[pid  5767] execve("/sbin/xxx", ["xxx", "-c", "exec nc -X 5 -x 127.0.0.1:7070 g"...], 0x561e33a599a0 /* 33 vars */) = -1 ENOENT (No such file or directory)
[pid  5767] execve("/bin/xxx", ["xxx", "-c", "exec nc -X 5 -x 127.0.0.1:7070 g"...], 0x561e33a599a0 /* 33 vars */) = -1 ENOENT (No such file or directory)
[pid  5767] dup(2)                      = 3
[pid  5767] fcntl(3, F_GETFL)           = 0x8402 (flags O_RDWR|O_APPEND|O_LARGEFILE)
[pid  5767] fstat(3, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x21), ...}) = 0
[pid  5767] write(3, "xxx: No such file or directory\n", 31xxx: No such file or directory
) = 31
[pid  5767] close(3)                    = 0
[...]

As we can see, it actually searches $PATH for xxx. Why? My guess is that Debian has patched OpenSSH and changed its behavior. (If I knew how Debian builds its packages, I would verify this. :-)


UPDATE 2020-11-19

I compiled OpenSSH (v8.4) from source and reproduced the same problem on Debian. This confirms that Debian has patched OpenSSH and changed its behavior.

$ /usr/local/openssh-8.4/bin/ssh git@github.com
bash: No such file or directory
kex_exchange_identification: Connection closed by remote host
$ strace -f /usr/local/openssh-8.4/bin/ssh git@github.com
[...]
[pid 21020] rt_sigaction(SIGPIPE, {sa_handler=SIG_DFL, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f19a05a9840}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
[pid 21020] execve("bash", ["bash", "-c", "exec nc -X 5 -x 127.0.0.1:7070 g"...], 0x5566982872f0 /* 33 vars */) = -1 ENOENT (No such file or directory)
[pid 21020] dup(2)                      = 3
[pid 21020] fcntl(3, F_GETFL)           = 0x8402 (flags O_RDWR|O_APPEND|O_LARGEFILE)
[pid 21020] fstat(3, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x25), ...}) = 0
[pid 21020] write(3, "bash: No such file or directory\n", 32bash: No such file or directory
) = 32
[pid 21020] close(3)
[...]