Is chaining interpreters via shebang lines portable?

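Binding a script to a specific interpreter via a so-called shebang line is a well-known practice on POSIX operating systems. For example, if the following script is executed (given sufficient file-system permissions), the operating system will start the /bin/sh interpreter with the script's file name as its first argument. The shell then executes the commands in the script, skipping the shebang line, which it treats as a comment.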

#! /bin/sh

date -R
echo hello world

Possible output:

Sat, 01 Apr 2017 12:34:56 +0100
hello world

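I used to believe that the interpreter (/bin/sh in this example) must be a native executable and cannot itself be a script that, in turn, requires yet another interpreter to be started.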

Nevertheless, I went ahead and tried the following experiment.

With the following dumb shell saved as /tmp/interpreter.py, ...

#! /usr/bin/python3

import sys
import subprocess

for script in sys.argv[1:]:
    with open(script) as istr:
        status = any(
            map(
                subprocess.call,
                map(
                    str.split,
                    filter(
                        lambda s : s and not s.startswith('#'),
                        map(str.strip, istr)
                    )
                )
            )
        )
        if status:
            sys.exit(status)

…and the following script saved as /tmp/script.xyz,

#! /tmp/interpreter.py

date -R
echo hello world

…I was able to execute script.xyz (after making both files executable).

5gon12eder:/tmp> ls -l
total 8
-rwxr-x--- 1 5gon12eder 5gon12eder 493 Jun 19 01:01 interpreter.py
-rwxr-x--- 1 5gon12eder 5gon12eder  70 Jun 19 01:02 script.xyz
5gon12eder:/tmp> ./script.xyz
Mon, 19 Jun 2017 01:07:19 +0200
hello world

This surprised me. I was even able to launch script.xyz via another script.

So, what I'm asking is:

On Unix-like operating systems, new executables are started by the execve(2) system call. The execve man page includes:

Interpreter scripts
    An interpreter script is  a  text  file  that  has  execute
    permission enabled and whose first line is of the form:

       #! interpreter [optional-arg]

    The interpreter must be a valid pathname for an executable which
    is not itself a script.  If the filename argument  of  execve()
    specifies  an interpreter script, then interpreter will be invoked
    with the following arguments:

       interpreter [optional-arg] filename arg...

   where arg...  is the series of words pointed to by the argv
   argument of execve().

   For portable use, optional-arg should either be absent, or be
   specified as a single word (i.e., it should not contain white
   space);  see  NOTES below.

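So within those constraints (Unix-like, optional-arg at most one word), yes, shebang scripts are portable. Read the man page for more details, including the other differences in how binary executables and scripts are invoked.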
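The argument vector quoted above (interpreter [optional-arg] filename arg...) can be modeled in a few lines of Python. This is a minimal sketch, not kernel code: the helper names parse_shebang, is_portable_shebang and interpreter_argv are hypothetical, and parse_shebang assumes Linux-style handling, in which everything after the interpreter path is a single optional argument.

```python
def parse_shebang(first_line):
    """Split a '#!' line into (interpreter, optional_arg or None).

    Assumes Linux-style semantics: the whole remainder after the
    interpreter path (with surrounding white space removed) is one argument.
    """
    if not first_line.startswith("#!"):
        raise ValueError("not an interpreter script")
    parts = first_line[2:].strip().split(None, 1)
    if not parts:
        raise ValueError("empty interpreter directive")
    interpreter = parts[0]
    optional_arg = parts[1] if len(parts) > 1 else None
    return interpreter, optional_arg


def is_portable_shebang(first_line):
    """Follow the man page's advice: optional-arg absent or a single word."""
    _, optional_arg = parse_shebang(first_line)
    return optional_arg is None or len(optional_arg.split()) == 1


def interpreter_argv(first_line, filename, args):
    """Build: interpreter [optional-arg] filename arg..."""
    interpreter, optional_arg = parse_shebang(first_line)
    argv = [interpreter]
    if optional_arg is not None:
        argv.append(optional_arg)
    argv.append(filename)   # the filename argument given to execve()
    argv.extend(args)       # argv[1:] of the original execve() call
    return argv
```

For the question's script, interpreter_argv("#! /tmp/interpreter.py", "./script.xyz", []) gives ['/tmp/interpreter.py', './script.xyz'], i.e. the dumb shell receives the script's path as its first argument.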

  1. See the following, in particular the phrase "and even as interpreters of other scripts":

    This mechanism allows scripts to be used in virtually any context normal compiled programs can be, including as full system programs, and even as interpreters of other scripts. As a caveat, though, some early versions of kernel support limited the length of the interpreter directive to roughly 32 characters (just 16 in its first implementation), would fail to split the interpreter name from any parameters in the directive, or had other quirks. Additionally, some modern systems allow the entire mechanism to be constrained or disabled for security purposes (for example, set-user-id support has been disabled for scripts on many systems). -- WP

  2. And the output of COLUMNS=75 man execve | grep -nA 23 " Interpreter scripts" | head -39 on an Ubuntu 17.04 box, in particular lines #186-#189, tells us what works on Linux (i.e. a script can be an interpreter, up to four levels deep):

166:   Interpreter scripts
167-       An interpreter script is a text file that has  execute  permission
168-       enabled and whose first line is of the form:
169-
170-           #! interpreter [optional-arg]
171-
172-       The  interpreter  must be a valid pathname for an executable file.
173-       If the filename argument  of  execve()  specifies  an  interpreter
174-       script,  then interpreter will be invoked with the following argu‐
175-       ments:
176-
177-           interpreter [optional-arg] filename arg...
178-
179-       where arg...  is the series of words pointed to by the argv  argu‐
180-       ment of execve(), starting at argv[1].
181-
182-       For  portable  use,  optional-arg  should  either be absent, or be
183-       specified as a single word (i.e.,  it  should  not  contain  white
184-       space); see NOTES below.
185-
186-       Since Linux 2.6.28, the kernel permits the interpreter of a script
187-       to itself be a script.  This permission  is  recursive,  up  to  a
188-       limit  of four recursions, so that the interpreter may be a script
189-       which is interpreted by a script, and so on.
--
343:   Interpreter scripts
344-       A  maximum  line length of 127 characters is allowed for the first
345-       line in an interpreter scripts.
346-
347-       The semantics of  the  optional-arg  argument  of  an  interpreter
348-       script  vary  across implementations.  On Linux, the entire string
349-       following the interpreter name is passed as a single  argument  to
350-       the  interpreter,  and  this string can include white space.  How‐
351-       ever, behavior differs on some other systems.   Some  systems  use
352-       the first white space to terminate optional-arg.  On some systems,
353-       an interpreter script can have multiple arguments, and white  spa‐
354-       ces in optional-arg are used to delimit the arguments.
355-
356-       Linux ignores the set-user-ID and set-group-ID bits on scripts.
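To make the Linux rule concrete, here is a hypothetical model of the kernel following a shebang chain. Assumptions: a plain dict stands in for the file system (path to first line, or None for a native binary), optional-arg is the whole rest of the line as on Linux, and one recursion is counted each time an interpreter turns out to be a script; the kernel's exact counting may differ by one. The function names are illustrative only.

```python
MAX_RECURSIONS = 4  # "a limit of four recursions" (Linux >= 2.6.28, per the man page)


def split_directive(first_line):
    """Return (interpreter, optional_arg) from a '#!' line, Linux style."""
    if not first_line.startswith("#!"):
        raise ValueError("not an interpreter script")
    parts = first_line[2:].strip().split(None, 1)
    if not parts:
        raise ValueError("empty interpreter directive")
    return parts[0], (parts[1] if len(parts) > 1 else None)


def resolve_linux(files, path, args, max_recursions=MAX_RECURSIONS):
    """Follow a shebang chain; return the argv of the final native program.

    files maps a path to its first line, or to None for a native binary
    (a stand-in for the file system; an unknown path raises KeyError).
    """
    tail = list(args)  # argv[1:] of the current execve()
    current = path
    recursions = 0
    while files[current] is not None:  # current is an interpreter script
        interpreter, optional_arg = split_directive(files[current])
        prefix = [optional_arg] if optional_arg is not None else []
        # interpreter [optional-arg] filename arg...
        tail = prefix + [current] + tail
        if files[interpreter] is not None:  # the interpreter is itself a script
            recursions += 1
            if recursions > max_recursions:
                raise RuntimeError("too many levels of interpreter scripts")
        current = interpreter
    return [current] + tail
```

With the question's files ({"./script.xyz": "#! /tmp/interpreter.py", "/tmp/interpreter.py": "#! /usr/bin/python3", "/usr/bin/python3": None}), resolve_linux(files, "./script.xyz", []) returns ['/usr/bin/python3', '/tmp/interpreter.py', './script.xyz']: python3 runs the dumb shell, which then reads script.xyz.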

From the Solaris 11 exec(2) man page:

 An interpreter file begins with a line of the form

   #! pathname [arg]

 where pathname is the path of the interpreter, and arg is an
 optional argument. When an interpreter file is executed, the
 system  invokes  the  specified  interpreter.  The  pathname
 specified  in  the interpreter file is passed as arg0 to the
 interpreter. If arg was specified in the  interpreter  file,
 it  is  passed  as  arg1  to  the interpreter. The remaining
 arguments to the interpreter are arg0 through  argn  of  the
 originally  exec'd  file.  The interpreter named by pathname
 must not be an interpreter file.

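As the last sentence states, chaining interpreters is not supported at all on Solaris. Trying to do so leads the last non-script interpreter (e.g. /usr/bin/python3) to interpret the first script (e.g. /tmp/script.xyz, so the final command line becomes /usr/bin/python3 /tmp/script.xyz), without any chaining.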
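The contrast with Linux can be sketched with a similar toy model. This hypothetical function encodes only the behavior described in the paragraph above (it is not derived from Solaris source): it walks the '#!' lines to the first native executable and hands that program the original script directly, ignoring any optional arguments. As before, a dict stands in for the file system.

```python
def resolve_as_described_for_solaris(files, path, args):
    """Return the command line under the described non-chaining behavior.

    files maps a path to its first line, or to None for a native binary.
    Optional arguments on '#!' lines are ignored (a simplifying assumption).
    """
    seen = set()
    current = path
    while files[current] is not None:  # still an interpreter script
        if current in seen:
            raise RuntimeError("shebang loop")
        seen.add(current)
        current = files[current][2:].split()[0]  # interpreter path after '#!'
    # The native interpreter receives the ORIGINAL script, not the chain.
    return [current, path] + list(args)
```

For the question's files this yields ['/usr/bin/python3', '/tmp/script.xyz'], so python3 would be asked to run the shell commands in script.xyz as a Python program, and /tmp/interpreter.py would never run.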

So chaining script interpreters is not portable at all.