What are the rules for valid identifiers (e.g. functions, vars, etc) in Bash?

Question

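What are the syntax rules for identifiers in Bash, especially function names and variable names?

I wrote a Bash script and tested it on various Bash versions on Ubuntu, Debian, Red Hat 5 and 6, and even an old Solaris 8 box. The script ran well, so it shipped.

Yet when a user tried it on a SUSE machine, it gave a "not a valid identifier" error. Luckily, my guess that there was an invalid character in a function name was right. The hyphens were messing it up.

The fact that a script that was at least somewhat tested can behave completely differently on another Bash or distro is disconcerting. How can I avoid this kind of thing?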

Answer 1

From the manual:

   Shell Function Definitions
       ...
       name () compound-command [redirection]
       function name [()] compound-command [redirection]

name is defined elsewhere:

       name   A  word  consisting  only  of alphanumeric characters and under‐
              scores, and beginning with an alphabetic character or an  under‐
              score.  Also referred to as an identifier.

So hyphens are not valid according to the manual. And yet, on my system, they do work...

$ bash --version
GNU bash, version 4.2.25(1)-release (x86_64-pc-linux-gnu)
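
Since the manual and the observed behavior disagree, one way to see what a given installation really accepts is to probe it directly. The following is only a minimal sketch: `accepts_function_name` is a hypothetical helper (not something from the manual), it assumes the script is run by bash in its default (non-POSIX) mode, and it uses the same eval-based trick as the test scripts in the later answers.

```shell
#!/usr/bin/env bash
# Hypothetical probe: does the running shell accept "$1" as a function name?
# The subshell keeps any definition (or failure) away from the caller.
accepts_function_name() {
    ( eval "function $1 { return 0; }; $1" ) 2>/dev/null
}

echo "bash $BASH_VERSION"
for candidate in my_func my-func; do
    if accepts_function_name "$candidate"; then
        echo "accepted: $candidate"
    else
        echo "rejected: $candidate"
    fi
done
```

Running this on each target machine before shipping shows whether a hyphenated name is tolerated there. The result depends on the shell and its mode, so it is a check for one machine, not a statement of the rules.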

Answer 2

From 3.3 Shell Functions:

Shell functions are a way to group commands for later execution using a single name for the group. They are executed just like a "regular" command. When the name of a shell function is used as a simple command name, the list of commands associated with that function name is executed. Shell functions are executed in the current shell context; no new process is created to interpret them.

Functions are declared using this syntax:
name () compound-command [ redirections ]
or
function name [()] compound-command [ redirections ]

From 2 Definitions:

name

A word consisting solely of letters, numbers, and underscores, and beginning with a letter or underscore. Names are used as shell variable and function names. Also referred to as an identifier.
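
The definition above translates directly into a regular expression, which can be used to vet names before relying on them. This is a sketch under assumptions: `is_valid_name` is a hypothetical helper name, and the pattern is simply my reading of "letters, numbers, and underscores, beginning with a letter or underscore", restricted to ASCII letters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if "$1" is a "name" as the manual defines it.
is_valid_name() {
    local re='^[A-Za-z_][A-Za-z0-9_]*$'
    [[ $1 =~ $re ]]
}

for candidate in my_func _tmp2 my-func 2fast ''; do
    if is_valid_name "$candidate"; then
        echo "valid:   '$candidate'"
    else
        echo "invalid: '$candidate'"
    fi
done
```

Here `my_func` and `_tmp2` pass, while `my-func`, `2fast`, and the empty string are rejected. Sticking to names that pass this check is the conservative choice when a script has to run on shells you have not tested.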

Answer 3

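Command identifiers and variable names have different syntaxes. A variable name is restricted to alphanumeric characters and underscores, and may not start with a digit. A command name, on the other hand, can be just about anything that does not contain bash metacharacters (and even then, they can be quoted).

In bash, function names can be command names, as long as they would be parsed as a WORD without quotes. (Except that, for some reason, they cannot be integers.) However, that is a bash extension. If the target machine is using some other shell (such as dash), it might not work, since the POSIX standard shell grammar only allows "NAME" in the function definition form (and also prohibits reserved words).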
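
The difference between the two kinds of name can be seen side by side. This is a small illustration that assumes bash running in its default (non-POSIX) mode; the names `my-func` and `my-var` are made up for the example.

```shell
#!/usr/bin/env bash
# A function name in bash (default mode) may contain a hyphen...
my-func() { echo "called my-func"; }
my-func

# ...but a variable name may not: declare rejects it
# (bash reports "not a valid identifier" on stderr).
if ! declare "my-var=1" 2>/dev/null; then
    echo "my-var rejected as a variable name"
fi
```

Under dash or another strictly POSIX shell, the function definition itself is expected to fail, which is the portability trap described above.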

Answer 4

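The question was about "the rules", which has been answered in two different ways, each correct in some sense, depending on what you want to call "the rules". Just to flesh out @rici's point that you can shove just about any character into a function name, I wrote a small bash script that tries every possible character (0-255) as a function name, and also as the second character of a function name: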

#!/bin/bash
ASCII=( nul soh stx etx eot enq ack bel bs tab nl vt np cr so si dle \
            dc1 dc2 dc3 dc4 nak syn etb can em sub esc fs gs rs us sp )

for((i=33; i < 127; ++i)); do
    printf -v Hex "%x" $i

    printf -v Chr "\x$Hex"
    ASCII[$i]="$Chr"
done
ASCII[127]=del
for((i=128; i < 256; ++i)); do
    ASCII[$i]=$(printf "0X%x" $i)
done

# ASCII table is now defined

function Test(){
    Illegal=""
    for((i=1; i <= 255; ++i)); do
        Name="$(printf \\$(printf '%03o' $i))"
        eval "function $1$Name(){ return 0; }; $1$Name ;" 2>/dev/null
        if [[ $? -ne 0 ]]; then
            Illegal+=" ${ASCII[$i]}"
            #        echo Illegal: "${ASCII[$i]}"
        fi
    done
    printf "Illegal: %s\n" "$Illegal"
}
echo "$BASH_VERSION"
Test
Test "x"

# can we really do funky crap like this?
function [}{(){
   echo "Let me take you to, funkytown!"
}
[}{    # why yes, we can!
# though editor auto-indent modes may punish us

I actually skipped NUL (0x00), as that is the one character bash may object to finding in the input stream. The output from this script was:

4.4.0(1)-release
Illegal:  soh tab nl sp ! " # $ % & ' ( ) * 0 1 2 3 4 5 6 7 8 9 ; < > \ ` { | } ~ del
Illegal:  soh " $ & ' ( ) ; < > [ \ ` | del
Let me take you to, funkytown!

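Note that bash happily lets me name my function "[}{". My code is probably not rigorous enough to give the exact rules for legality in practice, but it should give a flavor of what manner of abuse is possible. I wish I could mark this answer "For mature audiences only."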

Answer 5

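This script tests every character as a one-character function name.

It reports 53 valid characters (a-zA-Z and underscore) with a POSIX shell, and 220 valid characters with Bash v4.4.12.

The answer from Ron Burk is valid, but lacks the numbers.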

#!/bin/sh

FILE='/tmp/FOO'
I=0
VALID=0

while [ $I -lt 256 ]; do {
        NAME="$( printf \\$( printf '%03o' $I ))"
        I=$(( I + 1 ))

        >"$FILE"
        ( eval "$NAME(){ rm $FILE;}; $NAME" 2>/dev/null )

        if [ -f "$FILE" ]; then
                rm "$FILE"
        else
                VALID=$(( VALID + 1 ))
                echo "$VALID/256 - OK: $NAME"   
        fi
} done

Answer 6

Note that the biggest correction here is that a newline is never allowed in a function name.

My answer:

Bash --posix: [a-zA-Z_][0-9a-zA-Z_]*
Bash 3.0-4.4: [^#%0-9[=12=] "$&'();<>\`|\x7f][^[=12=] "$&'();<>\`|\x7f]*
Bash 5.0: [^#%0-9[=13=] "$&'();<>\`|][^[=13=] "$&'();<>\`|]*
- \x01 (SOH) and \x7f are valid now
Bash 5.1: [^#%[=16=] "$&'();<>\`|][^[=16=] "$&'();<>\`|]*
- Digits can come first?! Yep!
Any bash 3-5: [^#%0-9[=12=] "$&'();<>\`|\x7f][^[=12=] "$&'();<>\`|\x7f]*
- Same as 3.0-4.4
My suggestion (opinion): [^#%0-9[=18=]-\f "$&'();<>\`|\x7f-\xff][^[=18=]-\f "$&'();<>\`|\x7f-\xff]
- Positive version: [!*+,-./:=?@A-Z\[\]^_a-z{}~][#%0-9!*+,-./:=?@A-Z\[\]^_a-z{}~]*

My version of the test:

for ((x=1; x<256; x++)); do
  hex="$(printf "%02x" $x)"
  name="$(printf \\x${hex})"
  if [ "${x}" = "10" ]; then
    name=$'\n'
  fi
  if [ "$(echo -n "${name}" | xxd | awk '{print $2}')" != "${hex}" ]; then
    echo "$x failed first sanity check"
  fi
  (
    eval "function ${name}(){ echo ${x};}" &>/dev/null
    if test "$("${name}" 2>/dev/null)" != "${x}"; then
      eval "function ok${name}doe(){ echo ${x};}" &>/dev/null
      if test "$(type -t okdoe 2>/dev/null)" = "function"; then
        echo "${x} failed second sanity test"
      fi
      if test "$("ok${name}doe" 2>/dev/null)" != "${x}"; then
        echo "${x}(${name}) never works"
      else
        echo "${x}(${name}) cannot be first"
      fi
    else
      # Just assume everything over 128 is hard, unless this says otherwise
      if test "${x}" -gt 127; then
        if declare -pF | grep -q "declare -f \x${hex}"; then
          echo "${x} works, but is actually not difficult"
          declare -pF | grep "declare -f \x${hex}" | xxd
        fi
      elif ! declare -pF | grep -q "declare -f \x${hex}"; then
        echo "${x} works, but is difficult in bash"
      fi
    fi
  )
done

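Some additional notes:

Characters 1-31 are less than ideal, as they are harder to type.
Characters 128-255 are even less ideal in bash (except for bash 3.2 on macOS; it may be compiled differently?) because commands like declare -pF do not render the special characters, even though they are there in memory. This means any introspection code will wrongly assume these functions do not exist. However, features like compgen still render the characters correctly.
It is outside the scope of my testing, but some unicode works too, although it is extra hard to paste/type over ssh on macOS.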
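
Building on the note that compgen lists function names reliably, a script can audit its own functions for names that fall outside the conservative `--posix` pattern given above. This is only a sketch: `nonportable_functions`, `good_name`, and `bad-name` are hypothetical names chosen for the illustration, and it assumes bash in its default mode.

```shell
#!/usr/bin/env bash
# Hypothetical audit: print every currently defined function whose name
# does not match the conservative pattern [a-zA-Z_][0-9a-zA-Z_]*.
nonportable_functions() {
    local re='^[A-Za-z_][A-Za-z0-9_]*$' fn
    compgen -A function | while IFS= read -r fn; do
        [[ $fn =~ $re ]] || printf '%s\n' "$fn"
    done
}

good_name() { :; }   # portable name
bad-name() { :; }    # accepted by bash, but not a POSIX NAME

nonportable_functions
```

With only these definitions loaded, the audit prints `bad-name`. Calling something like this at the end of a test run is one cheap way to catch names that work on the development machine but may fail elsewhere.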
