What are the conversion rules for awk array subscripts?

I know that awk array subscripts are always strings.

[root@localhost]# awk 'END {array[A0]="empty"; print array[""]}'
empty

In the command above, A0 is not quoted as "A0", so it denotes a variable. Since the variable A0 has never been assigned, its value is "". That is why print array[""] outputs empty.

But in the following command:

[root@localhost]#  awk 'END {array[0]="empty"; print array[""], array["0"]}'
 empty

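array[""] is empty (NULL), while array["0"] is "empty". My understanding is that, because a variable name cannot start with a digit, array[0] is implicitly converted to array["0"]. Is that correct? What are the conversion rules for awk array subscripts?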

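Array subscripts in awk are strings, so when you use an expression as an array subscript it is converted to a string (if it is not one already). 0 is a number, not a variable, so the following applies (from POSIX):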

A numeric value that is exactly equal to the value of an integer (see Concepts Derived from the ISO C Standard) shall be converted to a string by the equivalent of a call to the sprintf function (see String Functions) with the string "%d" as the fmt argument and the numeric value being converted as the first and only expr argument. Any other numeric value shall be converted to a string by the equivalent of a call to the sprintf function with the value of the variable CONVFMT as the fmt argument and the numeric value being converted as the first and only expr argument. The result of the conversion is unspecified if the value of CONVFMT is not a floating-point format specification.

0 is an integer, so converting it to a string gives "0", not "". This mirrors C, where after sprintf(buf, "%d", 0) the buffer buf contains the string "0".
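
The two branches of the quoted rule can be checked with a small sketch. It assumes the default CONVFMT of "%.6g"; the array name a and the variables has_zero, has_empty and has_frac are made up for this illustration.

```shell
out=$(awk 'BEGIN {
    a[0] = "int"            # integer value: subscript is built with "%d", giving "0"
    a[0.1 + 0.2] = "frac"   # non-integer value: subscript is built with CONVFMT
    has_zero  = ("0" in a)
    has_empty = ("" in a)   # stays 0: the number 0 never becomes ""
    has_frac  = ("0.3" in a)
    print has_zero, has_empty, has_frac
}')
echo "$out"
```

With the default CONVFMT this should print 1 0 1: the sum 0.1 + 0.2 is not exactly 0.3 as a floating-point value, but "%.6g" rounds it to "0.3" when it is used as a subscript.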

As for variable names: in the awk grammar, a variable is described by the token NAME. Its lexical conventions are as follows:

9) A sequence of underscores, digits, and alphabetics from the portable character set (see the Base Definitions volume of IEEE Std 1003.1-2001, Section 6.1, Portable Character Set), beginning with an underscore or alphabetic, shall be considered a word.

12) The token NAME shall consist of a word that is not a keyword or a name of a built-in function and is not followed immediately (without any delimiters) by the '(' character.

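A token that matches this description is a variable. It is initially uninitialized, and when an uninitialized variable is converted to a string it yields the empty string.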

In other words:

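  • 0 is a number
  • a is a variable name
  • _ is a variable name
  • a0 is a variable name
  • _0 is a variable name
  • 0a is parsed as 0 a (the concatenation of 0 and the variable a)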
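
The list above can be tried out with a short sketch. The array name arr and the value "X" assigned to a are chosen only for this example.

```shell
out=$(awk 'BEGIN {
    arr[a0] = "via a0"   # a0 is an unset variable, so the subscript is ""
    arr[0]  = "via 0"    # numeric constant, so the subscript is "0"
    a = "X"
    arr[0a] = "via 0a"   # 0 concatenated with a, so the subscript is "0X"
    print arr[""]
    print arr["0"]
    print arr["0X"]
}')
echo "$out"
```

If the tokens are split as described, each print finds its element, so the output is via a0, via 0 and via 0a on separate lines.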