Why does the POSIX mode of GNU Awk not treat a newline as a field when RS is set to something else?

I was going through the GNU Awk User's Guide and found this in section 4.1.1, Record Splitting with Standard awk:

When using regular characters as the record separator, there is one unusual case that occurs when gawk is being fully POSIX-compliant (see section Command-Line Options). Then, the following (extreme) pipeline prints a surprising ‘1’:

$ echo | gawk --posix 'BEGIN { RS = "a" } ; { print NF }'
-| 1

There is one field, consisting of a newline. The value of the built-in variable NF is the number of fields in the current record. (In the normal case, gawk treats the newline as whitespace, printing ‘0’ as the result. Most other versions of awk also act this way.)

I tried it, but it does not behave that way with my GNU Awk 5.0.0:

$ gawk --version
GNU Awk 5.0.0, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.1.2)
$ echo | gawk --posix 'BEGIN { RS = "a" } ; { print NF }'
0

That is, the behavior is exactly the same as without POSIX mode:

$ echo | gawk 'BEGIN { RS = "a" } ; { print NF }'
0

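As I understand it, the guide says that when the record separator is not the default (that is, not a newline), a record consisting only of a newline is counted as one field. However, I cannot reproduce this.

How can I reproduce this example? I also tried gawk --traditional and gawk -P, but the result was always 0.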

Since the GNU Awk User's Guide I was reading is for version 5.1 and I have 5.0.0, I also checked an archived version for 5.0.0. It contains the same lines, so the text did not change between 5.0 and 5.1.

Reading the POSIX standard, we find:

The awk utility shall interpret each input record as a sequence of fields where, by default, a field is a string of non-<blank> non-<newline> characters. This default <blank> and <newline> field delimiter can be changed by using the FS built-in variable

If FS is <space>, skip leading and trailing <blank> and <newline> characters; fields shall be delimited by sets of one or more <blank> or <newline> characters.

source: POSIX awk standard: IEEE Std 1003.1-2017
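The rule quoted above can be checked with a record that embeds a newline between words. This is only a minimal sketch: the helper name count_fields is hypothetical, and it assumes whatever awk is on the PATH follows the default field splitting described in the standard.

```shell
# Hypothetical helper: count the fields of one record that contains a newline.
# RS is "a", and the input has no "a", so the whole input is a single record
# and the newline stays inside it.
count_fields() {
    printf 'x y\nz' | awk 'BEGIN { RS = "a" } { print NF }'
}

# With the default FS, the embedded newline delimits fields just like a blank,
# so the record "x y\nz" splits into x, y and z.
count_fields
```

If the newline were not treated as a field delimiter here, the count would be 2 instead of 3.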

That said, the correct behavior should be the following:

$ echo | awk 'BEGIN{RS="a"}{print NR,NF,length}'
1 0 1

When FS is defined, the situation is completely different:

$ echo | awk 'BEGIN{FS="b";RS="a"}{print NR,NF,length}'
1 1 1
$ echo | awk 'BEGIN{FS="\n";RS="a"}{print NR,NF,length}'
1 2 1
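To make the two results above easier to read, the sketch below also prints the length of the first field. The helper name show_fields is hypothetical; the FS value is passed with -v, which processes escape sequences, so '\n' arrives in awk as a real newline.

```shell
# Hypothetical helper: split the lone newline from echo using the FS given in $1.
# RS is "a", so the record is exactly "\n" (length 1).
show_fields() {
    echo | awk -v fs="$1" 'BEGIN { FS = fs; RS = "a" }
        { printf "NF=%d len1=%d\n", NF, length($1) }'
}

# FS="b": no separator occurs in the record, so the newline itself is the one field.
show_fields 'b'
# FS="\n": the single newline is a separator with an empty field on each side.
show_fields '\n'
```

In the first case the newline is field content; in the second it is the delimiter between two empty fields. Only with the default FS is it skipped as whitespace.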

Conclusion: I think the GNU awk documentation is wrong.