How does PROCINFO show info on FS on the specific record?

I am reading the GNU Awk User's Guide → 7.5.2 Built-in Variables That Convey Information, which defines the PROCINFO built-in variable:

PROCINFO

The elements of this array provide access to information about the running awk program. The following elements (listed alphabetically) are guaranteed to be available:

PROCINFO["FS"]

This is "FS" if field splitting with FS is in effect, "FIELDWIDTHS" if field splitting with FIELDWIDTHS is in effect, "FPAT" if field matching with FPAT is in effect, or "API" if field splitting is controlled by an API input parser.

Yes, this works well. See this example, where I provide the string "hello;you" and, in turn, set FS to ";", FIELDWIDTHS to "2 2 2", and FPAT to three characters:

$ gawk 'BEGIN{FS=";"}{print PROCINFO["FS"]; print $1}' <<< "hello;you"
FS
hello
$ gawk 'BEGIN{FIELDWIDTHS="2 2 2"}{print PROCINFO["FS"]; print $1}' <<< "hello;you"
FIELDWIDTHS
he
$ gawk 'BEGIN{FPAT="..."}{print PROCINFO["FS"]; print $1}' <<< "hello;you"
FPAT
hel

This is all fine and works as expected.

The same point is made earlier, in 4.8 Checking How gawk Is Splitting Records:

In order to tell which kind of field splitting is in effect, use PROCINFO["FS"] (see section Built-in Variables That Convey Information). The value is "FS" if regular field splitting is being used, "FIELDWIDTHS" if fixed-width field splitting is being used, or "FPAT" if content-based field splitting is being used.

And in Changing FS Does Not Affect the Fields, they describe how a change only affects the next record:

According to the POSIX standard, awk is supposed to behave as if each record is split into fields at the time it is read. In particular, this means that if you change the value of FS after a record is read, the values of the fields (i.e., how they were split) should reflect the old value of FS, not the new one.

This case illustrates it well:

$ gawk 'BEGIN{FS=";"} {FS="|"; print $1}' <<< "hello;you
bye|everyone"
hello  # "hello;you" is split using FS=";", the assignment FS="|" doesn't affect it yet
bye    # "bye|everyone" is split using FS="|"

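With all of this in mind, I assumed that PROCINFO["FS"] would always reflect the field splitting that was applied to the record being printed.
However, look at this case: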

$ gawk 'BEGIN{FPAT="..."}{FS=";"; print PROCINFO["FS"]; print $1}' <<< "hello;you"
FS
hel

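PROCINFO["FS"] shows the mode that was just set while processing the current record (FS), not the one Awk actually used to split the data (namely FPAT). The same happens if we swap the assignments: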

$ gawk 'BEGIN{FS=";"}{FPAT="..."; print PROCINFO["FS"]; print $1}' <<< "hello;you"
FPAT
hello

Why does PROCINFO["FS"] show a different field-splitting mode than the one used for the record in which it is printed?

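Field splitting (using FS, FIELDWIDTHS, or FPAT) occurs when a record is read or when $0 as a whole is assigned a new value (e.g. $0="foo" or sub(/foo/,"bar")). print PROCINFO["FS"] tells you the value PROCINFO["FS"] currently has, which is not necessarily the value it had when field splitting last occurred.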

With:

$ gawk 'BEGIN{FPAT="..."}{FS=";"; print PROCINFO["FS"]; print $1}' <<< "hello;you"
FS
hel

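you are setting FS=";" after $1 has already been populated based on FPAT="...", then printing the new value of PROCINFO["FS"] (which will be used the next time a record is split into fields), and then printing the value of $1 that was populated before you set FS=";".

If you set $0 to itself, field splitting occurs again, this time using the new FS value instead of the original FPAT value: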

$ gawk 'BEGIN{FPAT="..."}{FS=";"; print PROCINFO["FS"]; print $1; $0=$0; print $1}' <<< "hello;you"
FS
hel
hello