Creating a unique array in awk: can this snippet be elaborated?

Question

@EdMorton, I can de-duplicate an array in awk this way:

BEGIN {
    # create an array 
    # here, I create an array from a string, but other approaches are possible, too
    split("a b c d e a b", array)

    # unique it
    for (i=1; i in array; i++) {
        if ( !seen[array[i]]++ ) {
            unique[++j] = array[i]
        }
    }

    # print out the result
    for (i=1; i in unique; i++) {
        print unique[i]
    }
    # results in:
    # a
    # b
    # c
    # d
    # e
}

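What I don't understand, though, is the ( !seen[array[i]]++ ) condition with its increment:

I understand that we collect the unique values as indices of the seen array;
so we check whether our temporary array seen already has an index array[i] (and if not, we add the value to unique);
but the increment after the index is the part I still can't get :).

So my question is: can this condition be rewritten in a more verbose way? That would probably really help me complete my understanding of it :)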

Answer 1

Hopefully this is clearer, but idk - the best I can say is that it's more verbose, as requested!

$ cat tst.awk
BEGIN {
    # create an array
    # here, I create an array from a string, but other approaches are possible, too
    split("a b c d e a b", array)

    # unique it
    for (i=1; i in array; i++) {
        val = array[i]
        count[val] = count[val] + 1

        if ( count[val] == 1 ) {
            is_first_time_val_seen = 1
        }
        else {
            is_first_time_val_seen = 0
        }

        if ( is_first_time_val_seen ) {
            unique[++j] = val
        }
    }

    # print out the result
    for (i=1; i in unique; i++) {
        print unique[i]
    }
}

$ awk -f tst.awk
a
b
c
d
e
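
To connect the verbose version back to the original one-liner, here is a minimal trace sketch of how !seen[val]++ evaluates on each visit. The names old and first and the shortened input "a b a" are illustrative assumptions, not part of the original snippet. The post-increment yields the count from before the visit and only then bumps it, so the negation is true exactly once per distinct value.

```shell
trace=$(awk 'BEGIN {
    n = split("a b a", array)
    for (i = 1; i <= n; i++) {
        val = array[i]
        old = seen[val]++    # post-increment: old is the count BEFORE this visit
        first = !old         # 1 only when the old count was 0 (first occurrence)
        printf "%s old=%d new=%d first=%d\n", val, old, seen[val], first
    }
}')
printf '%s\n' "$trace"
```

This prints first=1 for the first "a" and for "b", and first=0 for the repeated "a", which is why only the first occurrence is copied into unique.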

Answer 2

Another approach is to put the values of array into a new associative array as its keys. This enforces uniqueness:

BEGIN {
  # it's helpful to use the return value from `split`
  n = split("a b c d e a b", array)

  # use the element value as a key.
  # It doesn't really matter what the right-hand side of the assignment is.
  for (i = 1; i <= n; i++) uniq[array[i]] = i

  # now, it's easy to iterate over the unique keys
  for (elem in uniq) print elem
}

Output, with no guaranteed order:

a
b
c
d
e
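
Since the order of a for (elem in uniq) traversal is unspecified, one portable way to get deterministic output is to sort it outside awk. The sketch below assumes a POSIX sort is available and uses a shuffled input string so the effect is visible. The variable name sorted is illustrative.

```shell
sorted=$(awk 'BEGIN {
    n = split("d a c b e a b", array)
    # each value becomes a key, so duplicates collapse into one entry
    for (i = 1; i <= n; i++) uniq[array[i]] = i
    # traversal order here is unspecified
    for (elem in uniq) print elem
}' | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

Note that this yields sorted order, not first-seen order. If the original order matters, the seen-counter loop from the question is the simpler choice.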

If you are using GNU awk, use PROCINFO["sorted_in"] to control the order of array traversal.
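
As a rough illustration, the sketch below sets PROCINFO["sorted_in"] to "@ind_str_asc", one of gawk's predefined orderings (ascending by index, compared as strings). It assumes gawk is installed under that name. Other awk implementations ignore this setting. The shuffled input and the variable name gawk_sorted are illustrative.

```shell
# Requires GNU awk; PROCINFO["sorted_in"] is a gawk extension.
gawk_sorted=$(gawk 'BEGIN {
    n = split("d a c b e a b", array)
    for (i = 1; i <= n; i++) uniq[array[i]] = i
    # ask gawk to traverse keys in ascending string order
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (elem in uniq) print elem
}')
printf '%s\n' "$gawk_sorted"
```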

arrays

awk