Bash

Question

I have a tab-delimited file...

123 1:2334523   yes
127 1:332443    yes
113 1:332443    no
115 1:55434     no
115 1:55434     no
115 1:55434     yes

I want to count how many times each value in column 2 occurs in column 2, and then print that count at the end of the line, e.g...

123 1:2334523   yes 1
127 1:332443    yes 2
113 1:332443    no  2
115 1:55434     no  3
115 1:55434     no  3   
115 1:55434     yes 3

So in column 2, 1:332443 occurs twice and 1:55434 occurs 3 times.

I think this should be relatively easy in Awk or sed, but I haven't been able to figure it out.

Answer 1

You can do it like this:

awk 'NR == FNR { ++ctr[$2]; next } { print $0 "\t" ctr[$2]; }' filename filename

Because we need to know the counts before printing, we have to go through the file twice, which is why filename is mentioned twice. The awk code is then:

NR == FNR {    # if the record number is the same as the record number in the
               # current file (that is: in the first pass)
  ++ctr[$2]    # count how often field 2 showed up
  next         # don't do anything else for the first pass
}
{              # then in the second pass:
  print $0 "\t" ctr[$2];   # print the line, a tab, and the counter.
}
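
To try the two-pass approach end to end, here is a minimal sketch. The file names sample.tsv and counted.tsv are hypothetical, and the explicit -F'\t' is an assumption added so that fields are split on tabs only; the answer above relies on awk's default whitespace splitting, which gives the same result for this data.

```shell
# Build a small tab-separated sample file with the data from the question.
# (sample.tsv and counted.tsv are hypothetical names used for illustration.)
printf '123\t1:2334523\tyes\n127\t1:332443\tyes\n113\t1:332443\tno\n115\t1:55434\tno\n115\t1:55434\tno\n115\t1:55434\tyes\n' > sample.tsv

# Pass 1 (NR == FNR): count each value of column 2.
# Pass 2: print each line, a tab, and the count for that line's column 2.
awk -F'\t' 'NR == FNR { ++ctr[$2]; next } { print $0 "\t" ctr[$2] }' \
    sample.tsv sample.tsv > counted.tsv

cat counted.tsv
```

Because the file is named twice, this only works on input that can be read twice, such as a regular file, not on a pipe.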

Answer 2

Here is an awk that reads the file only once:

awk '{a[NR]=$0;b[$2]++;c[NR]=$2} END {for (i=1;i<=NR;i++) print a[i]"\t"b[c[i]]}' file
123 1:2334523   yes     1
127 1:332443    yes     2
113 1:332443    no      2
115 1:55434     no      3
115 1:55434     no      3
115 1:55434     yes     3

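This stores every line in array a, counts the occurrences of field 2 in array b, and stores each line's field 2 value in array c, indexed by line number.
At the end, it prints each stored line followed by its count.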
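
Since the single-pass version buffers everything in memory, it can also read from a pipe, which the two-pass version cannot do. The following is a rough sketch of that use. The array names line, key, and count and the output file counted_once.tsv are hypothetical, and the explicit -F'\t' is an assumption not present in the answer above.

```shell
# Feed the sample data through a pipe instead of a named file.
# (counted_once.tsv is a hypothetical output name used for illustration.)
printf '123\t1:2334523\tyes\n127\t1:332443\tyes\n113\t1:332443\tno\n115\t1:55434\tno\n115\t1:55434\tno\n115\t1:55434\tyes\n' |
awk -F'\t' '
  {
    line[NR] = $0     # remember the whole line, keyed by line number
    key[NR]  = $2     # remember which column 2 value that line had
    count[$2]++       # count occurrences of each column 2 value
  }
  END {
    # Replay the lines in input order, appending a tab and the final count.
    for (i = 1; i <= NR; i++)
      print line[i] "\t" count[key[i]]
  }
' > counted_once.tsv

cat counted_once.tsv
```

The trade-off is memory: every line is held until END, so for very large files the two-pass version may be the safer choice.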

Bash - how to print the number of times a value in a column occurs at the end of the row

awk

sed