Sort according to two columns and extract top two based on last column

Question

I have a file with three columns. For each unique value in column 2, I want to extract the rows that have the top two values in column 3.

cat file.list
run1/xx2/x2c1.txt 21 -190
run1/xx2/x2c2.txt 19 -180
run1/xx2/x2c3.txt 18 -179
run1/xx2/x2c4.txt 19 -162
run1/xx2/x2c5.txt 21 -172
run2/xx2/x2c1.txt 21 -162
run2/xx2/x2c2.txt 18 -192
run2/xx2/x2c3.txt 19 -191
run2/xx2/x2c4.txt 19 -184
run2/xx2/x2c5.txt 21 -179
run3/xx2/x2c1.txt 19 -162
run3/xx2/x2c2.txt 19 -192
run3/xx2/x2c3.txt 21 -191
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c5.txt 19 -179

Expected output

run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190

I think some combination of sort, uniq, and awk could do this, but I can't get it to work. I can sort by the columns:

sort -nk2 -nk3 file.list

which gives me the output sorted by -k2 and then -k3, as shown below:

run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run1/xx2/x2c3.txt 18 -179
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run2/xx2/x2c4.txt 19 -184
run1/xx2/x2c2.txt 19 -180
run3/xx2/x2c5.txt 19 -179
run1/xx2/x2c4.txt 19 -162
run3/xx2/x2c1.txt 19 -162
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190
run2/xx2/x2c5.txt 21 -179
run1/xx2/x2c5.txt 21 -172
run2/xx2/x2c1.txt 21 -162

But then I'm stuck on how to extract only the two top-scoring rows (by the last column) for each of 18, 19, and 21.

I would appreciate any bash solution.

Answer 1

Pipe the current sort result to awk:

$ sort -nk2 -nk3 file.list | awk 'a[$2]++ < 2'
run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190

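where:

field #2 ($2) is used as the index into the array a[]
if the value stored in the array is less than 2, the current input line is printed
the counter is then incremented (++)
the first time we see a[18], the count is 0: we print the line and increment the count by 1
the second time we see a[18], the count is 1: we print the line and increment the count by 1
the 3rd (through nth) time we see a[18], the count is greater than or equal to 2: we do not print the line, but we still increment the count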
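
To make the counter behaviour easier to see, here is a small sketch that prints the count awk sees for each line instead of filtering. The function name trace_counts and the file names f1 to f4 are made up for illustration; the input is assumed to be already sorted, as in the pipeline above.

```shell
# Hypothetical helper: for each input line, show column 2, the value of
# the counter before the post-increment, and whether 'a[$2]++ < 2'
# would print or skip that line.
trace_counts() {
  awk '{ seen = a[$2]++; print $2, seen, (seen < 2 ? "print" : "skip") }'
}

printf '%s\n' 'f1 18 -192' 'f2 18 -184' 'f3 18 -179' 'f4 19 -192' | trace_counts
# 18 0 print
# 18 1 print
# 18 2 skip
# 19 0 print
```

The third line for key 18 is skipped because its counter has already reached 2, while key 19 starts again from 0.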

An alternative where we increment the count first:

$ sort -nk2 -nk3 file.list | awk '++a[$2] <= 2'
run2/xx2/x2c2.txt 18 -192
run3/xx2/x2c4.txt 18 -184
run3/xx2/x2c2.txt 19 -192
run2/xx2/x2c3.txt 19 -191
run3/xx2/x2c3.txt 21 -191
run1/xx2/x2c1.txt 21 -190
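
If the number of rows to keep per group needs to vary, the same idea can be wrapped in a function. This is only a sketch: top_n_per_key is a hypothetical name, and it spells the sort keys as -k2,2n -k3,3n so that each key is limited to a single field, which is an assumption about what the original -nk2 -nk3 was meant to do.

```shell
# Hypothetical helper, not part of the answer above.
# $1 = input file, $2 = rows to keep per column-2 value (default 2)
top_n_per_key() {
  sort -k2,2n -k3,3n "$1" | awk -v n="${2:-2}" '++count[$2] <= n'
}

# Small sample taken from file.list in the question
sample=$(mktemp)
printf '%s\n' \
  'run1/xx2/x2c1.txt 21 -190' \
  'run1/xx2/x2c3.txt 18 -179' \
  'run2/xx2/x2c2.txt 18 -192' \
  'run3/xx2/x2c3.txt 21 -191' \
  'run3/xx2/x2c4.txt 18 -184' > "$sample"

top_n_per_key "$sample" 1
# run2/xx2/x2c2.txt 18 -192
# run3/xx2/x2c3.txt 21 -191
rm -f "$sample"
```

Calling top_n_per_key file.list with no second argument keeps two rows per group, like the one-liners above.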

sorting

bash

awk

extract

uniq