Delete duplicate lines in a file after a match in a particular subset of columns

I have an unsorted file whose lines hold data spread across many columns, as in this example:

10:23:55.521803 [INFO] eceb [ 41] 235 870 1 26601 349 910
10:24:11.771454 [INFO] eceb [ 41] 41 870 0 26601 349 910
10:25:18.858675 [INFO] eceb [ 41] 235 870 3 26601 349 910
10:25:18.814763 [INFO] eceb [ 41] 60 1247 0 38490 163 715
10:25:19.844738 [INFO] eceb [ 41] 60 1248 0 38490 163 715
10:24:11.771454 [INFO] eceb [ 41] 41 870 0 26641 389 920

I want to find all lines that are identical when only columns 4, 5 and 6 are considered, and delete all of those lines from the file.

So, for this example, the result should be:

10:25:18.814763 [INFO] eceb [ 41] 60 1247 0 38490 163 715
10:25:19.844738 [INFO] eceb [ 41] 60 1248 0 38490 163 715

How can I do this?

Plan

  • read the file and build a map that counts how often each key (the selected fields) occurs
  • reread the file, printing only the records whose key occurs exactly once
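Which fields make up the key depends on how awk splits each line. As a quick check, the sketch below prints the field count and fields 4 to 7 of one sample line under default whitespace splitting. The helper name show_fields is only illustrative, and the second input line (with a three-digit value and no space inside the brackets) is a hypothetical variant that does not appear in the sample data.

```shell
# Illustrative helper: print NF and fields 4..7 as awk sees them.
show_fields() {
  awk '{ printf "%d|%s|%s|%s|%s\n", NF, $4, $5, $6, $7 }'
}

# Line taken from the sample data: "[ 41]" splits into "[" and "41]".
echo '10:25:18.814763 [INFO] eceb [ 41] 60 1247 0 38490 163 715' | show_fields

# Hypothetical variant: "[141]" stays a single field.
echo '10:25:18.814763 [INFO] eceb [141] 60 1247 0 38490 163 715' | show_fields
```

For the sample line the bracketed column occupies two awk fields, so the three logical columns span fields 4 to 7. This is why the key function takes four arguments and checks whether the first one is a lone "[".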

filter.awk

#!/usr/bin/awk -f

# Build the comparison key. With default whitespace splitting, "[ 41]"
# becomes two fields ("[" and "41]"), so a fourth field is needed then.
function get_key(k1, k2, k3, k4)
{
  if (k1 == "[")
  {
    return k1 "," k2 "," k3 "," k4;
  }
  return k1 "," k2 "," k3;
}

# First pass over the file (FNR == NR): count each key.
FNR == NR {
  a[get_key($4, $5, $6, $7)]++;
  next;
}

# Second pass: print only the records whose key was seen exactly once.
{
  if (a[get_key($4, $5, $6, $7)] == 1)
  {
    print $0;
  }
}

Output

$ ./filter.awk input.txt input.txt 
10:25:18.814763 [INFO] eceb [ 41] 60 1247 0 38490 163 715
10:25:19.844738 [INFO] eceb [ 41] 60 1248 0 38490 163 715
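
The command above names input.txt twice because the script makes two passes over it. If the input arrives on a pipe and cannot be reread, a single-pass variant is possible at the cost of holding every line in memory. The following is only a sketch under that assumption: filter_unique is a hypothetical name, and the key logic assumes the same field layout as the sample data.

```shell
# Hypothetical single-pass variant: buffers all records, then prints
# those whose key occurred exactly once, in the original order.
filter_unique() {
  awk '
    function get_key(k1, k2, k3, k4) {
      if (k1 == "[")
        return k1 "," k2 "," k3 "," k4
      return k1 "," k2 "," k3
    }
    {
      line[NR] = $0                      # remember the record
      key[NR] = get_key($4, $5, $6, $7)  # remember its key
      count[key[NR]]++                   # count occurrences of the key
    }
    END {
      for (i = 1; i <= NR; i++)
        if (count[key[i]] == 1)
          print line[i]
    }
  ' "$@"
}
```

It can be called as filter_unique input.txt or fed from a pipe, for example cat input.txt | filter_unique. For files too large to fit in memory, the two-pass script is the safer choice.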