Delete duplicate lines in a file after a match in a particular subset of columns
I have an unsorted file with rows of data spread over many columns, as in this example:
10:23:55.521803 [INFO] eceb [ 41] 235 870 1 26601 349 910
10:24:11.771454 [INFO] eceb [ 41] 41 870 0 26601 349 910
10:25:18.858675 [INFO] eceb [ 41] 235 870 3 26601 349 910
10:25:18.814763 [INFO] eceb [ 41] 60 1247 0 38490 163 715
10:25:19.844738 [INFO] eceb [ 41] 60 1248 0 38490 163 715
10:24:11.771454 [INFO] eceb [ 41] 41 870 0 26641 389 920
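I want to find all lines that are identical when only columns 4, 5 and 6 are compared, and delete every one of those lines from the file (not just the repeats). Here "[ 41]" counts as one column, so the key of the first line is "[ 41] 235 870".
So in this example the result should be: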
10:25:18.814763 [INFO] eceb [ 41] 60 1247 0 38490 163 715
10:25:19.844738 [INFO] eceb [ 41] 60 1248 0 38490 163 715
How can I do this?
Plan
- read the file and build a map that counts how often each key (the key fields joined together) occurs
- reread the file, printing only the records whose key occurs exactly once
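The two steps above can be sketched in Python as follows. This only illustrates the counting idea and is not the answer's script. The names key_of and drop_repeated are made up for this sketch, and it assumes whitespace-separated fields, where a bracket group written with a space, such as "[ 41]", splits into two fields just as it does in awk.

```python
from collections import Counter


def key_of(line):
    """Return the comparison key for one line (hypothetical helper).

    Assumes whitespace-separated fields. A bracket group written with a
    space, such as "[ 41]", splits into "[" and "41]", so one extra field
    is taken to cover the same three columns.
    """
    fields = line.split()
    width = 4 if fields[3:4] == ["["] else 3
    return tuple(fields[3:3 + width])


def drop_repeated(lines):
    """Keep only the lines whose key occurs exactly once."""
    lines = list(lines)
    counts = Counter(key_of(line) for line in lines)  # step 1: count keys
    return [line for line in lines if counts[key_of(line)] == 1]  # step 2: filter


sample = [
    "10:23:55.521803 [INFO] eceb [ 41] 235 870 1 26601 349 910",
    "10:24:11.771454 [INFO] eceb [ 41] 41 870 0 26601 349 910",
    "10:25:18.858675 [INFO] eceb [ 41] 235 870 3 26601 349 910",
    "10:25:18.814763 [INFO] eceb [ 41] 60 1247 0 38490 163 715",
    "10:25:19.844738 [INFO] eceb [ 41] 60 1248 0 38490 163 715",
    "10:24:11.771454 [INFO] eceb [ 41] 41 870 0 26641 389 920",
]

for line in drop_repeated(sample):
    print(line)
```

Unlike the two-pass approach, this sketch holds all lines in memory, which is fine for small logs but may not suit very large files.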
filter.awk
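#!/usr/bin/awk -f

# Build the comparison key for columns 4-6 of the question.
# awk splits "[ 41]" into two fields ("[" and "41]"), so in that case a
# fourth field is needed to cover the same three columns.
# "key" is listed as an extra parameter to make it local to the function.
function get_key(k1, k2, k3, k4,    key) {
    if (k1 == "[") {
        key = k1 "," k2 "," k3 "," k4
    } else {
        key = k1 "," k2 "," k3
    }
    return key
}

# First pass (first file argument): count how often each key occurs.
FNR == NR {
    a[get_key($4, $5, $6, $7)]++
}

# Second pass (same file given again): print records whose key is unique.
FNR != NR {
    if (a[get_key($4, $5, $6, $7)] == 1) {
        print $0
    }
}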
Output
$ ./filter.awk input.txt input.txt
10:25:18.814763 [INFO] eceb [ 41] 60 1247 0 38490 163 715
10:25:19.844738 [INFO] eceb [ 41] 60 1248 0 38490 163 715
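The special case in get_key exists because whitespace splitting breaks "[ 41]" into two fields. One alternative, sketched here in Python, is to tokenize with a regular expression that keeps each bracket group whole, so that columns 4 to 6 can be selected directly. This assumes bracket groups are never nested and never contain "]"; the name columns is hypothetical and not part of the answer.

```python
import re

# A token is either a whole bracket group such as "[ 41]" or a run of
# non-whitespace characters. Assumes brackets are not nested.
TOKEN = re.compile(r"\[[^\]]*\]|\S+")


def columns(line):
    """Split a line into columns, keeping "[ 41]" as a single column."""
    return TOKEN.findall(line)


line = "10:23:55.521803 [INFO] eceb [ 41] 235 870 1 26601 349 910"
print(columns(line)[3:6])  # ['[ 41]', '235', '870']
```

With this tokenization the key is simply the slice of columns 4 to 6, at the cost of a regular expression match per line.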