Comparing common values between two files of different lengths (Linux)

Question

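I am comparing two files of different lengths. First, I found the unique IDs that exist in file 1 but not in file 2.

Now, however, I want to find the unique values the two files have in common. I have seen the comm command used many times, but these files have different lengths.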

Example

File 1:

File 2:

Expected output:

To find the unique differences, I used the following command:

awk 'FNR==NR {a[$0]++; next} !($0 in a)' file1.sorted file2.sorted > diff_values.txt
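It is worth checking what this awk program actually prints. While reading the first file (FNR==NR is only true there), it stores each line as a key of array a; for the second file it prints every line that is not a key of a. So it outputs lines of file2.sorted that are missing from file1.sorted, and swapping the two arguments gives lines of file 1 missing from file 2. A minimal sketch with hypothetical sample data (the question's real files are not shown). One known caveat: if the first file is empty, FNR==NR is also true for the second file.

```shell
# Hypothetical sample data; the question's real file contents are not shown.
printf '%s\n' 2 4 6 8 10 > file1.sorted
printf '%s\n' 1 2 3 4 5 6 7 8 9 > file2.sorted

# Pass 1 (first file): remember every line as a key of a[].
# Pass 2 (second file): print lines never seen in the first file.
awk 'FNR==NR {a[$0]++; next} !($0 in a)' file1.sorted file2.sorted > diff_values.txt

# Same program with the arguments swapped: lines in file1.sorted not in file2.sorted.
awk 'FNR==NR {a[$0]++; next} !($0 in a)' file2.sorted file1.sorted > only_in_file1.txt

cat diff_values.txt     # 1 3 5 7 9, one per line
cat only_in_file1.txt   # 10
```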

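To find the common values, I tried the following command, but I'm not entirely sure whether this is the right approach or whether there are alternatives: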

comm -12 file1.sorted file2.sorted > comm_values.txt

Answer 1

There are many alternatives to comm, just as there are many ways to do anything in Unix, but comm is the tool designed specifically for what you are asking.
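The files having different lengths is not a problem for comm: it walks both inputs in parallel, always advancing the side with the smaller line, so the only requirement is that both inputs are sorted, and sorted with the same collation comm itself uses. The sketch below avoids process substitution (so it also runs in plain sh) and pins the collation with LC_ALL=C for both sort and comm; the input data is hypothetical.

```shell
# Hypothetical inputs of different lengths (5 vs 9 lines), deliberately unsorted.
printf '%s\n' 10 8 6 4 2 > file1
printf '%s\n' 9 1 8 2 7 3 6 4 5 > file2

# Sort both files with the same collation that comm will use.
LC_ALL=C sort file1 > file1.sorted
LC_ALL=C sort file2 > file2.sorted

# -1 and -2 hide the "only in file1" and "only in file2" columns,
# leaving only the lines common to both files.
LC_ALL=C comm -12 file1.sorted file2.sorted > comm_values.txt
cat comm_values.txt
```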

Common lines:

$ comm -12 <(sort file1) <(sort file2)
2
4
6
8

Lines not common to both files:

$ comm -3 <(sort file1) <(sort file2)
        1
10
        3
        5
        7
        9
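In the comm -3 output, lines unique to the second file are indented with a tab (they belong to comm's second column), while lines unique to the first file are not. If you just want a flat list of lines that appear in only one of the files, delete the tab, for example with tr. A sketch with hypothetical data:

```shell
# Hypothetical sample data.
printf '%s\n' 2 4 6 8 10 > file1
printf '%s\n' 1 2 3 4 5 6 7 8 9 > file2
LC_ALL=C sort file1 > file1.sorted
LC_ALL=C sort file2 > file2.sorted

# comm -3 prefixes second-column lines with a tab; tr -d '\t' strips it.
LC_ALL=C comm -3 file1.sorted file2.sorted | tr -d '\t' > not_common.txt
cat not_common.txt
```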

Lines only in the first file:

$ comm -23 <(sort file1) <(sort file2)
10

Lines only in the second file:

$ comm -13 <(sort file1) <(sort file2)
1
3
5
7
9
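All four commands above are the same comm run with different columns hidden. Without options, comm prints three columns: lines only in the first file, lines only in the second file (indented by one tab), and lines in both (indented by two tabs). Each of -1, -2 and -3 suppresses the column with that number, and the remaining columns shift left. A sketch that prints the full output and then keeps just one column, using hypothetical data:

```shell
# Hypothetical sample data.
printf '%s\n' 2 4 6 8 10 > file1
printf '%s\n' 1 2 3 4 5 6 7 8 9 > file2
LC_ALL=C sort file1 > file1.sorted
LC_ALL=C sort file2 > file2.sorted

# Full output: column 1 = only file1, column 2 = only file2, column 3 = both.
LC_ALL=C comm file1.sorted file2.sorted

# Hiding columns 1 and 3 leaves column 2 (no indentation): lines only in file2.
LC_ALL=C comm -13 file1.sorted file2.sorted > only_in_file2.txt
cat only_in_file2.txt
```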

If you want alternatives, here are some scripts you could consider and adapt to your needs:

$ awk 'NR==FNR{a[$0]; c[$0]; next} {b[$0]; c[$0]} END{for (i in c) if ((i in a) && (i in b)) print i}' file1 file2
2
4
6
8
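For the common lines alone, a shorter awk idiom is enough: remember the first file's lines, then print each line of the second file that was seen. Unlike comm, it needs no sorting and keeps the second file's line order; note that a line repeated in the second file is printed once per repeat. A sketch with hypothetical, unsorted data:

```shell
# Hypothetical unsorted inputs of different lengths.
printf '%s\n' 10 8 6 4 2 > file1
printf '%s\n' 9 1 8 2 7 3 6 4 5 > file2

# First file: store each line as an array key (no value needed).
# Second file: the bare pattern prints the line when it is a stored key.
awk 'NR==FNR {a[$0]; next} $0 in a' file1 file2 > common.txt
cat common.txt
```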

$ awk 'NR==FNR{a[$0]; c[$0]; next} {b[$0]; c[$0]} END{for (i in c) if (!((i in a) && (i in b))) print i}' file1 file2
1
3
5
7
9
10

$ awk 'NR==FNR{a[$0]; c[$0]; next} {b[$0]; c[$0]} END{for (i in c) if ((i in a) && !(i in b)) print i}' file1 file2
10

$ awk 'NR==FNR{a[$0]; c[$0]; next} {b[$0]; c[$0]} END{for (i in c) if (!(i in a) && (i in b)) print i}' file1 file2
1
3
5
7
9
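One thing to keep in mind with these scripts: for (i in c) visits array keys in an unspecified, implementation-dependent order, so the output order shown above is not guaranteed across awk implementations or versions. If you need stable output, pipe the result through sort; -n suits numeric IDs like these. A sketch with hypothetical data, using the second script (lines not common to both files):

```shell
# Hypothetical sample data.
printf '%s\n' 2 4 6 8 10 > file1
printf '%s\n' 1 2 3 4 5 6 7 8 9 > file2

# c[] collects every distinct line; the END loop order is unspecified,
# so sort numerically afterwards for deterministic output.
awk 'NR==FNR{a[$0]; c[$0]; next} {b[$0]; c[$0]} END{for (i in c) if (!((i in a) && (i in b))) print i}' file1 file2 | sort -n > symdiff.txt
cat symdiff.txt
```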
