How to make a strict match with awk

Question

I'm using one file to query another. They look like this:

File 1:

Angela S Darvill| text text text text   
Helen Stanley| text text text text   
Carol Haigh S|text text text text .....

File 2:

Carol Haigh  
Helen Stanley  
Angela Darvill

This command:

awk 'NR==FNR{_[$1];next} ($1 in _)' File2.txt File1.txt

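returns the lines that overlap, but not strict matches. With a strict match, only Helen Stanley should be returned.

How can I restrict awk to strict matches?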
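A quick way to see why the command over-matches: with awk's default field separator (runs of blanks), $1 is only the first word of a line, so the comparison is effectively on first names. The sketch below uses one sample line copied from File 1; the variable names are just for illustration.

```shell
#!/bin/sh
# Show what $1 holds for a File 1 line under two different field separators.
line='Angela S Darvill| text text text text'

# Default FS: fields are split on runs of spaces/tabs, so $1 is just "Angela".
default_first=$(printf '%s\n' "$line" | awk '{ print $1 }')

# FS set to "|": $1 is everything before the first pipe, "Angela S Darvill".
pipe_first=$(printf '%s\n' "$line" | awk -F'|' '{ print $1 }')

printf 'default FS: [%s]\n' "$default_first"
printf 'FS="|":     [%s]\n' "$pipe_first"
```

Since File 2's lines are also split on blanks by default, "Angela", "Helen" and "Carol" all appear on both sides, which is why every line comes back.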

Answer 1

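With your shown samples, please try the following. You're on the right track; you need to do two things: first, use the whole line as the index in array a while reading File2.txt, and second, set the field separator to | before File1.txt is read.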

awk -F'|' 'NR==FNR{a[$0];next} $1 in a' File2.txt File1.txt

The command above didn't work for me (I'm on a Mac; not sure whether that matters), but

awk 'NR==FNR{_[$0];next} ($1 in _)' File2.txt FS="|" File1.txt

works fine.
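For reference, here is a minimal end-to-end run of the working command on the sample data, written as a sketch. Two assumptions of mine, not from the answer: the files live in a temporary directory, and trailing blanks are stripped from File2.txt lines before they become array keys, because the pasted sample shows trailing spaces and those would make "Helen Stanley  " differ from "Helen Stanley".

```shell
#!/bin/sh
# End-to-end run of the awk join on the question's sample data.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '%s\n' \
  'Angela S Darvill| text text text text' \
  'Helen Stanley| text text text text' \
  'Carol Haigh S|text text text text' > "$dir/File1.txt"
printf '%s\n' 'Carol Haigh  ' 'Helen Stanley  ' 'Angela Darvill' > "$dir/File2.txt"

# FS='|' is a command-line assignment, so it only takes effect once awk
# starts reading File1.txt; File2.txt is read with the default FS.
result=$(awk '
  NR == FNR { sub(/[ \t]+$/, ""); names[$0]; next }   # assumption: trim trailing blanks; key = whole line
  $1 in names                                          # $1 = text before the first "|"
' "$dir/File2.txt" FS='|' "$dir/File1.txt")

printf '%s\n' "$result"
```

Only the Helen Stanley line is printed: "Angela S Darvill" and "Carol Haigh S" are compared as whole strings and differ from the File2.txt names.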

Answer 2

You can also use grep, turning File2.txt into a list of regular expressions, to get exact matches.

You can use sed to prepare the patterns. Here is an example:

sed -E 's/[ \t]*$//; s/^(.*)$/^\1|/' File2.txt
^Carol Haigh|
^Helen Stanley|
^Angela Darvill|
...

Then use the sed process as grep's -f argument:

grep -f <(sed -E 's/[ \t]*$//; s/^(.*)$/^\1|/' File2.txt) File1.txt
Helen Stanley| text text text text
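The <(...) construct is process substitution, which bash, zsh and ksh support but plain POSIX sh does not. A sketch of the same idea for sh writes the generated patterns to a temporary file instead; the temporary directory and patterns.txt name are my own choices. It also uses the POSIX class [[:blank:]] rather than \t, since not every sed treats \t inside a bracket expression as a tab.

```shell
#!/bin/sh
# Same grep approach, without process substitution: the patterns go to a
# temporary file that grep reads with -f.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '%s\n' \
  'Angela S Darvill| text text text text' \
  'Helen Stanley| text text text text' \
  'Carol Haigh S|text text text text' > "$dir/File1.txt"
printf '%s\n' 'Carol Haigh  ' 'Helen Stanley  ' 'Angela Darvill' > "$dir/File2.txt"

# Strip trailing blanks, then turn each name into "^name|".
# In grep's default basic regex syntax "|" is an ordinary character, so each
# pattern means "line starts with exactly this name followed by a pipe".
sed 's/[[:blank:]]*$//; s/.*/^&|/' "$dir/File2.txt" > "$dir/patterns.txt"

result=$(grep -f "$dir/patterns.txt" "$dir/File1.txt")
printf '%s\n' "$result"
```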

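Since your sample File2.txt has trailing spaces, the sed command has s/[ \t]*$//; as its first substitution. If your actual file doesn't have those trailing spaces, you can just do: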

grep -f <(sed -E 's/.*/^&|/' File2.txt) File1.txt

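Ed Morton makes a good point that grep will still interpret regex metacharacters in File2.txt. You can use the -F flag so that only literal strings are matched (the ^ anchor is dropped from the sed command here, because -F would treat it as a literal character):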

grep -F -f <(sed -E 's/.*/&|/' File2.txt) File1.txt
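One trade-off of the -F version: without ^, each pattern "name|" can match anywhere in a line, including the tail of a longer name. The sketch below compares the anchored-regex variant, the -F variant, and the awk comparison on made-up data (the records and names, including "J.R. Smith" and "Mary Helen Stanley", are hypothetical and not from the question) chosen to expose each weak spot.

```shell
#!/bin/sh
# Compare regex grep, fixed-string grep and awk on hypothetical data.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Hypothetical File1-style records.
printf '%s\n' \
  'JxRy Smith| text' \
  'Mary Helen Stanley| text' \
  'Helen Stanley| text' > "$dir/records.txt"

# Hypothetical name list; one name contains the regex metacharacter ".".
printf '%s\n' 'J.R. Smith' 'Helen Stanley' > "$dir/names.txt"

# Regex mode: "." matches any character, so "^J.R. Smith|" also hits "JxRy Smith|".
sed 's/.*/^&|/' "$dir/names.txt" > "$dir/re_patterns.txt"
regex_hits=$(grep -c -f "$dir/re_patterns.txt" "$dir/records.txt")

# Fixed-string mode: "." is literal, but without "^" the pattern
# "Helen Stanley|" also matches inside "Mary Helen Stanley|".
sed 's/.*/&|/' "$dir/names.txt" > "$dir/fixed_patterns.txt"
fixed_hits=$(grep -c -F -f "$dir/fixed_patterns.txt" "$dir/records.txt")

# awk compares the whole first field as a string, so only the exact record matches.
awk_hits=$(awk 'NR == FNR { names[$0]; next } $1 in names' \
  "$dir/names.txt" FS='|' "$dir/records.txt" | wc -l | tr -d ' ')

printf 'regex: %s  fixed: %s  awk: %s\n' "$regex_hits" "$fixed_hits" "$awk_hits"
```

Each grep variant reports 2 matching lines where only one record is a true exact match; the awk comparison reports 1.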
