bash grep for a mix of special characters, some of which are to be interpreted literally

Question

I have a data.txt file with the following format

blah<TAB>string1_with_spaces_quotes_dots_etc<TAB>blah
blah<TAB>string2_with_spaces_quotes_dots_etc<TAB>blah
...

Some stringJ_... values occur more than once. The file is not sorted in any way.

I also have a strings.txt file of the form

stringA_with_spaces_quotes_dots_etc
stringC_with_spaces_quotes_dots_etc
stringB_with_spaces_quotes_dots_etc
...

Each of these strings appears only once, but this file is not sorted either.

What I need is, for each string from strings.txt, to find the lines in data.txt whose middle field is exactly that string. For example, if the string I am looking for is

foo.

then I need to extract lines such as

blah<TAB>foo.<TAB>blah

but not lines such as

blah<TAB>foo. bar<TAB>blah
blah<TAB>foo<TAB>blah

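The difficulty is that the strings can contain characters such as dots, which may be interpreted as special characters, whereas I need literal matching.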

What is the right set of grep options for the loop below? Or should I use a different command altogether?

while read t
do
     grep <OPTIONS> "\t${t}\t" data.txt
done < strings.txt

Answer 1

Use the -f and -F flags together.

grep -f strings.txt -F data.txt

-f treats each line of strings.txt as a separate pattern, and -F performs fixed-string matching rather than regular-expression matching.
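One thing to keep in mind is that -F matches the fixed string anywhere in the line, so a pattern such as foo. also selects a line whose middle field is foo. bar. Below is a minimal sketch of one way to anchor the match: wrap each pattern in literal tabs first. It assumes every data line has exactly three tab-separated fields. The scratch directory, the sample rows, and the patterns.txt helper file are made up for illustration.

```shell
# Illustrative sample files in a scratch directory (not from the question).
tmp=$(mktemp -d)
printf 'a\tfoo.\tb\nc\tfoo. bar\td\ne\tfoo\tf\n' > "$tmp/data.txt"
printf 'foo.\n' > "$tmp/strings.txt"

# Plain -F -f matches substrings, so the "foo. bar" line is selected too.
loose=$(grep -F -f "$tmp/strings.txt" "$tmp/data.txt")

# Surround every pattern with literal tabs so it must fill a whole middle field.
awk '{ print "\t" $0 "\t" }' "$tmp/strings.txt" > "$tmp/patterns.txt"
strict=$(grep -F -f "$tmp/patterns.txt" "$tmp/data.txt")

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
rm -rf "$tmp"
```

With this sample data the loose run returns the foo. and foo. bar rows, while the strict run returns only the foo. row. If a file had more than three fields, the tab-wrapped pattern would match the string in any interior field, not only the second.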

Answer 2

Once you go beyond simple regular-expression matching (for example, anything that involves a specific column or field), you need awk, not grep:

awk -F'\t' 'NR==FNR{a[$0];next} $2 in a' strings.txt data.txt

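The above does string matching, not regular-expression matching, so there are no "special characters", and it compares against only the second tab-separated field of data.txt, so there are no partial or other possible false matches. It will match exactly what you want and nothing else.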
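To see how this works: while awk reads the first file (where NR==FNR holds), each whole line is stored as an array key. For the second file, a line is printed only if its second field is one of those keys. Here is a small sketch on made-up sample data. The scratch directory, the rows, and the array name seen are illustrative assumptions.

```shell
# Illustrative sample files in a scratch directory (not from the question).
tmp=$(mktemp -d)
printf 'a\tfoo.\tb\nc\tfoo. bar\td\ne\tfoo\tf\ng\tfoo.\th\n' > "$tmp/data.txt"
printf 'foo.\nbaz\n' > "$tmp/strings.txt"

# First file: remember each string as an array key, then skip to the next line.
# Second file: print the line only if field 2 is exactly one of those keys.
matched=$(awk -F'\t' 'NR==FNR { seen[$0]; next } $2 in seen' \
    "$tmp/strings.txt" "$tmp/data.txt")

printf '%s\n' "$matched"
rm -rf "$tmp"
```

On this sample only the two rows whose middle field is exactly foo. are printed. The foo. bar and foo rows are skipped, and the unused string baz produces no output.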

Also, any time you are thinking of writing a shell loop to manipulate text, read https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice for some (but not all) of the reasons why you should not do that.
