While loop to break when pattern is found in all files?

Question

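The code below searches multiple files for a set of patterns (contained in the $snps variable; the $file variable iterates over files ending in snp_search.txt) and outputs a long list stating whether or not each SNP is in each file.

The purpose is to find several SNPs that are in all of the files.

Is there a way to embed the code below in a while loop so that it keeps running until it finds a SNP that is present in all of the files, and breaks when it does? Otherwise I have to check the log file manually.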

for snp in $snplist; do
   for file in *snp_search.txt; do
     # quote "$file" and the log name so names with spaces survive word splitting
     if grep -wq "$snp" "$file"; then
       echo "${snp} was found in $file" >> "${date}_snp_search.log"
     else
       echo "${snp} was NOT found in $file" >> "${date}_snp_search.log"
     fi
   done
done

Answer 1

You can use grep to search all the files at once. If the file names don't contain newlines, you can count the matching files directly:

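#! /bin/bash
files=(*snp_search.txt)                 # every file to search
count_files=${#files[@]}
for snp in $snplist ; do
    # -l prints each matching file name once, so wc -l counts matching files
    count=$(grep -wl "$snp" "${files[@]}" | wc -l)
    if ((count == count_files)) ; then
        break                           # $snp is present in every file
    fi
done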

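For file names that contain newlines, you can instead output the first matching line from each file for the given $snp, without the file names, and count the lines: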

count=$(grep -m1 -hw "$snp" *snp_search.txt | wc -l)
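
A minimal sketch that puts this newline-safe count into a complete loop and also reports which SNP was found. The function name find_common_snp and its argument layout (SNP list first, then the files) are assumptions made for illustration; they are not part of the original answer.

```shell
#!/bin/bash
# Sketch: print the first SNP that occurs (as a whole word) in every file.
# find_common_snp is a hypothetical helper: $1 is the whitespace-separated
# SNP list, the remaining arguments are the files to search.
find_common_snp() {
    local snplist=$1 snp count
    shift
    for snp in $snplist; do             # unquoted on purpose: split the list on whitespace
        # -m1: stop after the first match in each file; -h: omit file names;
        # -w: whole-word match. The result is one output line per matching file.
        count=$(grep -m1 -hw -- "$snp" "$@" | wc -l)
        if ((count == $#)); then        # matched in as many files as were given
            printf '%s\n' "$snp"
            return 0
        fi
    done
    return 1                            # no SNP was present in every file
}
```

It could be called as find_common_snp "$snplist" *snp_search.txt; the exit status then tells the caller whether any SNP was common to all files, so the log no longer has to be checked by hand.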

Answer 2

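Assumptions:

a line of an input file may contain more than one SNP
all SNPs that exist in all files will be printed (OP has made conflicting statements: "find several SNPs that are in all of the files" vs "break when one SNP is found in all files")

Sample inputs (will be updated if OP updates the question with sample data):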

$ cat snp.dat
ABC
DEF
XYZZ

$ cat 1.snp.search.txt

ABCD-XABC
someABC_stuff
ABC-
de-ABC-
de-ABC
DEFG
zDEFG
.DEF-xyz
abc-DEF
abc-DEF-ABC-xyz

$ cat 2.snp.search.txt
ABC

One GNU awk idea that requires only a single pass through each input file:

awk '
FNR==NR { snps[$1]=0; next }                        # load 1st file into array; initialize counter (of files containing this snp) to 0

FNR==1  { filecount++                               # 1st line of 2nd-nth files: increment counter of number of files
          delete to_find                            # delete our to_find[] array
          for (snp in snps)                         # make a copy of our master snps[] array ...
               to_find[snp]                         # storing copy in to_find[] array
        }

        { for (snp in to_find) {                    # loop through list of snps 
              if ($0 ~ "\\y" snp "\\y") {           # if current line contains a "word" match on the current snp ...
                 snps[snp]++                        # increment our snp counter (ie, number of files containing this snp)
                 delete to_find[snp]                # no longer need to search current file for this particular snp
#                break                              # if line can only contain 1 snp then uncomment this line
              }
          }

          for (snp in to_find)                      # if we still have an snp to find then ...
              next                                  # skip to next line else ...
          nextfile                                  # skip to next file
        }

END     { PROCINFO["sorted_in"]="@ind_str_asc"
          for (snp in snps)
              if (snps[snp] == filecount)
                 printf "The SNP %s was found in all files\n", snp
        }
' snp.dat *.snp.search.txt

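Notes:

GNU awk is required for the PROCINFO["sorted_in"]="@ind_str_asc" option, which sorts the snps[] array indices; if GNU awk is not available, or the order of the output messages is not important, this command can be removed from the code
since we process each input file only once, we print all SNPs that show up in all files (i.e., we won't know whether an SNP exists in all files until we've processed the last file, so we may as well print every SNP that exists in all files)
this should be faster than approaches that require multiple scans of each input file (especially for larger files and/or a large number of SNPs)

This generates: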

The SNP ABC was found in all files
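
For comparison, a rough sketch of how the same list could be produced with the grep counting approach, at the cost of scanning the files once per SNP. The function name list_common_snps is hypothetical, and the sketch assumes the SNP file holds one SNP per line, as in snp.dat above.

```shell
#!/bin/bash
# Sketch: report every SNP that occurs (as a whole word) in all given files.
# list_common_snps is a hypothetical helper: $1 is a file with one SNP per
# line, the remaining arguments are the files to search.
list_common_snps() {
    local snpfile=$1 snp count
    shift
    (($# > 0)) || return 1              # no files to search
    while IFS= read -r snp; do
        [[ -n $snp ]] || continue       # skip empty lines in the SNP file
        # one output line per file that contains $snp as a whole word
        count=$(grep -m1 -hw -- "$snp" "$@" | wc -l)
        if ((count == $#)); then
            printf 'The SNP %s was found in all files\n' "$snp"
        fi
    done < "$snpfile"
}
```

With the sample inputs above it would be called as list_common_snps snp.dat *.snp.search.txt. Unlike the awk version, the messages come out in the order the SNPs appear in the SNP file rather than sorted.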
