Getting a row from a .anno file only when the value of a column is present in a text file

Question

I'm really new to scripting and to Stack, so I apologise if my question is silly or misplaced.

I have to do a task in Bash.

I have a DATA.anno file like this:

ID POP LOCALITY
1  Apu Italy
2  Apu Italy
3  Tir Albania
4  Tir Albania
5  Ber Germany
6  Ber Germany

I also have a pop.txt file containing two of the population names that appear in the second column of the previous file:

Apu
Ber

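Now I want to produce another file that contains only the rows whose population is listed in pop.txt. In this case, the output file I want looks like this: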

ID POP LOCALITY
1  Apu Italy
2  Apu Italy
5  Ber Germany
6  Ber Germany

I tried this script, but it doesn't seem to work:

cat pop.txt | while read line; do grep $line DATA.anno | cut -f 2,3 >> outputfile.txt

Answer 1

Could you please try the following.

awk 'BEGIN{print "ID POP LOCALITY"} FNR==NR{array[$0];next} ($2 in array)' pop.txt DATA.anno
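
To check the command against the sample data from the question, something like the sketch below could be used. The directory name anno_demo and the output name filtered.anno are made up for illustration, and the sample rows are copied from the question.

```shell
# Recreate the sample inputs from the question in a scratch directory
# (anno_demo is a hypothetical name, not part of the original question).
mkdir -p anno_demo
printf '%s\n' 'ID POP LOCALITY' \
  '1  Apu Italy' '2  Apu Italy' \
  '3  Tir Albania' '4  Tir Albania' \
  '5  Ber Germany' '6  Ber Germany' > anno_demo/DATA.anno
printf '%s\n' 'Apu' 'Ber' > anno_demo/pop.txt

# Print the header, load pop.txt into an array, then keep the rows of
# DATA.anno whose 2nd field is one of the loaded names.
awk 'BEGIN{print "ID POP LOCALITY"} FNR==NR{array[$0];next} ($2 in array)' \
  anno_demo/pop.txt anno_demo/DATA.anno > anno_demo/filtered.anno

cat anno_demo/filtered.anno
```

With this input, the header of DATA.anno itself is skipped (its 2nd field, POP, is not in pop.txt), so the header appears only once, from the BEGIN block.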

Explanation: a detailed, commented version of the code above.

awk '                         ##Starting awk program from here.
BEGIN{                        ##Starting BEGIN section from here.
  print "ID POP LOCALITY"     ##Printing headers here.
}
FNR==NR{                      ##Checking condition FNR==NR which will be TRUE only while the first Input_file (pop.txt) is being read.
  array[$0]                   ##Creating array with index of current line (a population name).
  next                        ##next will skip all further statements from here.
}
($2 in array)                 ##Checking condition if current line 2nd field is present in array then print that line.
'   pop.txt DATA.anno         ##Mentioning Input_file names here.
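
As a possible variation (a sketch, not part of the original answer), the header can be taken from DATA.anno itself instead of being hardcoded in BEGIN, so the command keeps working if the column names change. The array name keep and the directory anno_demo2 are made up for this example. Using $1 rather than $0 for the key is also an assumption: it ignores stray spaces around the names in pop.txt.

```shell
# Scratch copies of the sample files from the question
# (anno_demo2 is a hypothetical directory name).
mkdir -p anno_demo2
printf '%s\n' 'ID POP LOCALITY' \
  '1  Apu Italy' '2  Apu Italy' \
  '3  Tir Albania' '4  Tir Albania' \
  '5  Ber Germany' '6  Ber Germany' > anno_demo2/DATA.anno
printf '%s\n' 'Apu' 'Ber' > anno_demo2/pop.txt

awk '
FNR==NR { keep[$1]; next }   ## First file: remember each population name (first field only).
FNR==1 || ($2 in keep)       ## Second file: print its header line, then rows whose 2nd field was remembered.
' anno_demo2/pop.txt anno_demo2/DATA.anno > anno_demo2/filtered.anno

cat anno_demo2/filtered.anno
```

The FNR==1 test only fires for the second file, because every line of the first file is consumed by the FNR==NR rule and its next statement.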
