How to store file paths from a tab separated text file in a bash array

I have a tab-separated text file with a column of file paths, e.g. table.txt:

> SampleID  Factor  Condition   Replicate   Treatment   Type    Dataset isPE    ReadLength  isREF   PathFASTQ
> DG13  fd3 c1  1   cc  0   0102    0   50  1   "/path/to/fastq"
> DG14  fd3 c1 1    cc  1   0102    0   50  1   "/path/to/fastq"

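I would like to store the paths in a bash array so that I can use them in downstream parallel computations (an SGE task array). For simplicity, the leading and trailing " could easily be left out of table.txt.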

Excluding the header line, I tried the following:

files=($(awk '{ if(($8 == 0)) { print $1 } }' table.txt ))
paths=($(awk '{ if(($8 == 0)) { print $11 } }' table.txt ))
infile="${paths[$SGE_TASK_ID]}"/"${files[$SGE_TASK_ID]}".fastq.gz

In case anyone is unfamiliar with it, $SGE_TASK_ID takes a user-defined integer value between 1 and N.

Unfortunately, $infile does not show the expected value for $SGE_TASK_ID=1:

/path/to/fastq/DG13.fastq.gz

Thanks for your help.

Could you please try the following? This code also removes Control-M (carriage return) characters while it runs.

myarr=($(awk '{gsub(/\r/,"")} match($NF,/\/[^"]*/){\
         val=substr($NF,RSTART,RLENGTH);\
         num=split(val,array,"/");\
         print val"/"$1"."array[num]".gz"}'  Input_file))
for i in "${myarr[@]}"
do
  echo "$i"
done

If you want to remove the Control-M characters from Input_file itself, you could also try running the following:

tr -d '\r' < Input_file > temp && mv temp Input_file
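Before stripping anything, it may be worth checking whether the file really contains carriage returns. The following is only a minimal sketch: the sample file demo_table.txt and its two-column contents are made up for illustration and are not the table from the question.

```shell
#!/usr/bin/env bash
# Build a small sample with Windows (CRLF) line endings; the file name is hypothetical.
tmpdir=$(mktemp -d)
printf 'SampleID\tPathFASTQ\r\nDG13\t/path/to/fastq\r\n' > "$tmpdir/demo_table.txt"

# Count the lines that contain a carriage return (0 means the file is clean).
cr_before=$(grep -c $'\r' "$tmpdir/demo_table.txt")

# Strip the carriage returns with the same tr call, then count again.
tr -d '\r' < "$tmpdir/demo_table.txt" > "$tmpdir/clean.txt"
cr_after=$(grep -c $'\r' "$tmpdir/clean.txt")

echo "lines with CR before: $cr_before, after: $cr_after"
```

If the first count is already 0, carriage returns are probably not what is breaking the paths.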

When we loop over myarr and print it with the for loop shown earlier, the output is as follows.

/path/to/fastq/DG13.fastq.gz
/path/to/fastq/DG14.fastq.gz

Explanation of the awk code:

awk '                                 ##Start of the awk program.
{gsub(/\r/,"")}                       ##Remove carriage returns (Control-M) from the current line.
match($NF,/\/[^"]*/){                 ##Match from the first / up to (not including) the next " in the last field.
  val=substr($NF,RSTART,RLENGTH)      ##val is the matched substring: it starts at RSTART and is RLENGTH characters long.
  num=split(val,array,"/")            ##Split val on / into array; num is the number of elements produced.
  print val"/"$1"."array[num]".gz"    ##Print val, /, the first field, a dot, the last element of array, then .gz.
}
'  Input_file                         ##Input_file is the file to read.
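Filling an array with myarr=($(...)) relies on word splitting, so a path containing a space would be broken into several elements. One possible alternative, assuming bash 4 or later (where the mapfile builtin is available), reads exactly one element per output line. The three-column sample table below is hypothetical and much smaller than the real table.txt, and the awk program is a simplified variant rather than the one explained above.

```shell
#!/usr/bin/env bash
# Hypothetical sample: a header plus two tab-separated rows, one path containing a space.
tmpdir=$(mktemp -d)
printf 'SampleID\tisPE\tPathFASTQ\n' > "$tmpdir/table.txt"
printf 'DG13\t0\t"/path/to/fastq"\n' >> "$tmpdir/table.txt"
printf 'DG14\t0\t"/path/with space/fastq"\n' >> "$tmpdir/table.txt"

# Split strictly on tabs, skip the header (NR > 1), drop carriage returns and
# double quotes, and print <path>/<SampleID>.fastq.gz, one result per line.
mapfile -t myarr < <(
    awk -F'\t' 'NR > 1 { gsub(/\r/, ""); gsub(/"/, "", $NF); print $NF "/" $1 ".fastq.gz" }' "$tmpdir/table.txt"
)

printf '%s\n' "${myarr[@]}"
```

Note that an array filled this way is zero-based, so the first data row ends up in myarr[0].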

Would you please try the following:

idx=0                                   # last index used in infile (entries start at 1)
while read -r -a ary; do
    ((nr++)) || continue                # skip header line
    if (( ${ary[7]} == 0 )); then       # if "isPE" == 0 ..
        path=${ary[10]#\"}              # remove leading double-quote
        path=${path%\"}                 # remove trailing double-quote
        file=${ary[0]}
        # use a separate counter so that SGE_TASK_ID itself is left untouched
        infile[$((++idx))]="${path}/${file}.fastq.gz"
    fi
done < table.txt

echo "${infile[1]}"
echo "${infile[2]}"

Output:

/path/to/fastq/DG13.fastq.gz
/path/to/fastq/DG14.fastq.gz
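The loop above stores its entries starting at index 1, which lines up with the 1..N range of $SGE_TASK_ID. The sketch below shows how a single entry might then be selected inside a job, and how that differs for a zero-based array. The names task_id, selected, zero_based and selected_zero are illustrative only, and the array contents are typed in by hand so the snippet runs on its own.

```shell
#!/usr/bin/env bash
# Stand-in for the array built by the loop above (indices 1 and 2 set explicitly).
infile=()
infile[1]="/path/to/fastq/DG13.fastq.gz"
infile[2]="/path/to/fastq/DG14.fastq.gz"

# SGE sets SGE_TASK_ID inside an array job; fall back to 1 so the sketch runs standalone.
task_id=${SGE_TASK_ID:-1}

# 1-based array: index directly with the task ID.
selected="${infile[$task_id]}"

# A zero-based array, e.g. one filled with arr=( ... ) or mapfile, needs an offset of one.
zero_based=("/path/to/fastq/DG13.fastq.gz" "/path/to/fastq/DG14.fastq.gz")
selected_zero="${zero_based[$((task_id - 1))]}"

echo "$selected"
echo "$selected_zero"
```

Both echo lines print the same path for a given task ID. Forgetting the offset on a zero-based array would hand task 1 the second file.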