How to store file paths from a tab-separated text file in a bash array
I have a tab-separated text file with a column of file paths, e.g. table.txt:
> SampleID Factor Condition Replicate Treatment Type Dataset isPE ReadLength isREF PathFASTQ
> DG13 fd3 c1 1 cc 0 0102 0 50 1 "/path/to/fastq"
> DG14 fd3 c1 1 cc 1 0102 0 50 1 "/path/to/fastq"
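I want to store the paths in a bash array so that I can use them in downstream parallel computation (an SGE task array). For simplicity, the leading and trailing " could easily be left out of table.txt.
Excluding the header row, I tried the following: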
files=($(awk '{ if(($8 == 0)) { print $1 } }' table.txt ))
paths=($(awk '{ if(($8 == 0)) { print $11 } }' table.txt ))
infile="${paths[$SGE_TASK_ID]}"/"${files[$SGE_TASK_ID]}".fastq.gz
$SGE_TASK_ID takes user-defined integer values from 1 to N, in case anyone is unfamiliar with it.
Unfortunately, $infile does not show the expected value for $SGE_TASK_ID=1:
/path/to/fastq/DG13.fastq.gz
Thanks for your help.
Could you please try the following? This code also strips Control-M (carriage return) characters while it runs.
myarr=($(awk '{gsub(/\r/,"")} match($NF,/\/[^"]*/){\
val=substr($NF,RSTART,RLENGTH);\
num=split(val,array,"/");\
print val"/"$1"."array[num]".gz"}' Input_file))
for i in "${myarr[@]}"
do
    echo "$i"
done
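If you want to remove the Control-M characters from Input_file itself, you can also try running the following:
tr -d '\r' < Input_file > temp && mv temp Input_file
When we loop over the array and print it as shown above, the output is as follows.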
/path/to/fastq/DG13.fastq.gz
/path/to/fastq/DG14.fastq.gz
Explanation of the awk code:
awk ' ##Starting the awk program here.
match($NF,/\/[^"]*/){ ##Using the match function on the last field: match from the first / up to, but not including, the next ".
val=substr($NF,RSTART,RLENGTH) ##Creating variable val, the sub-string of the last field that starts at RSTART and is RLENGTH characters long.
num=split(val,array,"/") ##Creating variable num, the number of elements produced by splitting val into array with / as the delimiter.
print val"/"$1"."array[num]".gz" ##Printing val, a /, the first field, a dot, the last element of array, then .gz.
}
' Input_file ##Mentioning the Input_file name here.
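One thing to keep in mind when indexing such an array by task: bash arrays start at 0, while $SGE_TASK_ID starts at 1, so the lookup needs an offset of one. The following is only a minimal sketch of that idea, not the code from the answer above. It assumes bash 4 or later (for mapfile), builds a hypothetical copy of the sample table in a temporary file, and uses a hypothetical variable task_id that falls back to 1 when run outside SGE.

```shell
#!/usr/bin/env bash
# Hypothetical sample input: tab separated, with Windows line endings on the data rows.
input=$(mktemp)
printf 'SampleID\tFactor\tCondition\tReplicate\tTreatment\tType\tDataset\tisPE\tReadLength\tisREF\tPathFASTQ\n' > "$input"
printf 'DG13\tfd3\tc1\t1\tcc\t0\t0102\t0\t50\t1\t"/path/to/fastq"\r\n' >> "$input"
printf 'DG14\tfd3\tc1\t1\tcc\t1\t0102\t0\t50\t1\t"/path/to/fastq"\r\n' >> "$input"

# mapfile stores one array element per output line,
# so a path containing spaces is not split into several elements.
mapfile -t myarr < <(awk -F'\t' '
    { gsub(/\r/, "") }            # drop carriage returns
    NR > 1 {                      # skip the header row
        path = $NF
        gsub(/"/, "", path)       # strip the surrounding double quotes
        print path "/" $1 ".fastq.gz"
    }' "$input")

# Task IDs are 1-based, array indices are 0-based.
task_id=${SGE_TASK_ID:-1}         # fallback of 1 is an assumption for running outside SGE
infile=${myarr[task_id - 1]}
echo "$infile"
rm -f "$input"
```

With the fallback task ID of 1, this should select the DG13 entry; inside an array job the scheduler-provided $SGE_TASK_ID picks the row instead.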
Could you please try the following:
while read -r -a ary; do
    ((nr++)) || continue                # skip header line
    if (( ${ary[7]} == 0 )); then       # if "isPE" == 0 ..
        path=${ary[10]#\"}              # remove leading double-quote
        path=${path%\"}                 # remove trailing double-quote
        file=${ary[0]}
        infile[$((++SGE_TASK_ID))]="${path}/${file}.fastq.gz"
    fi
done < table.txt
echo "${infile[1]}"
echo "${infile[2]}"
Output:
/path/to/fastq/DG13.fastq.gz
/path/to/fastq/DG14.fastq.gz
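Inside a real SGE array job, $SGE_TASK_ID is already set by the scheduler, so a loop that also increments it would shift the indices. A possible variation, offered only as a sketch, keeps a separate counter and stops at the row that belongs to the current task. The names n and task_id and the temporary sample table are hypothetical, and the input is assumed to have Unix line endings.

```shell
#!/usr/bin/env bash
# Hypothetical sample input mirroring table.txt (tab separated, Unix line endings).
table=$(mktemp)
printf 'SampleID\tFactor\tCondition\tReplicate\tTreatment\tType\tDataset\tisPE\tReadLength\tisREF\tPathFASTQ\n' > "$table"
printf 'DG13\tfd3\tc1\t1\tcc\t0\t0102\t0\t50\t1\t"/path/to/fastq"\n' >> "$table"
printf 'DG14\tfd3\tc1\t1\tcc\t1\t0102\t0\t50\t1\t"/path/to/fastq"\n' >> "$table"

task_id=${SGE_TASK_ID:-2}   # fallback of 2 is an assumption for running outside SGE
nr=0                        # line counter, used to skip the header
n=0                         # counts rows that pass the isPE filter
infile=""

while read -r -a ary; do
    (( nr++ )) || continue              # first line is the header
    [[ ${ary[7]} == 0 ]] || continue    # keep rows where isPE is 0
    (( ++n == task_id )) || continue    # wait for the row of this task
    path=${ary[10]#\"}                  # remove leading double quote
    path=${path%\"}                     # remove trailing double quote
    infile="${path}/${ary[0]}.fastq.gz"
    break
done < "$table"

echo "$infile"
rm -f "$table"
```

Because the loop breaks at the matching row, each task reads only as far into the table as it needs, and $SGE_TASK_ID itself is never modified.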