How do I write a for-loop so a program repeats itself for a set of 94 DNA samples?

Question

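I have written some code in the bash shell (so that I can submit it to my university's supercomputer) to edit contaminant sequences out of a batch of DNA extracts that I have. Essentially, the code takes the sequences from the negative extraction blank I prepared (A1-BLANK) and subtracts them from all of the other samples.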

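I have figured out how to make it work with a single sample, but I am trying to write a for-loop so that this small block of code repeats itself for each sample. The result should be one .sam file per sample, each with a unique name, in which the sample's forward and reverse reads are merged and edited for contamination. I have searched Stack Overflow extensively for help with this specific problem, but I have not been able to apply the related answered questions to my code.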

Here is an example of part of what I am trying to do for a single sample, named F10-61C-3-V4_S78_L001_R1_001.fastq:

# bowtie2 is a program that examines sequence similarity compared to a standard
# -x A1-BlankIndex: compares the sample to the negative extraction blank
# -1 / -2: merge the forward and reverse reads of the DNA sequences into one file
# -S: names the merged and edited output and writes it as a .sam file
bowtie2 -q --end-to-end --very-sensitive \
-N 0 -L 31 --time --reorder \
-x A1-BlankIndex \
-1 "/file directory/F10-61C-3-V4_S78_L001_R1_001.fastq" \
-2 "/file directory/F10-61C-3-V4_S78_L001_R2_001.fastq" \
-S 61C-3.sam

Here is what I have so far for this small step of the process:


for file in /file directory/*.fastq

do

bowtie2 -q --end-to-end --very-sensitive \
-N 0 -L 31 --time --reorder \
-x A1-BlankIndex \
-1  /file directory/*.fastq 
-2 /file directory/*.fastq \
-S *.sam

done

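In the slurm file that gets generated, the error I am seeing has to do with the -S option. I am not sure how to give each merged and edited sample a unique name for its .sam file. I am new to writing for-loops in python (my only experience is in R), and I am sure it is a simple fix, but I have not been able to find any specific answers to this question. Any feedback is much appreciated!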

Answer 1

Here is a first attempt. Note that I assume the whole fragment between do and done is a single command, so it needs the line-continuation marker (\).

Also note that "$file" appears twice in my example. I am a bit uneasy about that, but your described example seems to require it explicitly.

Finally, note that I only give the sam files a numeric name, because I don't really know what you want the name to be.

I hope this gives you enough information to get started.

#!/bin/bash
i=0
for file in /file/directory/*.fastq
do
    bowtie2 -q --end-to-end --very-sensitive \
      -N 0 -L 31 --time --reorder \
      -x A1-BlankIndex \
      -1 "$file" \
      -2 "$file" \
      -S "$i".sam
    i=$((i+1))
done
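
If a meaningful name is preferred over a counter, one option is to derive both the reverse-read file and the output name from each forward-read file name. The following is only a sketch under assumptions: it assumes every file follows the pattern of the example in the question (F10-61C-3-V4_S78_L001_R1_001.fastq), that each _R1_ file has a matching _R2_ file, and that the part of the file name before the first underscore is unique per sample. The function names r2_for, sam_name_for and run_all are hypothetical helpers, not part of bowtie2.

```shell
#!/usr/bin/env bash

# Hypothetical helper: path of the reverse-read file paired with an R1 file.
# Replaces the first "_R1_" in the path with "_R2_".
r2_for() {
  printf '%s\n' "${1/_R1_/_R2_}"
}

# Hypothetical helper: output name taken from the part of the file name
# before the first underscore (an assumed naming rule).
sam_name_for() {
  local base
  base=$(basename "$1" .fastq)      # e.g. F10-61C-3-V4_S78_L001_R1_001
  printf '%s.sam\n' "${base%%_*}"   # e.g. F10-61C-3-V4.sam
}

# Loop over the forward reads only, so each pair is processed once.
run_all() {
  local dir=$1 r1
  for r1 in "$dir"/*_R1_*.fastq; do
    [ -e "$r1" ] || continue        # skip if the glob matched nothing
    bowtie2 -q --end-to-end --very-sensitive \
      -N 0 -L 31 --time --reorder \
      -x A1-BlankIndex \
      -1 "$r1" \
      -2 "$(r2_for "$r1")" \
      -S "$(sam_name_for "$r1")"
  done
}
```

Calling run_all /file/directory would then run bowtie2 once per pair and write the .sam files to the current directory. Looping over *_R1_*.fastq rather than *.fastq avoids passing the same file as both -1 and -2.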

Answer 2

This may work like your example, but it automatically picks the output file name reference using a RegEx:

#!/usr/bin/env bash

input_samples='/input_samples_directory'
output_samples='/output_merged_samples_directory'

while IFS= read -r -d '' R1_fastq; do
  # Deduce R2 sample from R1 sample file name
  R2_fastq="${R1_fastq/_R1_/_R2_}"
  # RegEx match capture group in () for the output sample reference
  [[ $R1_fastq =~ [^-]+-([[:digit:]]+[[:alpha:]]-[[:digit:]]).* ]]
  # Construct the output sample file path with the reference captured
  # by the RegEx above
  sam="$output_samples/${BASH_REMATCH[1]}.sam"
  # Perform the merging
  bowtie2 -q --end-to-end --very-sensitive \
    -N 0 -L 31 --time --reorder \
    -x A1-BlankIndex \
    -1 "$R1_fastq" \
    -2 "$R2_fastq" \
    -S "$sam"
done < <(find "$input_samples" -maxdepth 1 -type f -name '*_R1_*.fastq' -print0)
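
To see what the capture group extracts before submitting a job, the RegEx can be tried on a single file name. This is a minimal sketch that reuses the pattern from the script above; the function name sample_ref is hypothetical, and the non-zero return for names that do not match is an added safeguard, not something the script above does.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the sample reference captured by the RegEx,
# or return 1 when the file name does not match the expected pattern.
sample_ref() {
  local re='[^-]+-([[:digit:]]+[[:alpha:]]-[[:digit:]]).*'
  [[ $1 =~ $re ]] || return 1
  printf '%s\n' "${BASH_REMATCH[1]}"
}

# File name taken from the question
sample_ref 'F10-61C-3-V4_S78_L001_R1_001.fastq'   # prints 61C-3
```

Checking the return status matters because, when a file name does not match, ${BASH_REMATCH[1]} is empty and the output path would collapse to a bare ".sam" in the output directory.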
