Snakemake rule to write a new text file from input variables (Snakemake syntax)
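I have a fully functional Snakemake workflow, but I'd like to add a rule that writes the input variables out as new lines in a newly generated output text file. In brief, I've included the relevant code below: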
OUTPUTDIR = config["outputDIR"]
SAMPLEID = list(SAMPLE_TABLE.Sample_Name)
# Above 2 lines are functional in other parts of script.

rule all:
    input:
        manifest = OUTPUTDIR + "/manifest.txt"

rule write_manifest:
    input:
        sampleid = SAMPLEID,
        loc_r1 = expand("{base}/trimmed/{sample}_1.trimmed.fastq.gz", base = OUTPUTDIR, sample = SAMPLELIST),
        loc_r2 = expand("{base}/trimmed/{sample}_2.trimmed.fastq.gz", base = OUTPUTDIR, sample = SAMPLELIST)
    output:
        OUTPUTDIR + "/manifest.txt"
    shell:
        """
        echo "{input.sampleid},{input.loc_r1},forward" >> {output}
        echo "{input.sampleid},{input.loc_r2},reverse" >> {output}
        """
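For reference, expand() returns a flat list of strings, one per combination of the wildcard values, so loc_r1 and loc_r2 above are each plain lists of paths. The function below is a rough, hypothetical approximation of that default combine-everything behavior (expand_sketch is not part of Snakemake, and the values passed to it are made up), useful only for seeing what those lists contain:

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Rough approximation of Snakemake's expand(): fill `pattern` with every
    combination of the given wildcard values (a single string is one value)."""
    names = list(wildcards)
    # Wrap lone strings so they are not iterated character by character.
    value_lists = [[v] if isinstance(v, str) else list(v) for v in wildcards.values()]
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*value_lists)
    ]


# Hypothetical values standing in for OUTPUTDIR and SAMPLELIST.
print(expand_sketch("{base}/trimmed/{sample}_1.trimmed.fastq.gz",
                    base="out", sample=["sample1", "sample2"]))
# → ['out/trimmed/sample1_1.trimmed.fastq.gz', 'out/trimmed/sample2_1.trimmed.fastq.gz']
```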
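My problem is that Snakemake is reading these inputs as files, whereas I need it to print the file paths or sample IDs it detects.
Can anyone help with the syntax?
The desired output file needs to look like this: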
depth1,$PWD/raw_seqs_dir/Test01_full_L001_R1_001.fastq.gz,forward
depth1,$PWD/raw_seqs_dir/Test01_full_L001_R2_001.fastq.gz,reverse
depth2,$PWD/raw_seqs_dir/Test02_full_L001_R1_001.fastq.gz,forward
depth2,$PWD/raw_seqs_dir/Test02_full_L001_R2_001.fastq.gz,reverse
I'm trying to write it with echo.
Error message:
Building DAG of jobs...
MissingInputException in [write_manifest]:
Missing input files for rule write_manifest:
sample1
sample2
sample3
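The exception arises because every entry listed under input: is interpreted as a file path. The sample IDs are plain names, so Snakemake looks for files called sample1, sample2, and so on, and since they neither exist nor are produced by any rule, it reports them as missing. The hypothetical helper below models only the "does it exist on disk" half of that check (it is a simplified illustration, not Snakemake's actual code), which is enough to see why bare IDs fail:

```python
import os


def missing_inputs(declared_inputs):
    """Return the declared inputs that do not exist as files.

    Simplified model: real Snakemake also accepts inputs that another rule
    can produce; this sketch only checks the filesystem."""
    return [path for path in declared_inputs if not os.path.exists(path)]


# Bare sample IDs are treated like paths and are (almost certainly) missing.
print(missing_inputs(["sample1", "sample2", "sample3"]))
```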
Update:
After adding sampleid to params:
rule write_manifest:
    input:
        loc_r1 = expand("{base}/trimmed/{sample}_{suf}_1.trimmed.fastq.gz", base = SCRATCHDIR, sample = SAMPLE$
        loc_r2 = expand("{base}/trimmed/{sample}_{suf}_2.trimmed.fastq.gz", base = SCRATCHDIR, sample = SAMPLE$
    output:
        OUTPUTDIR + "/manifest.txt"
    params:
        sampleid = SAMPLEID
    shell:
        """
        echo "{params.sampleid},{input.loc_r1},forward" >> {output}
        echo "{params.sampleid},{input.loc_r2},reverse" >> {output}
        """
My output now looks like this (which is incorrect):
sample1 sample2 sample3,$PWD/tmp/dir/sample1.fastq $PWD/tmp/dir/sample2.fastq $PWD/tmp/dir/sample3.fastq,forward
sample1 sample2 sample3,$PWD/tmp/dir/sample1.fastq $PWD/tmp/dir/sample2.fastq $PWD/tmp/dir/sample3.fastq,reverse
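This follows from how lists are substituted into a shell command: Snakemake joins the items of a list with spaces, so each echo runs once with all samples at once. The snippet below reproduces the first incorrect line in plain Python (the values are copied from the output above) to make the space-joining visible:

```python
samples = ["sample1", "sample2", "sample3"]
r1_paths = [f"$PWD/tmp/dir/{s}.fastq" for s in samples]

# Each list becomes one space-separated string, so the whole list lands
# in a single CSV field instead of producing one line per sample.
line = f'{" ".join(samples)},{" ".join(r1_paths)},forward'
print(line)
# → sample1 sample2 sample3,$PWD/tmp/dir/sample1.fastq $PWD/tmp/dir/sample2.fastq $PWD/tmp/dir/sample3.fastq,forward
```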
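This still isn't what I want; I need it to look like the desired output below. Can I write this so that Snakemake loops over each sample/input/params?
The desired output file needs to look like this: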
depth1,$PWD/raw_seqs_dir/Test01_full_L001_R1_001.fastq.gz,forward
depth1,$PWD/raw_seqs_dir/Test01_full_L001_R2_001.fastq.gz,reverse
depth2,$PWD/raw_seqs_dir/Test02_full_L001_R1_001.fastq.gz,forward
depth2,$PWD/raw_seqs_dir/Test02_full_L001_R2_001.fastq.gz,reverse
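Looping per sample amounts to pairing the i-th sample ID with the i-th R1 and R2 path. A minimal sketch of that pairing in plain Python, assuming SAMPLEID and the two expand() lists are in the same sample order (expand over a single list of samples keeps that list's order); manifest_lines is a hypothetical name:

```python
def manifest_lines(sample_ids, r1_paths, r2_paths):
    """Build one 'forward' and one 'reverse' manifest line per sample."""
    if not len(sample_ids) == len(r1_paths) == len(r2_paths):
        raise ValueError("sample IDs and read paths must have the same length")
    lines = []
    for sample_id, r1, r2 in zip(sample_ids, r1_paths, r2_paths):
        lines.append(f"{sample_id},{r1},forward")
        lines.append(f"{sample_id},{r2},reverse")
    return lines


for line in manifest_lines(
    ["depth1", "depth2"],
    ["$PWD/raw_seqs_dir/Test01_full_L001_R1_001.fastq.gz",
     "$PWD/raw_seqs_dir/Test02_full_L001_R1_001.fastq.gz"],
    ["$PWD/raw_seqs_dir/Test01_full_L001_R2_001.fastq.gz",
     "$PWD/raw_seqs_dir/Test02_full_L001_R2_001.fastq.gz"],
):
    print(line)
```

Inside the Snakefile, one option is to swap the shell: section for a run: section that opens output[0] for writing and writes "\n".join(manifest_lines(SAMPLEID, input.loc_r1, input.loc_r2)) + "\n". With that approach, sampleid should not be listed under input:, since it is not a file.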
You need to use the wildcard sample in params instead of the variable SAMPLEID. This will use the correct sample ID for that job when the rule is executed.
    params:
        sample = '{sample}'
    shell:
        """
        echo "{params.sample},{input.loc_r1},forward" >> {output}
        echo "{params.sample},{input.loc_r2},reverse" >> {output}
        """
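Note that the '{sample}' wildcard in params can only be resolved if the rule's output also contains {sample}, which means one job per sample rather than a single manifest.txt. One way to combine the two, sketched here under that assumption (the per-sample fragment path, e.g. OUTPUTDIR + "/manifest/{sample}.txt", is hypothetical), is to have write_manifest produce one fragment per sample and add a second rule whose input is expand() over those fragments and whose output is manifest.txt. The concatenation step of that second rule could look like this in plain Python (concat_fragments is a hypothetical name):

```python
from pathlib import Path


def concat_fragments(fragment_paths, out_path):
    """Concatenate per-sample manifest fragments, in order, into one file."""
    with open(out_path, "w") as out:
        for fragment in fragment_paths:
            text = Path(fragment).read_text()
            out.write(text)
            # Guard against a fragment whose last line lacks a newline.
            if text and not text.endswith("\n"):
                out.write("\n")
```

In a run: section this could be called as concat_fragments(input, output[0]); a shell: section running cat {input} > {output} would do the same job. Either way the fragments are combined in expand() order, so the manifest follows the order of the sample list.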