Generate many files with wildcard, then merge into one

Question

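My Snakefile has two rules: one uses a wildcard to generate several sets of files, and the other merges everything into a single file. This is how I wrote it: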

chr = range(1,23)

rule generate:
    input:
        og_files = config["tmp"] + '/chr{chr}.bgen',
    output:
        out = multiext(config["tmp"] + '/plink/chr{{chr}}',
                       '.bed', '.bim', '.fam')
    shell:
        """
        plink \
        --bgen {input.og_files} \
        --make-bed \
        --oxford-single-chr \
        --out {config[tmp]}/plink/chr{chr}
        """
rule merge:
    input:
        plink_chr = expand(config["tmp"] + '/plink/chr{chr}.{ext}',
                           chr = chr,
                           ext = ['bed', 'bim', 'fam'])
    output:
        out = multiext(config["tmp"] + '/all',
                       '.bed', '.bim', '.fam')
    shell:
        """
        plink \
        --pmerge-list-dir {config[tmp]}/plink \
        --make-bed \
        --out {config[tmp]}/all
        """

Unfortunately, this does not let me track the files from the first rule to the second:

$ snakemake -s myfile.smk -c1 -np
Building DAG of jobs...
MissingInputException in line 17 of myfile.smk:
Missing input files for rule merge:
[list of all the files made by expand()]

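What can I use to generate the 22 sets of files for the wildcard chr in generate, while still being able to track them in the input of merge? Thanks in advance for your help.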

Answer 1

In rule generate, I don't think you want to escape the {chr} wildcard; otherwise it won't be replaced. That is:

        out = multiext(config["tmp"] + '/plink/chr{{chr}}',
                       '.bed', '.bim', '.fam')

should be:

        out = multiext(config["tmp"] + '/plink/chr{chr}',
                       '.bed', '.bim', '.fam')
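
A rough way to see why the doubled braces matter, in plain Python rather than Snakemake itself: doubled braces only collapse to single ones when a string passes through a formatting step such as str.format, which is what expand() does. The sketch below assumes multiext() performs no such step and simply appends each extension to the prefix. Here `tmp` is a placeholder for config["tmp"], and `append_extensions` is a hypothetical stand-in, not Snakemake's own function.

```python
# Placeholder for config["tmp"]; the real value comes from the config file.
tmp = "tmp"
exts = [".bed", ".bim", ".fam"]


def append_extensions(prefix, extensions):
    """Hypothetical stand-in for multiext(): plain concatenation, no formatting."""
    return [prefix + ext for ext in extensions]


# Prefix as written in the question: the doubled braces survive untouched.
escaped = append_extensions(tmp + "/plink/chr{{chr}}", exts)
print(escaped[0])    # tmp/plink/chr{{chr}}.bed

# Prefix as suggested in the answer: a single-brace {chr} wildcard remains.
plain = append_extensions(tmp + "/plink/chr{chr}", exts)
print(plain[0])      # tmp/plink/chr{chr}.bed

# Doubling is only needed when the string goes through str.format,
# which turns {{chr}} into {chr} while filling in the other fields.
formatted = [(tmp + "/plink/chr{{chr}}{ext}").format(ext=ext) for ext in exts]
print(formatted[0])  # tmp/plink/chr{chr}.bed

# The concrete paths that rule merge asks for: 22 chromosomes x 3 extensions.
requested = [f"{tmp}/plink/chr{c}{ext}" for c in range(1, 23) for ext in exts]
print(len(requested), requested[0])  # 66 tmp/plink/chr1.bed
```

With the single-brace pattern, a requested path such as tmp/plink/chr1.bed can be matched against the output of generate with chr set to 1. Separately, inside the shell block of generate the wildcard is normally referenced as {wildcards.chr}; a bare {chr} there may pick up the global chr range instead, so that line is probably worth checking as well.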

merge

wildcard-expansion

snakemake