How to use pandas within snakemake pipelines

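I want to make some Python code I wrote more reproducible by turning it into a data pipeline. I'm used to targets in R and would like to find an equivalent in Python. My impression is that snakemake comes quite close.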

I don't understand how to use pandas inside a snakemake task to read the input, modify it, and then write the output.

Let's take the simplest pipeline I can think of: read a CSV and write a copy of it somewhere else.

The pipeline works fine with a bash command:

rule trying_snakemake:
    input:
        path="untitled.txt"
    output:
        "test-snakemake.csv"
    run:
        shell("cp {input.path} {output}")
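The braces work in this version because shell() receives them inside a string and substitutes the rule's input and output into it, much like Python string formatting. As a rough standard-library analogy (SimpleNamespace and the plain strings below are hypothetical stand-ins, not Snakemake's actual input/output objects):

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the rule's `input` and `output` objects
input_ = SimpleNamespace(path="untitled.txt")
output = "test-snakemake.csv"

# Braces inside a *string* are format placeholders, filled in by name;
# {input.path} uses attribute access on the object passed as `input`
cmd = "cp {input.path} {output}".format(input=input_, output=output)
print(cmd)  # cp untitled.txt test-snakemake.csv
```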

I'd like to do the equivalent with pandas (obviously pandas isn't necessary here, but the point is to understand the logic):

rule trying_snakemake:
    input:
        path="untitled.txt"
    output:
        "test-snakemake.csv"
    run:
        import pandas as pd
        df = pd.read_csv({input.path})
        df.to_csv({output}, header=False)

Running it with

snakemake -c1

fails with:

Invalid file path or buffer object type: <class 'set'>
  File "/home/jovyan/work/label-openfood/Snakefile", line 19, in __rule_trying_snakemake
  File "/opt/conda/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
  File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
  File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 482, in _read
  File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
  File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
  File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 51, in __init__
  File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/base_parser.py", line 222, in _open_handles
  File "/opt/conda/lib/python3.9/site-packages/pandas/io/common.py", line 609, in get_handle
  File "/opt/conda/lib/python3.9/site-packages/pandas/io/common.py", line 396, in _get_filepath_or_buffer
  File "/opt/conda/lib/python3.9/concurrent/futures/thread.py", line 52, in run
Exiting because a job execution failed. Look above for error message

I think the error happens at the read_csv step, but I don't understand what it means (I'm otherwise comfortable with pandas).
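The message points at the argument type. Outside a string, Python reads curly braces around a value as a set literal, so read_csv receives a one-element set instead of a path. A minimal stdlib illustration (the variable names are only for illustration):

```python
path = "untitled.txt"

# Outside a string, {path} is a set literal holding one element
arg = {path}
print(type(arg))  # <class 'set'>

# pandas.read_csv expects a str, path-like object, or file buffer,
# so passing `arg` raises "Invalid file path or buffer object type: <class 'set'>"
```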

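You're very close: you don't need the curly braces inside the run directive. The body of run is plain Python, and input.path is already a file-path string, so writing {input.path} there builds a Python set containing the path, which is why pandas complains about <class 'set'>. The braces are only needed inside shell(...) strings, where Snakemake substitutes them. Naming the output (csv=...) also lets you refer to it as output.csv: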

rule trying_snakemake:
    input:
        path="untitled.txt"
    output:
        csv="test-snakemake.csv"
    run:
        import pandas as pd
        # input.path and output.csv are plain strings here: no braces
        df = pd.read_csv(input.path)
        df.to_csv(output.csv, header=False)
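Since the run body is ordinary Python, one way to keep it testable is to move the pandas logic into a function that takes path strings. The sketch below assumes a hypothetical helper named copy_csv; inside the rule it would be called as copy_csv(input.path, output.csv). If the output is left unnamed, as in the question, it can be indexed instead (output[0]), because Snakemake's input and output support positional access as well as names.

```python
import os
import tempfile

import pandas as pd


def copy_csv(src: str, dst: str) -> None:
    """Hypothetical helper: read a CSV with pandas and write it back out."""
    df = pd.read_csv(src)
    # index=False stops pandas from writing its row index as an extra column;
    # the header row is kept because header=True is the default
    df.to_csv(dst, index=False)


# Small self-contained demo using a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "untitled.txt")
    dst = os.path.join(tmp, "test-snakemake.csv")
    with open(src, "w") as f:
        f.write("a,b\n1,2\n3,4\n")
    copy_csv(src, dst)
    # Reading both files back gives identical DataFrames
    print(pd.read_csv(dst).equals(pd.read_csv(src)))
```

Note that header=False, as used in the rule above, drops the header line, and the default index=True adds an index column, so the result there is not an exact copy of the input.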