How to use pandas within snakemake pipelines
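I want to improve the reproducibility of some Python code I wrote by converting it into a data pipeline. I am used to targets in R and would like to find an equivalent in Python. My impression is that snakemake comes quite close.
I don't understand how to use pandas inside a snakemake task to import an input, modify it, and then write the output.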
Let's take the simplest pipeline I can think of: we take a csv and write a copy somewhere else.
The pipeline works fine when using a bash command:
    rule trying_snakemake:
        input:
            path="untitled.txt"
        output:
            "test-snakemake.csv"
        run:
            shell("cp {input.path} {output}")
I would like to do the equivalent with pandas (using pandas here is of course unnecessary, but the goal is to understand the logic):
    rule trying_snakemake:
        input:
            path="untitled.txt"
        output:
            "test-snakemake.csv"
        run:
            import pandas as pd
            df = pd.read_csv({input.path})
            df.to_csv({output}, header=False)
snakemake -c1
Invalid file path or buffer object type: <class 'set'>
File "/home/jovyan/work/label-openfood/Snakefile", line 19, in __rule_trying_snakemake
File "/opt/conda/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 482, in _read
File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 51, in __init__
File "/opt/conda/lib/python3.9/site-packages/pandas/io/parsers/base_parser.py", line 222, in _open_handles
File "/opt/conda/lib/python3.9/site-packages/pandas/io/common.py", line 609, in get_handle
File "/opt/conda/lib/python3.9/site-packages/pandas/io/common.py", line 396, in _get_filepath_or_buffer
File "/opt/conda/lib/python3.9/concurrent/futures/thread.py", line 52, in run
Exiting because a job execution failed. Look above for error message
I think the error occurs in the read_csv step, but I don't understand what it means (even though I am used to working with pandas).
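You are very close: the curly braces are not needed in the run directive, because its body is plain Python rather than a string template: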
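    rule trying_snakemake:
        input:
            path="untitled.txt"
        output:
            csv="test-snakemake.csv"
        run:
            import pandas as pd
            # input.path and output.csv are ordinary Python values here,
            # so they are passed directly, without braces
            df = pd.read_csv(input.path)
            df.to_csv(output.csv, header=False)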
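To see where the `<class 'set'>` in the error message comes from, here is a small sketch in plain Python, outside of Snakemake. The variable `input_path` is a hypothetical stand-in for `input.path`, and `str.format` is used only to imitate placeholder substitution in a string; it is not meant to show how Snakemake itself handles `shell(...)`.

```python
# Hypothetical stand-in for the value Snakemake would supply as input.path.
input_path = "untitled.txt"

# Inside a string template, braces are placeholders that get substituted.
command = "cp {src} {dst}".format(src=input_path, dst="test-snakemake.csv")

# In ordinary Python code, braces around an expression build a set literal.
wrapped = {input_path}

print(command)           # cp untitled.txt test-snakemake.csv
print(type(input_path))  # <class 'str'>
print(type(wrapped))     # <class 'set'>
```

So `pd.read_csv({input.path})` hands pandas a one-element set instead of a path, which matches the "Invalid file path or buffer object type" message in the question.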