User defined job type not available during calculation via SLURM

I am trying to set up a pyiron calculation (version 0.3.6). I want to execute a non-Python script on a compute cluster via SLURM. I have written my own OwnProgramJob class, which inherits from the GenericJob class. On my local machine everything runs fine. However, when running on the cluster, my own class is not available to pyiron:

...    
File "/beegfs-home/users/fufl/.local/project/lib/python3.8/site-packages/pyiron_base/generic/hdfio.py", line 1251, in import_class
        return getattr(
AttributeError: module '__main__' has no attribute 'OwnProgramJob'

How can I make my own class available to pyiron on the cluster?

I suppose one way would be to add my own class directly to the pyiron source code and modify JOB_CLASS_DICT as suggested in https://github.com/pyiron/pyiron/issues/973#issuecomment-694347111. Is there a way to do this without modifying the pyiron source code?
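
As far as I understand the suggestion in that issue, it would amount to something roughly like the following inside the pyiron sources (the import location of JOB_CLASS_DICT is my assumption and may differ between pyiron versions; I have not tested this):

# Rough, untested sketch of the JOB_CLASS_DICT approach from the linked issue.
# JOB_CLASS_DICT maps a job class name to the module that defines it; the exact
# import path is an assumption and may vary between pyiron_base versions.
from pyiron_base.job.jobtype import JOB_CLASS_DICT

JOB_CLASS_DICT["OwnProgramJob"] = "ownprogramjob"  # module that defines the class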

My source code can be found below for reference.

Many thanks,

Florian


Jupyter notebook:

import pyiron
from pathlib import Path

pr = pyiron.Project(path=f"{str(Path.home())}/pyiron/projects/example")

from pyiron_base import GenericJob, GenericParameters
import os

class OwnProgramJob(GenericJob):
    def __init__(self, project, job_name):
        super().__init__(project, job_name)
        self.input = OwnProgramInput()
        self.executable = "cat input.in > output.out"
    
    def write_input(self):
        with open(os.path.join(self.working_directory, "input.in"), 'w') as infile:
            infile.write("asd 100")
    
    def collect_output(self):
        file = os.path.join(self.working_directory, "output.out")
        with open(file) as f:
            line = f.readlines()[0]
            energy = float(line.split()[1])
        with self.project_hdf5.open("output/generic") as h5out:
            h5out["energy_tot"] = energy
    
    
class OwnProgramInput(GenericParameters):
    def __init__(self, input_file_name=None):
        super(OwnProgramInput, self).__init__(
            input_file_name=input_file_name,
            table_name="input")
        
    def load_default(self):
        self.load_string("input_energy 100")

job = pr.create_job(job_type=OwnProgramJob, job_name="test", delete_existing_job=True)

job.server.queue = 'cpu'

job.run()

pr.job_table()

SLURM job file:

#SBATCH --workdir={{working_directory}}
#SBATCH --get-user-env=L
#SBATCH --partition=cpu
{%- if run_time_max %}
#SBATCH --time={{run_time_max // 60}}
{%- endif %}
{%- if memory_max %}
#SBATCH --mem={{memory_max}}
{%- endif %}
#SBATCH --cpus-per-task={{cores}} 
    
{{command}}
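
For context, the {{...}} placeholders are Jinja2 variables that pyiron's queue adapter fills in when the job is submitted. The snippet below only illustrates that rendering step; the template file name, the values, and the wrapper command are placeholder assumptions on my side:

# Illustration only: pyiron/pysqa renders the Jinja2 template above internally.
# The file name, the values, and the command string are assumptions.
from jinja2 import Template

with open("slurm.sh") as f:
    template = Template(f.read())

print(template.render(
    working_directory="/path/to/project/test_hdf5/test",
    run_time_max=3600,   # seconds; the template itself converts this to minutes
    memory_max=4000,
    cores=4,
    command="python -m pyiron_base.cli wrapper -p /path/to/project/test_hdf5/test -j 1",
))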

For the job class to be available when submitting to the queuing system, it has to be included in the Python path. So I would suggest splitting the class definition into a separate Python module named ownprogramjob.py:

import os
from pyiron_base import GenericJob, GenericParameters


class OwnProgramJob(GenericJob):
    """Job class that wraps an external (non-Python) executable."""
    def __init__(self, project, job_name):
        super().__init__(project, job_name)
        self.input = OwnProgramInput()
        self.executable = "cat input.in > output.out"

    def write_input(self):
        # Write the input file into the job's working directory.
        with open(os.path.join(self.working_directory, "input.in"), 'w') as infile:
            infile.write("asd 100")

    def collect_output(self):
        # Parse the output file and store the result in the job's HDF5 file.
        file = os.path.join(self.working_directory, "output.out")
        with open(file) as f:
            line = f.readlines()[0]
            energy = float(line.split()[1])
        with self.project_hdf5.open("output/generic") as h5out:
            h5out["energy_tot"] = energy


class OwnProgramInput(GenericParameters):
    def __init__(self, input_file_name=None):
        super(OwnProgramInput, self).__init__(
            input_file_name=input_file_name,
            table_name="input")

    def load_default(self):
        # Default input parameters used when no input file is provided.
        self.load_string("input_energy 100")

Then you can submit it with:

from pyiron import Project
from ownprogramjob import OwnProgramJob


pr = Project("test")
job = pr.create_job(job_type=OwnProgramJob, job_name="test", delete_existing_job=True)

job.server.queue = 'cpu'

job.run()

pr.job_table()
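
Note that on the cluster the pyiron process started by the queuing system also has to be able to re-import the class from the module name stored in the HDF5 file, so ownprogramjob.py must be importable on the compute nodes as well (for example by putting its directory on the PYTHONPATH there). A quick sanity check on a cluster node, just as a sketch, would be:

# Sketch: resolve the class the same way pyiron's import_class does,
# i.e. import the module and look up the class by name.
import importlib

module = importlib.import_module("ownprogramjob")
job_class = getattr(module, "OwnProgramJob")  # this lookup failed for '__main__' before
print(job_class)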

Best,

Jan