How to write snakefile code that only runs in the first instance of possible multiple sub-instances

Question

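I would like to write code in my Snakefile that executes only on the initial invocation of the Snakefile, and not when snakemake re-runs the Snakefile as a sub-instance because I specified the -j option to use multiple cores. How can I do this?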

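I'm not talking about workflow code, but about the Python code in the Snakefile that performs various tasks related to preparing to state the workflow rules.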

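There are several places where I want to do this. In some cases the work does not need to be done more than once, and I want to speed up the snakefile by doing it only on the first, initial invocation. For example, part of my snakefile code checks whether certain pipeline include files (not the actual input and output files of the pipeline) have been edited by the user and, if so, backs them up. I don't want every sub-instance to scan the dates of all these files and back them up when necessary. In fact, there is a race condition when multiple instances try to back up the same file.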

Answer 1

I found a way to do it.

# Create Boolean variable isFirstInstance, True if this is the first snakemake
# instance of a run of snakemake, False if it is nested sub-instance.
#
# This determines whether or not this is the first snakemake instance by creating
# a unique file with each initial run of the snakefile, whose name is created
# much as tempFile() creates files, but we don't use tempFile() because we don't
# want to delete this file when any instance exits, only when the first instance
# exits.  The file name includes the process group ID, which will be the same
# for the first instance and for sub-instances.  The file contains one line, the
# process ID of its creator.  If the file doesn't exist, it is created and we
# set the variable isFirstInstance True to indicate that this is the first
# instance of the pipeline.  If the file exists and the process ID it contains
# matches the process ID of one of the parents of the current process, then the
# current process is not the first instance of this pipeline invocation, and
# so we set isFirstInstance False.  Two other aberrant situations can arise.
# First, if the file exists and its contained process ID matches the process ID
# of THIS process, we presume that the file was for some reason not deleted from
# a previous run, and that run happened to have a process group ID and process
# ID matching the current one, and so we assume we are first instance, and we
# delete the file and recreate it so its date matches the current date.  Second,
# if the file exists and does NOT contain the process ID of one of our parents,
# we make the same presumption of undeleted old file, and again delete the file,
# then rewrite it with our process ID.
################################################################################

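# Modules used below: os from the standard library, psutil from PyPI.
import os
import psutil

# Create file name containing our process group ID in the name.
# TMP_DIR is assumed to be defined earlier in the Snakefile.
initialInstancePIDfile = TMP_DIR + "/initialInstancePID." + str(os.getpgrp()) + ".tmp"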

# If file doesn't exist, this is first instance.  Create the file.
myPID = str(os.getpid())
if not os.path.exists(initialInstancePIDfile):
    f = open(initialInstancePIDfile, "wt")
    f.write(myPID)
    f.close()
    isFirstInstance = True
    #print("Instance file does not exist, created it:", initialInstancePIDfile, "and myPID =", myPID)
else:
    # Otherwise, read the process ID from the file and see if it matches ours.
    # strip() removes any trailing newline so the comparison is on the bare PID.
    with open(initialInstancePIDfile, "rt") as f:
        fPID = f.readline().strip()
    if fPID == myPID:
        f = open(initialInstancePIDfile, "wt")
        f.write(myPID)
        f.close()
        isFirstInstance = True
        print("Instance file existed already, with our PID: ", myPID, " so we presumed it was a leftover and deleted and recreated it.")
    else:
        isFirstInstance = None
        # It doesn't match ours, does it match one of our parents?
        try:
            lastPID = None
            parentPID = myPID
            while parentPID != lastPID:
                lastPID = parentPID
                parentPID = str(psutil.Process(int(lastPID)).ppid())
                #print("Parent ID is:", parentPID)
                if parentPID == fPID:
                    isFirstInstance = False
                    #print("Instance file contains the PID of one of our parents:", fPID, initialInstancePIDfile, "and myPID =", myPID)
                    break
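        except Exception:
            # Walking past the top of the process tree (or hitting a process we
            # cannot inspect) raises; treat that as "no matching parent found".
            pass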
        # If it doesn't match a parent either, it is a leftover file from a
        # previous invocation.  Replace it with a new file.
        if isFirstInstance is None:
            f = open(initialInstancePIDfile, "wt")
            f.write(myPID)
            f.close()
            isFirstInstance = True
            print("Instance file existed already, with a PID:", fPID, "not matching ours:", myPID,
                "or a parent, so we presumed it was a leftover and deleted and recreated it.")
if isFirstInstance:
    print("Initial pipeline instance running.")
else:
    print("Pipeline sub-instance running.")
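
One caveat with the code above: checking os.path.exists() and then opening the file are two separate steps, so two processes starting at nearly the same moment could both decide they are first. A minimal sketch of one way to close that gap, assuming it is acceptable to let the operating system arbitrate through exclusive file creation, is shown below. The helper name claim_first_instance and its marker_path parameter are hypothetical and not part of the original answer, and the sketch does not replace the stale-file and parent-PID checks, which would still be needed for leftover files.

```python
import os


def claim_first_instance(marker_path):
    """Try to create marker_path exclusively.

    Returns True if this process created the file (it is the first
    instance), or False if the file already existed.
    """
    try:
        # Mode "x" creates the file and fails with FileExistsError if it is
        # already there, so the check and the creation are a single step.
        with open(marker_path, "x") as f:
            f.write(str(os.getpid()))
        return True
    except FileExistsError:
        return False
```

With this, the first branch of the code above could become `isFirstInstance = claim_first_instance(initialInstancePIDfile)`, falling through to the PID comparison only when it returns False.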

multiple-instances

snakemake