How to write Snakefile code that runs only in the first instance when there may be multiple sub-instances
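I would like to write code in my Snakefile that executes only on the initial invocation of the Snakefile, and does not execute when snakemake re-runs the Snakefile as a sub-instance because I specified the -j option to use multiple cores. How can I do that?
I'm not talking about workflow code, but about the Python code in the Snakefile that performs various tasks related to preparing to state the workflow rules.
There are several places where I want to do this. In some cases the work simply does not need to be done more than once, and I want to speed up the Snakefile by doing it only on the initial invocation. For example, part of my Snakefile code checks whether certain pipeline include files (not the actual pipeline input and output files) have been edited by the user and, if so, backs them up. I don't want every sub-instance to scan the dates of all those files and make backups when necessary. In fact, there is a race condition when multiple instances try to back up the same file.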
I found a way to do it.
# Create Boolean variable isFirstInstance, True if this is the first snakemake
# instance of a run of snakemake, False if it is nested sub-instance.
#
# This determines whether or not this is the first snakemake instance by creating
# a unique file with each initial run of the snakefile, whose name is created
# much as tempFile() creates files, but we don't use tempFile() because we don't
# want to delete this file when any instance exits, only when the first instance
# exits. The file name includes the process group ID, which will be the same
# for the first instance and for sub-instances. The file contains one line, the
# process ID of its creator. If the file doesn't exist, it is created and we
# set the variable isFirstInstance True to indicate that this is the first
# instance of the pipeline. If the file exists and the process ID it contains
# matches the process ID of one of the parents of the current process, then the
# current process is not the first instance of this pipeline invocation, and
# so we set isFirstInstance False. Two other aberrant situations can arise.
# First, if the file exists and its contained process ID matches the process ID
# of THIS process, we presume that the file was for some reason not deleted from
# a previous run, and that run happened to have a process group ID and process
# ID matching the current one, and so we assume we are first instance, and we
# delete the file and recreate it so its date matches the current date. Second,
# if the file exists and DOES NOT contain the process IDs of one of our parents,
# we make the same presumption of undeleted old file, and again delete the file,
# then rewrite it with our process ID.
################################################################################
import os
import psutil  # third-party package, used to walk up the process tree

# Create file name containing our process group ID in the name.
# TMP_DIR is assumed to be defined earlier in the Snakefile.
initialInstancePIDfile = TMP_DIR + "/initialInstancePID." + str(os.getpgrp()) + ".tmp"

# If file doesn't exist, this is first instance. Create the file.
myPID = str(os.getpid())
if not os.path.exists(initialInstancePIDfile):
    with open(initialInstancePIDfile, "wt") as f:
        f.write(myPID)
    isFirstInstance = True
    #print("Instance file does not exist, created it:", initialInstancePIDfile, "and myPID =", myPID)
else:
    # Otherwise, read the process ID from the file and see if it matches ours.
    # readline().strip() tolerates an empty file or a trailing newline.
    with open(initialInstancePIDfile, "rt") as f:
        fPID = f.readline().strip()
    if fPID == myPID:
        with open(initialInstancePIDfile, "wt") as f:
            f.write(myPID)
        isFirstInstance = True
        print("Instance file existed already, with our PID:", myPID,
              "so we presumed it was a leftover and overwrote it.")
    else:
        isFirstInstance = None
        # It doesn't match ours, does it match one of our parents?
        try:
            lastPID = None
            parentPID = myPID
            while parentPID != lastPID:
                lastPID = parentPID
                parentPID = str(psutil.Process(int(lastPID)).ppid())
                #print("Parent ID is:", parentPID)
                if parentPID == fPID:
                    isFirstInstance = False
                    #print("Instance file contains the PID of one of our parents:", fPID, initialInstancePIDfile, "and myPID =", myPID)
                    break
        except psutil.Error:
            # We ran off the top of the process tree, or hit a process we
            # cannot inspect. Either way, no parent matched.
            pass
        # If it doesn't match a parent either, it is a leftover file from a
        # previous invocation. Replace it with a new file.
        if isFirstInstance is None:
            with open(initialInstancePIDfile, "wt") as f:
                f.write(myPID)
            isFirstInstance = True
            print("Instance file existed already, with a PID:", fPID, "not matching ours:", myPID,
                  "or a parent, so we presumed it was a leftover and overwrote it.")

if isFirstInstance:
    print("Initial pipeline instance running.")
else:
    print("Pipeline sub-instance running.")
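
For completeness, here is a rough sketch of how the resulting flag might be used to guard the one-time work described in the question, and to remove the marker file when the first instance exits. The function names (`backup_if_changed`, `remove_quietly`, `run_first_instance_setup`) and the backup-by-modification-time rule are my own illustration and not part of the code above; adapt them to whatever your Snakefile actually does.

```python
import atexit
import os
import shutil


def backup_if_changed(src, backup_dir):
    """Copy src into backup_dir when no backup exists or src is newer.

    Returns True if a copy was made. (Hypothetical helper for illustration.)
    """
    os.makedirs(backup_dir, exist_ok=True)
    dst = os.path.join(backup_dir, os.path.basename(src))
    if not os.path.exists(dst) or os.path.getmtime(src) > os.path.getmtime(dst):
        shutil.copy2(src, dst)  # copy2 also copies the modification time
        return True
    return False


def remove_quietly(path):
    """Delete path, ignoring the case where it is already gone."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def run_first_instance_setup(is_first_instance, include_files, backup_dir, pid_file):
    """Run one-time setup only in the first instance.

    Returns the list of files that were backed up (empty in sub-instances).
    """
    if not is_first_instance:
        return []
    # Only the first instance removes the marker file, and only when it exits.
    atexit.register(remove_quietly, pid_file)
    return [path for path in include_files if backup_if_changed(path, backup_dir)]
```

In the Snakefile this would be called once, right after `isFirstInstance` is set, for example as `run_first_instance_setup(isFirstInstance, myIncludeFiles, TMP_DIR + "/backups", initialInstancePIDfile)`, where `myIncludeFiles` and the `backups` directory are placeholders. Because sub-instances return immediately, they neither scan file dates nor compete to write the same backup.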