How to parallelize this embarrassingly parallel loop with Python
I have an embarrassingly parallel loop:
# Definitions
def exhaustiveExplorationsWithSimilarityAll(inputFolder, outputFolder, similarityMeasure):
    phasesSpeedupDictFolder = parsePhasesSpeedupDictFolder(inputFolder)
    avgSpeedupProgramDict = computeAvgSpeedupProgram(phasesSpeedupDictFolder)
    parameters = {
        PROGRAMSPHASESSPEEDUPDICTS: phasesSpeedupDictFolder,
        PROGRAMSAVGSPEEDUPDICT: avgSpeedupProgramDict
    }
    similarityHandler = SimilarityHandler(similarityMeasure, parameters)
    # Sequential running
    for fileName in os.listdir(inputFolder):
        print fileName
        exhaustiveExplorationsWithSimilarity(inputFolder + fileName, outputFolder + fileName, similarityHandler)
I would like to parallelize it with Joblib:
# Parallel version
num_cores = multiprocessing.cpu_count()
parallel = Parallel(n_jobs=num_cores)
for fileName in os.listdir(inputFolder):
    print fileName
    parallel(delayed(exhaustiveExplorationsWithSimilarity(inputFolder + fileName, outputFolder + fileName, similarityHandler)))
Or this other version:
arg_generator = ((inputFolder + fileName, outputFolder + fileName, similarityHandler) for fileName in os.listdir(inputFolder))
parallel(delayed(exhaustiveExplorationsWithSimilarity)(arg_generator))
But when I run it, it complains:
parallel(delayed(exhaustiveExplorationsWithSimilarity(inputFolder + fileName, outputFolder + fileName, similarityHandler)))
File "/usr/lib/pymodules/python2.7/joblib/parallel.py", line 516, in __call__
for function, args, kwargs in iterable:
TypeError: 'function' object is not iterable
What am I missing here? Any help is appreciated.
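You are still calling exhaustiveExplorationsWithSimilarity inside the loop (serially) and then passing its return value to delayed. delayed(x) just returns a wrapper function, so parallel(...) receives a bare function instead of an iterable of tasks, which is exactly the TypeError: 'function' object is not iterable in your traceback. The second version has a related problem: delayed(exhaustiveExplorationsWithSimilarity)(arg_generator) describes a single call whose only argument is the whole generator, so Parallel gets one task description instead of one task per file.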
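To see why both versions fail, it can help to mimic what delayed and Parallel do with their arguments. The sketch below uses simplified, hypothetical stand-ins (fake_delayed, fake_parallel, work), not joblib's actual code. It assumes only what the traceback shows: Parallel iterates with `for function, args, kwargs in iterable`, so delayed(f)(*args) has to produce a (function, args, kwargs) triple without running f.

```python
# Simplified stand-ins for joblib's delayed() and Parallel.__call__,
# written only to illustrate the calling convention; the real joblib
# code is more elaborate.


def fake_delayed(function):
    """Return a wrapper that records a call instead of running it."""
    def delayed_function(*args, **kwargs):
        return function, args, kwargs
    return delayed_function


def fake_parallel(iterable):
    """Run (function, args, kwargs) triples; serially here, for clarity."""
    return [function(*args, **kwargs) for function, args, kwargs in iterable]


def work(path):
    # Hypothetical worker standing in for exhaustiveExplorationsWithSimilarity.
    return path.upper()


paths = ["a.txt", "b.txt"]

# Correct: fake_delayed(work) wraps the function; calling the wrapper with
# the arguments produces a task triple without running work() yet.
task = fake_delayed(work)("a.txt")
results = fake_parallel(fake_delayed(work)(p) for p in paths)
print(results)  # ['A.TXT', 'B.TXT']

# Mistake 1 (first version in the question): work() runs immediately and
# fake_parallel receives a bare wrapper function, which is not iterable.
try:
    fake_parallel(fake_delayed(work("a.txt")))
except TypeError as exc:
    first_error = str(exc)
print(first_error)  # 'function' object is not iterable

# Mistake 2 (second version): a single task whose only argument is the
# whole generator, so fake_parallel iterates over the triple itself and
# fails when it tries to unpack `work` as (function, args, kwargs).
try:
    fake_parallel(fake_delayed(work)(p for p in paths))
except TypeError as exc:
    second_error = str(exc)
print(second_error)
```

The exact wording of the second error depends on the Python version, but in both mistakes the root cause is the same: Parallel needs an iterable with one delayed call per task.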
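According to the documentation (https://pythonhosted.org/joblib/parallel.html#common-usage), it looks like you need to pass the function itself to delayed, call the result with the arguments, and give Parallel a generator of those calls, all in a single call: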
# Wrap the function (not its result) with delayed, and call parallel only once
parallel = Parallel(n_jobs=num_cores)
parallel(delayed(exhaustiveExplorationsWithSimilarity)(inputFolder + fileName, outputFolder + fileName, similarityHandler)
         for fileName in os.listdir(inputFolder))
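For reference, here is a self-contained Python 3 sketch of the same pattern that can be run as-is, assuming joblib is installed. The names shout, process_file, process_folder and process_folder_from_tuples are hypothetical stand-ins for the question's similarityHandler and exhaustiveExplorationsWithSimilarity. The sketch uses os.path.join instead of string concatenation, so the folder paths do not need a trailing slash. The second helper shows one way to use a generator of argument tuples like the question's arg_generator: unpack each tuple with * into its own delayed call.

```python
import os
import tempfile

from joblib import Parallel, delayed


def shout(text):
    # Hypothetical "handler" standing in for similarityHandler.
    return text.upper()


def process_file(in_path, out_path, handler):
    # Hypothetical stand-in for exhaustiveExplorationsWithSimilarity:
    # read one input file, transform its text, write the result.
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.write(handler(src.read()))
    return out_path


def process_folder(input_folder, output_folder, handler, n_jobs=-1):
    # One delayed(...)(...) task per file; Parallel is called exactly once
    # with the whole generator so joblib can spread the tasks over workers.
    # n_jobs=-1 asks joblib to use all available CPUs.
    return Parallel(n_jobs=n_jobs)(
        delayed(process_file)(
            os.path.join(input_folder, name),
            os.path.join(output_folder, name),
            handler,
        )
        for name in sorted(os.listdir(input_folder))
    )


def process_folder_from_tuples(arg_tuples, n_jobs=-1):
    # Variant for a generator of argument tuples (like arg_generator):
    # each tuple is unpacked with * into its own delayed call.
    return Parallel(n_jobs=n_jobs)(
        delayed(process_file)(*args) for args in arg_tuples
    )


# Small demo on throwaway directories.
with tempfile.TemporaryDirectory() as in_dir, tempfile.TemporaryDirectory() as out_dir:
    for name, text in [("a.txt", "alpha"), ("b.txt", "beta")]:
        with open(os.path.join(in_dir, name), "w") as f:
            f.write(text)

    written = process_folder(in_dir, out_dir, shout, n_jobs=2)
    contents = []
    for path in written:
        with open(path) as f:
            contents.append(f.read())

    tuples = ((os.path.join(in_dir, n), os.path.join(out_dir, n + ".out"), shout)
              for n in sorted(os.listdir(in_dir)))
    written_again = process_folder_from_tuples(tuples, n_jobs=2)

print(contents)  # ['ALPHA', 'BETA']
```

Parallel returns results in the same order as the input generator, which is why the contents line up with the sorted file names even though the files may be processed concurrently.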