How to parallelize this nested loop

I am using joblib together with Dask to parallelize code with the following loop structure:

def main():
    for semtype in semtypes:
        test = get_valid_systems(systems, semtype)
        expressions = get_ensemble_pairs(test)
    
        for c in expressions:

            <do stuff>

My first attempt rewrote it so that only the inner loop is parallelized:

if __name__ == '__main__':

    for semtype in semtypes:
        test = get_valid_systems(systems, semtype)
        expressions = get_ensemble_pairs(test)

        print('SYSTEMS FOR SEMTYPE', semtype, 'ARE', test)
    
        with joblib.parallel_backend('dask'):
            joblib.Parallel(verbose=10)(joblib.delayed(main)(c) for c in expressions)

This works fine.

Now I would like to include both loops in the Parallel call, like this:

with joblib.parallel_backend('dask'):

    joblib.Parallel(verbose=100)(joblib.delayed(main)(semtype, c) for c in get_ensemble_pairs(get_valid_systems(systems, semtype)) for semtype in semtypes)

However, I get the error: name 'semtype' is not defined. I assume this is a scoping issue with the function calls inside my Parallel statement, but I'm not sure how to handle it.

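The outermost loop should come first. The for clauses of a generator expression are evaluated left to right, just like nested for statements, so semtype must be bound by an earlier clause before get_ensemble_pairs(get_valid_systems(systems, semtype)) can use it.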

with joblib.parallel_backend('dask'):

    joblib.Parallel(verbose=100)(joblib.delayed(main)(semtype, c) for semtype in semtypes for c in get_ensemble_pairs(get_valid_systems(systems, semtype)))
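
The ordering rule can be checked without joblib or Dask. The following is a minimal sketch using hypothetical stand-ins: the semtypes values and the get_pairs helper are invented for illustration and are not the question's real data or functions. It shows that the outer-loop-first form matches the nested for loops, and that the reversed form raises NameError.

```python
# Hypothetical stand-ins for the question's data and helpers.
semtypes = ["a", "b"]


def get_pairs(semtype):
    # Assumption: each semtype yields two expressions.
    return [f"{semtype}1", f"{semtype}2"]


def flat_outer_first():
    # Outer loop first: s is bound before the clause that uses it.
    return [(s, c) for s in semtypes for c in get_pairs(s)]


def nested_loops():
    # The equivalent explicit nested loops.
    result = []
    for s in semtypes:
        for c in get_pairs(s):
            result.append((s, c))
    return result


def flat_reversed():
    # Reversed clauses: get_pairs(s) is evaluated before any clause binds s.
    return [(s, c) for c in get_pairs(s) for s in semtypes]


flat = flat_outer_first()
nested = nested_loops()

raised = False
try:
    flat_reversed()
except NameError:
    raised = True

print(flat == nested, raised)
```

The generator expression passed to joblib.Parallel follows the same rule, which is why moving for semtype in semtypes to the front resolves the error.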