Parallel Computation Overhead

I am using the following code as part of a CompilerPhase class. The method is called (and benchmarked) by the compiler's main method.

ParallelCompilerPhase:

private Consumer<ICompilationUnit> apply;
// ...

@Override
public void apply(Collection<ICompilationUnit> units)
{
    this.count = units.size();
    for (ICompilationUnit unit : units)
    {
        new Thread()
        {
            @Override
            public void run()
            {
                ParallelCompilerPhase.this.apply.accept(unit);
                ParallelCompilerPhase.this.count--;
            }
        }.start();
    }

    long now = System.currentTimeMillis();
    while (this.count > 0)
    {
        long l = System.currentTimeMillis() - now;
        if (l >= 1000L)
        {
            DyvilCompiler.logger.warning(this.name + " is taking too long! " + l + " ms");
            try
            {
                Thread.sleep(1000L);
            }
            catch (InterruptedException ex)
            {
                ex.printStackTrace();
            }
        }
    }
}

CompilerPhase:

private Consumer<Collection<ICompilationUnit>> apply;
//...

@Override
public void apply(Collection<ICompilationUnit> units)
{
    this.apply.accept(units);
}

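With the old implementation (CompilerPhase), the entire process (11 different phases) took 40-60 ms to compile 1 compilation unit. The new implementation (ParallelCompilerPhase), however, adds about 2000 ms of overhead to that. The phases TOKENIZE, PARSE, RESOLVE_TYPES, RESOLVE, CHECK, PRINT and COMPILE use ParallelCompilerPhase.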

This is the compiler's output:

[2015-03-04 23:16:49] [INFO]: Loaded 2 Libraries (235.7 ms, 117.9 ms/L, 8.48 L/s)
[2015-03-04 23:16:49] [INFO]: Compiling 'src/test' to 'dbin'
[2015-03-04 23:16:49] [INFO]: Applying 8 States: [TOKENIZE, PARSE, RESOLVE_TYPES, RESOLVE, CHECK, PRINT, COMPILE, TEST]
[2015-03-04 23:16:49] [INFO]: Compiling 2 Packages, 2 Files (1 Compilation Unit)

[2015-03-04 23:16:49] [INFO]: Applying State TOKENIZE
[2015-03-04 23:16:49] [INFO]: Finished State TOKENIZE (2.4 ms, 2.4 ms/CU, 423.19 CU/s)
[2015-03-04 23:16:49] [INFO]: Applying State PARSE
[2015-03-04 23:16:50] [WARNING]: PARSE is taking too long! 1000 ms
[2015-03-04 23:16:51] [INFO]: Finished State PARSE (2005.1 ms, 2005.1 ms/CU, 0.50 CU/s)
[2015-03-04 23:16:51] [INFO]: Applying State RESOLVE_TYPES
[2015-03-04 23:16:51] [INFO]: Finished State RESOLVE_TYPES (17.1 ms, 17.1 ms/CU, 58.35 CU/s)
[2015-03-04 23:16:51] [INFO]: Applying State RESOLVE
[2015-03-04 23:16:51] [INFO]: Finished State RESOLVE (24.0 ms, 24.0 ms/CU, 41.70 CU/s)
[2015-03-04 23:16:51] [INFO]: Applying State CHECK
[2015-03-04 23:16:51] [INFO]: Finished State CHECK (0.5 ms, 0.5 ms/CU, 1838.24 CU/s)
[2015-03-04 23:16:51] [INFO]: Applying State PRINT
[2015-03-04 23:16:51] [INFO]: src/test/dyvil/test/Main.dyvil:
// ...
[2015-03-04 23:16:51] [INFO]: Finished State PRINT (42.3 ms, 42.3 ms/CU, 23.61 CU/s)
[2015-03-04 23:16:51] [INFO]: Applying State COMPILE
[2015-03-04 23:16:51] [INFO]: Finished State COMPILE (5.2 ms, 5.2 ms/CU, 192.64 CU/s)
[2015-03-04 23:16:51] [INFO]: Applying State TEST
[2015-03-04 23:16:51] [INFO]: Finished State TEST (46.0 ms, 46.0 ms/CU, 21.72 CU/s)

[2015-03-04 23:16:51] [INFO]: Compilation finished (2148.6 ms, 2148.6 ms/CU, 0.47 CU/s)
// ...
[2015-03-04 23:16:51] [INFO]: Test completed without Errors (1 ms)

However, if I change the implementation of ParallelCompilerPhase to:

@Override
public void apply(Collection<ICompilationUnit> units)
{
    for (ICompilationUnit unit : units)
    {
        this.apply.accept(unit);
    }
}

the compiler's output looks like this:

[2015-03-04 23:21:36] [INFO]: Dyvil Compiler 1.0.0 for Dyvil 1.0.0

[2015-03-04 23:21:36] [INFO]: Loaded 2 Libraries (245.6 ms, 122.8 ms/L, 8.14 L/s)
[2015-03-04 23:21:36] [INFO]: Compiling 'src/test' to 'dbin'
[2015-03-04 23:21:36] [INFO]: Applying 8 States: [TOKENIZE, PARSE, RESOLVE_TYPES, RESOLVE, CHECK, PRINT, COMPILE, TEST]
[2015-03-04 23:21:36] [INFO]: Compiling 2 Packages, 2 Files (1 Compilation Unit)

[2015-03-04 23:21:36] [INFO]: Applying State TOKENIZE
[2015-03-04 23:21:36] [INFO]: Finished State TOKENIZE (0.6 ms, 0.6 ms/CU, 1721.17 CU/s)
[2015-03-04 23:21:36] [INFO]: Applying State PARSE
[2015-03-04 23:21:36] [INFO]: Finished State PARSE (20.6 ms, 20.6 ms/CU, 48.59 CU/s)
[2015-03-04 23:21:36] [INFO]: Applying State RESOLVE_TYPES
[2015-03-04 23:21:36] [INFO]: Finished State RESOLVE_TYPES (8.5 ms, 8.5 ms/CU, 117.34 CU/s)
[2015-03-04 23:21:36] [INFO]: Applying State RESOLVE
[2015-03-04 23:21:36] [INFO]: Finished State RESOLVE (15.9 ms, 15.9 ms/CU, 63.07 CU/s)
[2015-03-04 23:21:36] [INFO]: Applying State CHECK
[2015-03-04 23:21:36] [INFO]: Finished State CHECK (0.2 ms, 0.2 ms/CU, 4587.16 CU/s)
[2015-03-04 23:21:36] [INFO]: Applying State PRINT
[2015-03-04 23:21:36] [INFO]: src/test/dyvil/test/Main.dyvil:
// ...
[2015-03-04 23:21:36] [INFO]: Finished State PRINT (2.1 ms, 2.1 ms/CU, 479.39 CU/s)
[2015-03-04 23:21:36] [INFO]: Applying State COMPILE
[2015-03-04 23:21:36] [INFO]: Finished State COMPILE (4.0 ms, 4.0 ms/CU, 251.76 CU/s)
[2015-03-04 23:21:36] [INFO]: Applying State TEST
[2015-03-04 23:21:36] [INFO]: Finished State TEST (0.6 ms, 0.6 ms/CU, 1686.34 CU/s)

[2015-03-04 23:21:36] [INFO]: Compilation finished (57.5 ms, 57.5 ms/CU, 17.40 CU/s)
// ...
[2015-03-04 23:21:36] [INFO]: Test completed without Errors (2 ms)

What is causing this 2000 ms overhead?


As a possible fix, would replacing the implementation of ParallelCompilerPhase with

units.parallelStream().forEach(this.apply);

do what I originally wanted to do with the Thread approach?

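You are starting a new thread for every unit of work. This is a terrible idea. For any compute-bound task (one that does not spend most of its time blocked waiting on IO), there is no reason to have more threads than CPU cores. Going past that threshold only wastes time context-switching between threads (and wastes a lot of time starting up and tearing down threads, which is not cheap). No matter how many threads are running, your processor cannot actually do more things at once than it has processing resources for.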

Instead, you should look into using an Executor to manage a thread pool, and have the worker threads pop units of work off a queue and execute them.
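A minimal sketch of that idea, under some assumptions: `PooledPhase` is a hypothetical stand-in for ParallelCompilerPhase, the unit type is a generic `T` because ICompilationUnit is not shown in the question, and the pool is created per call only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical stand-in for ParallelCompilerPhase (name and generic type are assumptions).
class PooledPhase<T>
{
    private final Consumer<T> action;

    PooledPhase(Consumer<T> action)
    {
        this.action = action;
    }

    // Runs the action on every unit with at most one worker thread per CPU core
    // and returns only after all units have been processed.
    void apply(Collection<T> units)
    {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try
        {
            // Queue one task per unit; idle workers pick tasks off the pool's queue.
            List<Future<?>> pending = new ArrayList<>();
            for (T unit : units)
            {
                pending.add(pool.submit(() -> this.action.accept(unit)));
            }
            // get() blocks without spinning until the corresponding task is done.
            for (Future<?> task : pending)
            {
                task.get();
            }
        }
        catch (InterruptedException ex)
        {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", ex);
        }
        catch (ExecutionException ex)
        {
            throw new IllegalStateException("a worker failed", ex.getCause());
        }
        finally
        {
            pool.shutdown();
        }
    }
}
```

In a real compiler the pool would more likely be created once and shared by all phases, since starting and stopping worker threads for every phase reintroduces part of the cost described above.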

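In ParallelCompilerPhase, the following happens:

  • The new thread is created
  • The main thread is kept busy checking this.count and the time, so the new thread does not get to run
  • After 1000 ms the message is printed and the main thread sleeps for 1000 ms
  • The other thread executes and finishes
  • The main thread wakes up, and 2005.1 ms have passed.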

The problem is the busy loop.
Try:

@Override
public void apply(Collection<ICompilationUnit> units)
{
    this.count = units.size();
    for (ICompilationUnit unit : units)
    {
        new Thread()
        {
            @Override
            public void run()
            {
                ParallelCompilerPhase.this.apply.accept(unit);
                ParallelCompilerPhase.this.count--;
            }
        }.start();
    }

    long now = System.currentTimeMillis();
    while (this.count > 0)
    {
        long l = System.currentTimeMillis() - now;
        if (l >= 1000L)
        {
            DyvilCompiler.logger.warning(this.name + " is taking too long! " + l + " ms");
            try
            {
                Thread.sleep(1000L);
            }
            catch (InterruptedException ex)
            {
                ex.printStackTrace();
            }
        }
        try
        {
            Thread.sleep(10L);
        }
        catch (InterruptedException ex)
        {
            ex.printStackTrace();
        }
    }
}

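However, the best way to wait for the threads is Thread.join(), as @rici suggested, because that is the 'official way' to do this and it wastes no processor time. With the solution above, the main thread can wait up to 10 ms longer than necessary after the workers finish; with join(), the main thread wakes up as soon as the workers are done.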
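A rough sketch of the join() variant, assuming a hypothetical helper class `JoinSketch` and a generic unit type in place of ICompilationUnit. It still starts one thread per unit, as in the question, and keeps the once-per-second warning by using the timed overload `join(long)`; the warning goes to `System.err` here because the logger is not available in a standalone example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (the name is an assumption) showing join() instead of a polling loop.
class JoinSketch
{
    // Starts one thread per unit and blocks until every thread has finished.
    static <T> void applyAll(Collection<T> units, Consumer<T> action, String name)
    {
        List<Thread> workers = new ArrayList<>();
        for (T unit : units)
        {
            Thread worker = new Thread(() -> action.accept(unit));
            worker.start();
            workers.add(worker);
        }

        long start = System.currentTimeMillis();
        try
        {
            for (Thread worker : workers)
            {
                // join(millis) returns when the thread ends or the timeout elapses,
                // whichever comes first, without consuming CPU while waiting.
                worker.join(1000L);
                while (worker.isAlive())
                {
                    long elapsed = System.currentTimeMillis() - start;
                    System.err.println(name + " is taking too long! " + elapsed + " ms");
                    worker.join(1000L);
                }
            }
        }
        catch (InterruptedException ex)
        {
            // Restore the interrupt flag so callers can react to it.
            Thread.currentThread().interrupt();
        }
    }
}
```

A side effect of this design is that the shared `count` field is no longer needed: the main thread learns that a worker is done from the thread itself rather than from a counter that several threads decrement concurrently.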