Parallel Computation Overhead
I am using the following code as part of a CompilerPhase class. The method is called (and benchmarked) by the compiler's main method.
ParallelCompilerPhase:
private Consumer<ICompilationUnit> apply;
// ...
@Override
public void apply(Collection<ICompilationUnit> units)
{
    this.count = units.size();
    for (ICompilationUnit unit : units)
    {
        new Thread()
        {
            @Override
            public void run()
            {
                ParallelCompilerPhase.this.apply.accept(unit);
                ParallelCompilerPhase.this.count--;
            }
        }.start();
    }
    long now = System.currentTimeMillis();
    while (this.count > 0)
    {
        long l = System.currentTimeMillis() - now;
        if (l >= 1000L)
        {
            DyvilCompiler.logger.warning(this.name + " is taking too long! " + l + " ms");
            try
            {
                Thread.sleep(1000L);
            }
            catch (InterruptedException ex)
            {
                ex.printStackTrace();
            }
        }
    }
}
CompilerPhase:
private Consumer<Collection<ICompilationUnit>> apply;
// ...
@Override
public void apply(Collection<ICompilationUnit> units)
{
    this.apply.accept(units);
}
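With the old implementation (CompilerPhase), the whole process (11 different phases) takes 40-60 ms to compile 1 compilation unit. However, the new implementation (ParallelCompilerPhase) adds 2000 ms of overhead to that. The phases TOKENIZE, PARSE, RESOLVE_TYPES, RESOLVE, CHECK, PRINT and COMPILE use ParallelCompilerPhase.
This is the compiler's output: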
[2015-03-04 23:16:49] [INFO]: Loaded 2 Libraries (235.7 ms, 117.9 ms/L, 8.48 L/s)
[2015-03-04 23:16:49] [INFO]: Compiling 'src/test' to 'dbin'
[2015-03-04 23:16:49] [INFO]: Applying 8 States: [TOKENIZE, PARSE, RESOLVE_TYPES, RESOLVE, CHECK, PRINT, COMPILE, TEST]
[2015-03-04 23:16:49] [INFO]: Compiling 2 Packages, 2 Files (1 Compilation Unit)
[2015-03-04 23:16:49] [INFO]: Applying State TOKENIZE
[2015-03-04 23:16:49] [INFO]: Finished State TOKENIZE (2.4 ms, 2.4 ms/CU, 423.19 CU/s)
[2015-03-04 23:16:49] [INFO]: Applying State PARSE
[2015-03-04 23:16:50] [WARNING]: PARSE is taking too long! 1000 ms
[2015-03-04 23:16:51] [INFO]: Finished State PARSE (2005.1 ms, 2005.1 ms/CU, 0.50 CU/s)
[2015-03-04 23:16:51] [INFO]: Applying State RESOLVE_TYPES
[2015-03-04 23:16:51] [INFO]: Finished State RESOLVE_TYPES (17.1 ms, 17.1 ms/CU, 58.35 CU/s)
[2015-03-04 23:16:51] [INFO]: Applying State RESOLVE
[2015-03-04 23:16:51] [INFO]: Finished State RESOLVE (24.0 ms, 24.0 ms/CU, 41.70 CU/s)
[2015-03-04 23:16:51] [INFO]: Applying State CHECK
[2015-03-04 23:16:51] [INFO]: Finished State CHECK (0.5 ms, 0.5 ms/CU, 1838.24 CU/s)
[2015-03-04 23:16:51] [INFO]: Applying State PRINT
[2015-03-04 23:16:51] [INFO]: src/test/dyvil/test/Main.dyvil:
// ...
[2015-03-04 23:16:51] [INFO]: Finished State PRINT (42.3 ms, 42.3 ms/CU, 23.61 CU/s)
[2015-03-04 23:16:51] [INFO]: Applying State COMPILE
[2015-03-04 23:16:51] [INFO]: Finished State COMPILE (5.2 ms, 5.2 ms/CU, 192.64 CU/s)
[2015-03-04 23:16:51] [INFO]: Applying State TEST
[2015-03-04 23:16:51] [INFO]: Finished State TEST (46.0 ms, 46.0 ms/CU, 21.72 CU/s)
[2015-03-04 23:16:51] [INFO]: Compilation finished (2148.6 ms, 2148.6 ms/CU, 0.47 CU/s)
// ...
[2015-03-04 23:16:51] [INFO]: Test completed without Errors (1 ms)
However, if I change the implementation of ParallelCompilerPhase to:
@Override
public void apply(Collection<ICompilationUnit> units)
{
    for (ICompilationUnit unit : units)
    {
        this.apply.accept(unit);
    }
}
The compiler's output looks like this:
[2015-03-04 23:21:36] [INFO]: Dyvil Compiler 1.0.0 for Dyvil 1.0.0
[2015-03-04 23:21:36] [INFO]: Loaded 2 Libraries (245.6 ms, 122.8 ms/L, 8.14 L/s)
[2015-03-04 23:21:36] [INFO]: Compiling 'src/test' to 'dbin'
[2015-03-04 23:21:36] [INFO]: Applying 8 States: [TOKENIZE, PARSE, RESOLVE_TYPES, RESOLVE, CHECK, PRINT, COMPILE, TEST]
[2015-03-04 23:21:36] [INFO]: Compiling 2 Packages, 2 Files (1 Compilation Unit)
[2015-03-04 23:21:36] [INFO]: Applying State TOKENIZE
[2015-03-04 23:21:36] [INFO]: Finished State TOKENIZE (0.6 ms, 0.6 ms/CU, 1721.17 CU/s)
[2015-03-04 23:21:36] [INFO]: Applying State PARSE
[2015-03-04 23:21:36] [INFO]: Finished State PARSE (20.6 ms, 20.6 ms/CU, 48.59 CU/s)
[2015-03-04 23:21:36] [INFO]: Applying State RESOLVE_TYPES
[2015-03-04 23:21:36] [INFO]: Finished State RESOLVE_TYPES (8.5 ms, 8.5 ms/CU, 117.34 CU/s)
[2015-03-04 23:21:36] [INFO]: Applying State RESOLVE
[2015-03-04 23:21:36] [INFO]: Finished State RESOLVE (15.9 ms, 15.9 ms/CU, 63.07 CU/s)
[2015-03-04 23:21:36] [INFO]: Applying State CHECK
[2015-03-04 23:21:36] [INFO]: Finished State CHECK (0.2 ms, 0.2 ms/CU, 4587.16 CU/s)
[2015-03-04 23:21:36] [INFO]: Applying State PRINT
[2015-03-04 23:21:36] [INFO]: src/test/dyvil/test/Main.dyvil:
// ...
[2015-03-04 23:21:36] [INFO]: Finished State PRINT (2.1 ms, 2.1 ms/CU, 479.39 CU/s)
[2015-03-04 23:21:36] [INFO]: Applying State COMPILE
[2015-03-04 23:21:36] [INFO]: Finished State COMPILE (4.0 ms, 4.0 ms/CU, 251.76 CU/s)
[2015-03-04 23:21:36] [INFO]: Applying State TEST
[2015-03-04 23:21:36] [INFO]: Finished State TEST (0.6 ms, 0.6 ms/CU, 1686.34 CU/s)
[2015-03-04 23:21:36] [INFO]: Compilation finished (57.5 ms, 57.5 ms/CU, 17.40 CU/s)
// ...
[2015-03-04 23:21:36] [INFO]: Test completed without Errors (2 ms)
What causes this 2000 ms overhead?
As a possible fix, would replacing the implementation of ParallelCompilerPhase with
units.parallelStream().forEach(this.apply);
do what I originally wanted to do with the Thread approach?
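You are starting a new thread for each unit of work. This is a terrible idea. For any compute-bound task (one that does not spend most of its time blocked waiting on IO), there is no reason to have more threads than CPU cores. Going beyond that threshold only wastes time context-switching between threads (and wastes a lot of time starting and tearing down threads, which is not cheap). No matter how many threads are running, your processor cannot actually do more things at once than it has processing resources for.
Instead, you should look into using an Executor to manage a thread pool, and have worker threads pop units of work off a queue and execute them.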
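The thread-pool suggestion could look roughly like the sketch below. It is not the Dyvil compiler's code: PooledPhaseSketch is a hypothetical stand-in for ParallelCompilerPhase, made generic so it compiles without ICompilationUnit, and the warning is printed to System.err instead of going through DyvilCompiler.logger. Creating a pool inside every apply call is an assumption made to keep the example short.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical stand-in for ParallelCompilerPhase; T replaces ICompilationUnit.
class PooledPhaseSketch<T>
{
    private final Consumer<T> apply;

    PooledPhaseSketch(Consumer<T> apply)
    {
        this.apply = apply;
    }

    public void apply(Collection<T> units)
    {
        // One worker per core; more threads would not help compute-bound work.
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (T unit : units)
        {
            // Queue the unit; an idle worker thread picks it up.
            pool.execute(() -> this.apply.accept(unit));
        }
        // Accept no further tasks, then block (without spinning) until all are done.
        pool.shutdown();
        try
        {
            // awaitTermination returns false each time the 1 s timeout elapses.
            while (!pool.awaitTermination(1, TimeUnit.SECONDS))
            {
                System.err.println("phase is taking too long!");
            }
        }
        catch (InterruptedException ex)
        {
            // Restore the interrupt flag so callers can react to it.
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args)
    {
        ConcurrentLinkedQueue<String> done = new ConcurrentLinkedQueue<>();
        new PooledPhaseSketch<String>(done::add).apply(Arrays.asList("A", "B", "C"));
        System.out.println(done.size() + " units processed");
    }
}
```

In a real compiler it would probably make more sense to create the pool once and share it across all phases, so that worker threads are not started and torn down for every phase.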
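In ParallelCompilerPhase, the following happens:
- The new threads are created.
- The main thread is kept busy checking this.count and the time, so the new threads do not get to run.
- After 1000 ms the message is printed and the main thread sleeps for 1000 ms.
- The other threads execute and finish.
- The main thread wakes up, and 2005.1 ms have passed.
The problem is the busy loop.
Try: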
@Override
public void apply(Collection<ICompilationUnit> units)
{
    this.count = units.size();
    for (ICompilationUnit unit : units)
    {
        new Thread()
        {
            @Override
            public void run()
            {
                ParallelCompilerPhase.this.apply.accept(unit);
                ParallelCompilerPhase.this.count--;
            }
        }.start();
    }
    long now = System.currentTimeMillis();
    while (this.count > 0)
    {
        long l = System.currentTimeMillis() - now;
        if (l >= 1000L)
        {
            DyvilCompiler.logger.warning(this.name + " is taking too long! " + l + " ms");
            try
            {
                Thread.sleep(1000L);
            }
            catch (InterruptedException ex)
            {
                ex.printStackTrace();
            }
        }
        // Give up the CPU between checks instead of spinning.
        try
        {
            Thread.sleep(10L);
        }
        catch (InterruptedException ex)
        {
            ex.printStackTrace();
        }
    }
}
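However, the best way to wait for threads is to use Thread.join(), as @rici suggested, because that is the 'official way' to do it and it does not waste any processor time. With the solution above, the main thread waits up to 10 ms of extra time after the workers finish; with join(), the main thread wakes up as soon as the workers finish.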
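A minimal sketch of the join() variant follows, under the same assumptions as before: JoinPhaseSketch is a hypothetical stand-in for ParallelCompilerPhase, generic so it compiles without the Dyvil types. It still starts one thread per unit, as in the question, and leaves out the "taking too long" warning. Because the main thread waits on the threads themselves, the shared count field is no longer needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

// Hypothetical stand-in for ParallelCompilerPhase; T replaces ICompilationUnit.
class JoinPhaseSketch<T>
{
    private final Consumer<T> apply;

    JoinPhaseSketch(Consumer<T> apply)
    {
        this.apply = apply;
    }

    public void apply(Collection<T> units)
    {
        // Keep a handle on every worker so it can be joined later.
        List<Thread> threads = new ArrayList<>();
        for (T unit : units)
        {
            Thread thread = new Thread(() -> this.apply.accept(unit));
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads)
        {
            try
            {
                // Blocks without using CPU time until this worker has finished.
                thread.join();
            }
            catch (InterruptedException ex)
            {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args)
    {
        ConcurrentLinkedQueue<String> done = new ConcurrentLinkedQueue<>();
        new JoinPhaseSketch<String>(done::add).apply(Arrays.asList("A", "B", "C"));
        System.out.println(done.size() + " units processed");
    }
}
```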