Vectorization of loop with accumulation in variable
I have the following loop, which I am compiling with icc:
for (int i = 0; i < arrays_size; ++i) {
    total = total + C[i];
}
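The vectorization report says this loop was vectorized, but I don't understand how that is possible, since there is an obvious read-after-write dependency on total.
The report output is: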
LOOP BEGIN at loops.cpp(46,5)
remark #15388: vectorization support: reference C has aligned access [ loops.cpp(47,7) ]
remark #15305: vectorization support: vector length 4
remark #15399: vectorization support: unroll factor set to 8
remark #15309: vectorization support: normalized vectorization overhead 0.475
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 1
remark #15475: --- begin vector loop cost summary ---
remark #15476: scalar loop cost: 5
remark #15477: vector loop cost: 1.250
remark #15478: estimated potential speedup: 3.990
remark #15488: --- end vector loop cost summary ---
remark #25015: Estimate of max trip count of loop=31250
LOOP END
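Can anyone explain what this means and how this loop can be vectorized?
Depending on the types of total and C[i], you can exploit the associativity and commutativity of addition: first accumulate 4 or 8 (or more) independent sub-totals, then add those sub-totals together.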
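Written out for four sub-totals (assuming, for simplicity, that $n$ = arrays_size is a multiple of 4), the scalar loop is one chain of $n$ dependent additions, $t_{i+1} = t_i + C_i$, and the regrouping turns it into four independent chains:

$$
\text{total} \;=\; \sum_{i=0}^{n-1} C_i \;=\; \sum_{k=0}^{3} s_k,
\qquad
s_k \;=\; \sum_{j=0}^{n/4-1} C_{4j+k}.
$$

Each $s_k$ is still a serial chain, but the four chains do not depend on one another, so a single 4-wide vector add can advance all of them at once; only the final $\sum_k s_k$ needs work across lanes. This reordering of the additions is exactly what associativity and commutativity permit.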
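int subtotal[4] = {0,0,0,0};
int i = 0;
for (; i + 4 <= arrays_size; i += 4) {   // only complete groups of 4, so C[i+k] stays in bounds
    for (int k = 0; k < 4; ++k)
        subtotal[k] += C[i+k];
}
// handle the remaining elements of C (fewer than 4), if any:
for (; i < arrays_size; ++i)
    subtotal[0] += C[i];
// sum-up sub-totals (added to total, as the original loop does):
total += (subtotal[0]+subtotal[2]) + (subtotal[1]+subtotal[3]);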
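Splitting the sum into sub-totals like this works for any integer type, but by default ICC also assumes that floating-point addition is associative (gcc and clang need some subset of -ffast-math for that).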