Execution time with parallel for

Question

I was curious how long a for loop takes to run with a single thread versus multiple threads (using OpenMP), so I wrote this code to see the difference:

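#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 1000000000 // 10^9

int main(int argc, char* argv[])
{
    int i, *a = malloc(N * sizeof *a);
    clock_t begin, end;
    double time_spent;

    if (a == NULL) // the array needs about 4 GB, so the allocation can fail
        return 1;

    begin = clock();

    // split the loop iterations across the OpenMP threads
    #pragma omp parallel for
    for(i=0;i<N;i++)
        a[i] = i;

    end = clock();
    time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("Time Spent: %lf\n", time_spent);
    free(a);
    return 0;
}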

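But then something strange happened: with the #pragma directive, the execution time is about 4.6 s, but without it, it is about 3.6 s. How is that possible? Am I doing something wrong? Or am I not using the right timing function?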

Answer 1

clock() returns the value of the CLOCK_PROCESS_CPUTIME_ID clock. This clock is described in clock_gettime(3):

CLOCK_PROCESS_CPUTIME_ID (since Linux 2.6.12)
          Per-process CPU-time clock (measures CPU time consumed by all
          threads in the process).

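Because this clock adds up the CPU time consumed by every thread, the parallel loop can report more time than the serial one even when it finishes sooner in wall-clock terms. Use clock_gettime() with CLOCK_MONOTONIC to get a correct measurement.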
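
A minimal sketch of that suggestion, assuming a POSIX system that provides clock_gettime(). The helper names elapsed_seconds and timed_fill are hypothetical and only illustrate the idea; the loop is the one from the question, timed with CLOCK_MONOTONIC instead of clock():

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: difference between two timespec values, in seconds. */
static double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Hypothetical helper: runs the loop from the question on a[0..n-1] and
 * returns the elapsed wall-clock time in seconds. CLOCK_MONOTONIC measures
 * elapsed time, not the CPU time summed over all threads. */
static double timed_fill(int *a, int n)
{
    struct timespec begin, end;
    int i;

    assert(a != NULL && n >= 0);

    clock_gettime(CLOCK_MONOTONIC, &begin);

    /* Without OpenMP enabled at compile time the pragma is ignored
     * and the loop simply runs serially. */
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        a[i] = i;

    clock_gettime(CLOCK_MONOTONIC, &end);

    return elapsed_seconds(begin, end);
}
```

Calling timed_fill(a, N) from main and printing the result in place of the clock() arithmetic gives the elapsed time. With GCC or Clang, the pragma only takes effect when compiling with -fopenmp. OpenMP's own omp_get_wtime() is another way to read wall-clock time.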
