OpenCL race condition with printf?

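I am currently trying to test whether I can get some basic operations (reading from and writing to memory) to work in an OpenCL kernel (Intel SDK). This is part of the code, with some unused parameters omitted: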

__kernel
void myfunc(__global char *buf_pw,
__global char *buf_hash)
{
    int idx = get_global_id(0); 
    int a = 1 + 1;   
    char wololol[8] = "wololol"; 
    if (idx == 0 )
    {
        buf_pw[0] = 'A';
        buf_pw[1] = 'e'; 
        buf_pw[2] = 'l';
        buf_pw[3] = 'l';
        buf_pw[4] = 'o';
        buf_pw[5] = 0;
    }
    if (idx == 0)
    {
        while(buf_pw[0] != 'A');
        printf("%c\n", buf_pw[0]);
        printf("%c\n", buf_pw[1]); 
        printf("%c\n", buf_pw[2]);
        printf("%c\n", buf_pw[3]); 
        printf("%c\n", buf_pw[4]);
        printf("%c\n", buf_pw[5]); 
        printf("%s\n", buf_pw);
        printf("%s\n", wololol);  
    }
    printf("Hello World\n");
}

Running the program several times produces different results. Most of the time, it produces output like this:

A
e
l
l 
o

(null)
wololol
Hello World
Hello World
Hello World
Hello World

The other case is:

A
e
l
l 
o

Aello
wololol
Hello World
Hello World
Hello World
Hello World

I expected the second case to be the correct output, but it rarely happens. What is causing the writes to and reads from buf_pw to misbehave?

Printf OpenCL spec

I would be careful with the "printf" function, because it may not follow normal OpenCL logic. This is what the spec says:

When the event that is associated with a particular kernel invocation is completed, the output of all printf() calls executed by this kernel invocation is flushed to the implementation-defined output stream. Calling clFinish on a command queue flushes all pending output by printf in previously enqueued and completed commands to the implementation-defined output stream. In the case that printf is executed from multiple work-items concurrently, there is no guarantee of ordering with respect to written data. For example, it is valid for the output of a work-item with a global id (0,0,1) to appear intermixed with the output of a work-item with a global id (0,0,4) and so on.

Your code appears to be valid (although the while loop in it is momentarily confusing! ;), and your expectation of the correct output is reasonable.

Your OpenCL installation appears to have a bug/issue. I have found that the AMD GPU OpenCL drivers in particular have problems with printf behavior.

The printf in question should always print "Aello" and never print "(null)", as you expected.

The problem is probably caused by a race condition in the vendor's implementation of printf().
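One thing that may be worth trying while the vendor issue stands: print the buffer one character at a time with %c, which is the form that came out right in both of your runs. As far as I recall, the OpenCL C spec also restricts the %s conversion to string-literal arguments, so printf("%s\n", buf_pw) may be outside what an implementation has to support; treat that as an assumption and check it against the spec version you target. Below is a minimal sketch, not a confirmed fix. print_chars is a hypothetical helper, the kernel signature is cut down to one parameter, and the #ifndef block is only a host-side shim (my addition) so the same logic can be tried as plain C without a device.

```c
#ifndef __OPENCL_VERSION__
/* Host-side shims (assumption: only used to try the logic as plain C).
   A real OpenCL C compiler defines __OPENCL_VERSION__ and skips this. */
#include <stdio.h>
#define __kernel
#define __global
static int get_global_id(int dim) { (void)dim; return 0; }
#endif

/* Hypothetical helper: print a NUL-terminated buffer with %c only,
   so %s is never given a non-literal argument. */
void print_chars(const __global char *s)
{
    for (int i = 0; s[i] != 0; ++i)
        printf("%c", s[i]);
    printf("\n");
}

__kernel
void myfunc(__global char *buf_pw)
{
    int idx = get_global_id(0);
    if (idx == 0)
    {
        /* Same writes as in the question. */
        buf_pw[0] = 'A';
        buf_pw[1] = 'e';
        buf_pw[2] = 'l';
        buf_pw[3] = 'l';
        buf_pw[4] = 'o';
        buf_pw[5] = 0;

        /* Only work-item 0 writes and reads the buffer, so no
           cross-work-item synchronization is involved here. */
        print_chars(buf_pw);
    }
}
```

If the character-by-character output is stable across runs while the %s version still flips between "Aello" and "(null)", that would point at the %s handling in the vendor's printf rather than at the memory writes themselves.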