What's the idiom for "sizeof type T on my CUDA device(s)"?

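In theory, the sizes of types on a CUDA device could differ from their sizes on the host platform. So what is the idiomatic way to express "sizeof(T) on my CUDA device" in code, other than keeping your own map from type IDs to values you happen to know?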

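No currently supported CUDA platform requires anything like what you are asking about. One of the reasons the CUDA toolchain is so tightly integrated with the host compiler and the host C++ runtime library is to guarantee that the sizes of fundamental types on the host and on the device always match. No idiomatic size translation is needed: the result of sizeof will always be the same for host and device. Note that the sizes of fundamental types can vary from platform to platform (Windows is an LLP64/IL32P64 platform, Linux and OS X are LP64/I32LP64 platforms), but this has no bearing on the GPU, which uses the same sizes as the host it is compiled for.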

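Note also that the GPU can impose alignment requirements on compound types, which can mean that the compiled size of such a type differs from what you expect. The conditions under which this applies are discussed in detail in the documentation.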
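As a rough illustration of how an alignment requirement changes the size of a compound type, here is a minimal sketch. It is not taken from the CUDA documentation: the types `Packed3` and `Aligned3`, the macro `HOST_DEVICE`, and the helper `bytes_for` are hypothetical names introduced for this example, and it assumes a 4-byte `float`. It uses standard `alignas` so that the same file builds with nvcc or with a plain C++11 compiler; the effect is similar in spirit to CUDA's built-in vector types such as `float4`, which carry their own alignment requirement.

```cpp
#include <cstddef>

// Expand to CUDA qualifiers under nvcc, to nothing under a plain C++ compiler.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical example types; the names are illustrative only.
struct Packed3 {               // three floats, natural alignment of float
    float x, y, z;
};

struct alignas(16) Aligned3 {  // same members, explicit 16-byte alignment
    float x, y, z;
};

// Assuming a 4-byte float: the alignment request pads Aligned3 from 12 to 16 bytes.
static_assert(sizeof(Packed3) == 3 * sizeof(float), "no padding expected");
static_assert(alignof(Aligned3) == 16, "alignment request applied");
static_assert(sizeof(Aligned3) == 16, "padded up to the alignment");

// Bytes needed for n elements of T, e.g. as the size argument of an allocation.
template <typename T>
HOST_DEVICE constexpr std::size_t bytes_for(std::size_t n)
{
    return n * sizeof(T);
}
```

Because the checks are `static_assert`s, a layout surprise shows up as a compile error rather than as a silently wrong buffer size. A call such as `bytes_for<Aligned3>(n)` is just `n * sizeof(Aligned3)`, written the same way on the host and in a kernel.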

To see that host and device sizes match, consider the following simple example code:

#include <cstdio>

// Compiled for both host and device, so the same sizeof expressions are
// evaluated on each side. Sizes are cast to unsigned long and printed with
// %lu so that the format specifier matches the argument type.
__device__ __host__ __noinline__ void printsizes(const char* title)
{
    printf("%s\n", title);
    printf("sizeof(void*) = %lu\n", (unsigned long)sizeof(void*));
    printf("sizeof(char) = %lu\n", (unsigned long)sizeof(char));
    printf("sizeof(bool) = %lu\n", (unsigned long)sizeof(bool));
    printf("sizeof(short) = %lu\n", (unsigned long)sizeof(short));
    printf("sizeof(int) = %lu\n", (unsigned long)sizeof(int));
    printf("sizeof(long) = %lu\n", (unsigned long)sizeof(long));
    printf("sizeof(long long) = %lu\n", (unsigned long)sizeof(long long));
}

__global__ void printkernel()
{
    printsizes("On the device:");
}

int main()
{
    printsizes("On the host:");

    printkernel<<<1,1>>>();
    cudaDeviceSynchronize();
    cudaDeviceReset();

    return 0;
}

Compiling and running it on a 64-bit Linux platform produces this:

$ nvcc -arch=sm_52 -m64 -o sizeof64 sizeof.cu
$ ./sizeof64
On the host:
sizeof(void*) = 8
sizeof(char) = 1
sizeof(bool) = 1
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 8
sizeof(long long) = 8
On the device:
sizeof(void*) = 8
sizeof(char) = 1
sizeof(bool) = 1
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 8
sizeof(long long) = 8

Built on a 64-bit Windows platform, it produces this:

>nvcc -arch=sm_21 -m64 sizes.cu
sizes.cu
   Creating library a.lib and object a.exp
>a.exe
On the host:
sizeof(void*) = 8
sizeof(char) = 1
sizeof(bool) = 1
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 4
sizeof(long long) = 8
On the device:
sizeof(void*) = 8
sizeof(char) = 1
sizeof(bool) = 1
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 4
sizeof(long long) = 8

Built on a 32-bit Windows platform, it produces this:

>nvcc -arch=sm_21 -m32 sizes.cu
sizes.cu
   Creating library a.lib and object a.exp

C:\Users\david\Documents>a.exe
On the host:
sizeof(void*) = 4
sizeof(char) = 1
sizeof(bool) = 1
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 4
sizeof(long long) = 8
On the device:
sizeof(void*) = 4
sizeof(char) = 1
sizeof(bool) = 1
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 4
sizeof(long long) = 8

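Note that the sizes of void * and long vary from platform to platform. But in every case, the GPU sizes match the host sizes. This is a fundamental design principle of the CUDA driver and GPU runtime.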
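One practical consequence, going slightly beyond the answer itself: since long is 4 bytes in the Windows runs and 8 bytes in the Linux run, a structure that must have the same layout on every platform is probably better written with fixed-width types. The following is only a sketch under that assumption; the `Sample` record and its fields are hypothetical and not part of any CUDA API.

```cpp
#include <cstdint>

// Hypothetical record intended to be shared between host and device code.
struct Sample {
    std::int64_t id;      // 8 bytes under both LLP64 and LP64, unlike long
    std::int32_t count;   // 4 bytes
    std::int32_t flags;   // 4 bytes
};

// Compile-time checks that document the layout this code relies on.
static_assert(sizeof(std::int64_t) == 8, "int64_t must be 8 bytes");
static_assert(sizeof(Sample) == 16, "Sample is expected to occupy 16 bytes");
```

The same source builds as plain C++ or under nvcc, and plain `sizeof(Sample)` can then be used for allocations and copies on either side.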