Compiling and Linking CUDA with Clang to Support C++20 in Host Code
I am trying to compile, link, and run a simple CUDA example using Clang instead of GCC. The overall idea behind using Clang is to allow C++20 in the host code and to get more compiler optimizations from the LLVM/Clang stack.
I have consulted the following sources:
the LLVM docs
the Google paper about gpucc
The example below is taken from the LLVM docs on compiling CUDA with clang:
#include <iostream>

__global__ void axpy(float a, float* x, float* y) {
  y[threadIdx.x] = a * x[threadIdx.x];
}

int main(int argc, char* argv[]) {
  const int kDataLen = 4;

  float a = 2.0f;
  float host_x[kDataLen] = {1.0f, 2.0f, 3.0f, 4.0f};
  float host_y[kDataLen];

  // Copy input data to device.
  float* device_x;
  float* device_y;
  cudaMalloc(&device_x, kDataLen * sizeof(float));
  cudaMalloc(&device_y, kDataLen * sizeof(float));
  cudaMemcpy(device_x, host_x, kDataLen * sizeof(float),
             cudaMemcpyHostToDevice);

  // Launch the kernel.
  axpy<<<1, kDataLen>>>(a, device_x, device_y);

  // Copy output data to host.
  cudaDeviceSynchronize();
  cudaMemcpy(host_y, device_y, kDataLen * sizeof(float),
             cudaMemcpyDeviceToHost);

  // Print the results.
  for (int i = 0; i < kDataLen; ++i) {
    std::cout << "y[" << i << "] = " << host_y[i] << "\n";
  }

  cudaDeviceReset();
  return 0;
}
The command used to compile and link is:
clang++-12 axpy.cu -o axpy --cuda-gpu-arch=sm_72
-L/usr/local/cuda-11.4/lib64 -lcudart -ldl -lrt -pthread axpy.cu
--cuda-path=/usr/local/cuda-11 --no-cuda-version-check
The output indicates that compilation succeeds but linking fails:
clang: warning: Unknown CUDA version. cuda.h: CUDA_VERSION=11040. Assuming the latest supported version 10.1 [-Wunknown-cuda-version]
/usr/bin/ld: /tmp/axpy-35c781.o: in function `__device_stub__axpy(float, float*, float*)':
axpy.cu:(.text+0x0): multiple definition of `__device_stub__axpy(float, float*, float*)'; /tmp/axpy-c82a7d.o:axpy.cu:(.text+0x0): first defined here
/usr/bin/ld: /tmp/axpy-35c781.o: in function `main':
axpy.cu:(.text+0xa0): multiple definition of `main'; /tmp/axpy-c82a7d.o:axpy.cu:(.text+0xa0): first defined here
clang: error: linker command failed with exit code 1 (use -v to see invocation)
The error seems to indicate that clang passes the code to the linker more than once and therefore ends up with main defined twice.
OS: Ubuntu 20.04, kernel 5.4.0
CUDA: 11.4
Clang: tried versions 11/12/13
Any hints on how to get CUDA and Clang working together would be greatly appreciated. Things I have tried so far: different Clang versions (11/12/13), different CUDA versions (11.2/11.4).
The output indicates that it successfully compiles but fails to link:
axpy.cu:(.text+0xa0): multiple definition of `main';
This was sorted out in the comments:
Maybe the only problem is that you pass axpy.cu twice in your compile command.
clang++-12 axpy.cu -o axpy --cuda-gpu-arch=sm_72 -L/usr/local/cuda-11.4/lib64 -lcudart -ldl -lrt -pthread axpy.cu --cuda-path=/usr/local/cuda-11 --no-cuda-version-check
           ^^^^^^^                                                                                        ^^^^^^^
That was it. Thank you.