cuFFT static linking failed

I am trying to link cuFFT statically.

nvcc -ccbin g++ -dc -O3 -arch=sm_35  -c fftStat.cu fftStat.o;
nvcc -ccbin g++ -dlink -arch=sm_35 fftStat.o -o link.o;
g++ main.cc link.o fftStat.o -lcudart -lcudadevrt -lcufft_static   -lculibos -ldl -pthread -lrt -L/usr/local/cuda-10.2/lib64 -o run

It gives me the following errors (not all of them are shown):

/usr/local/cuda-10.2/lib64/libcufft_static.a(fft_dimension_class_multi.o): In function `__sti____cudaRegisterAll()':
fft_dimension_class_multi.compute_75.cudafe1.cpp:(.text+0xdad): undefined reference to `__cudaRegisterLinkedBinary_44_fft_dimension_class_multi_compute_75_cpp1_ii_466e44ab'
/usr/local/cuda-10.2/lib64/libcufft_static.a(fft_dimension_class_multi.o): In function `global constructors keyed to BaseListMulti::radices':
fft_dimension_class_multi.compute_75.cudafe1.cpp:(.text+0x1c8d): undefined reference to 
float_64bit_regular_RT_SM50_plus.compute_75.cudafe1.cpp:(.text+0x3d): undefined reference to `__cudaRegisterLinkedBinary_51_float_64bit_regular_RT_SM50_plus_compute_75_cpp1_ii_66731515'
/usr/local/cuda-10.2/lib64/libcufft_static.a(float_64bit_regular_RT_SM50_plus.o): In function `global constructors keyed to compile_unitsforce_compile_float_width64_t_regular_fft_kernels__SM50_unbounded()':
float_64bit_regular_RT_SM50_plus.compute_75.cudafe1.cpp:(.text+0x29d): undefined reference to `__cudaRegisterLinkedBinary_51_float_64bit_regular_RT_SM50_plus_compute_75_cpp1_ii_66731515'
/usr/local/cuda-10.2/lib64/libcufft_static.a(float_64bit_regular_RT_SM60_plus.o): In function `__sti____cudaRegisterAll()':
float_64bit_regular_RT_SM60_plus.compute_75.cudafe1.cpp:(.text+0x3d): undefined reference to `__cudaRegisterLinkedBinary_51_float_64bit_regular_RT_SM60_plus_compute_75_cpp1_ii_dbb979db'
/usr/local/cuda-10.2/lib64/libcufft_static.a(float_64bit_regular_RT_SM60_plus.o): In function `global constructors keyed to compile_unitsforce_compile_float_width64_t_regular_fft_kernels__SM60_unbounded()':
float_64bit_regular_RT_SM60_plus.compute_75.cudafe1.cpp:(.text+0x18d): undefined reference to `__cudaRegisterLinkedBinary_51_float_64bit_regular_RT_SM60_plus_compute_75_cpp1_ii_dbb979db'
/usr/local/cuda-10.2/lib64/libcufft_static.a(half_32bit_regular_RT_SM53_plus.o): In function `__sti____cudaRegisterAll()':
half_32bit_regular_RT_SM53_plus.compute_75.cudafe1.cpp:(.text+0x3d): undefined reference to `__cudaRegisterLinkedBinary_50_half_32bit_regular_RT_SM53_plus_compute_75_cpp1_ii_96a57339'
/usr/local/cuda-10.2/lib64/libcufft_static.a(half_32bit_regular_RT_SM53_plus.o): In function `global constructors keyed to compile_unitsforce_compile_half_width32_t_regular_fft_kernels__SM53_unbounded()':
half_32bit_regular_RT_SM53_plus.compute_75.cudafe1.cpp:(.text+0x1b0d): undefined reference to `__cudaRegisterLinkedBinary_50_half_32bit_regular_RT_SM53_plus_compute_75_cpp1_ii_96a57339'
/usr/local/cuda-10.2/lib64/libcufft_static.a(half_32bit_vector_RT_SM53_plus.o): In function `__sti____cudaRegisterAll()':
half_32bit_vector_RT_SM53_plus.compute_75.cudafe1.cpp:(.text+0x3d): undefined reference to 
dpRadix0343C_cb.compute_75.cudafe1.cpp:(.text+0xa54): undefined reference to `__cudaRegisterLinkedBinary_34_dpRadix0343C_cb_compute_75_cpp1_ii_b592a056'
collect2: error: ld returned 1 exit status

Dynamic linking works:

g++ main.cc link.o fftStat.o -lcudart -lcudadevrt -lcufft -L/usr/local/cuda-10.2/lib64 -o run

I followed this guide: https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#code-changes-for-separate-compilation and this one: https://docs.nvidia.com/cuda/cufft/index.html#static-library but apparently something is still missing.

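Some of the things you are trying to do at the final link step need to be done at the device link step (your step 2). The following seems to work for me: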

$ cat fftStat.cu
#include <cufft.h>

void test(){

  cufftHandle h;
  cufftCreate(&h);
}

$ cat main.cpp
void test();

int main(){

  test();
}

$ nvcc -ccbin g++ -dc -O3 -arch=sm_35  -c fftStat.cu fftStat.o
$ nvcc -ccbin g++ -dlink -arch=sm_35 fftStat.o -o link.o -lcufft_static -lcudadevrt
$ g++ main.cpp link.o fftStat.o -L/usr/local/cuda-10.2/lib64   -lcufft_static -lcudart -lcudadevrt -lculibos -ldl -pthread -lrt  -o run

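Note that I also rearranged the link order somewhat to respect link dependencies. This may or may not matter, depending on the exact version of g++. Some of the requirements here (e.g. -lcudadevrt in the device-link step) may depend on your actual code, which you haven't shown. For the code above, that item is not actually necessary.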