Compilation error using FindCUDA.cmake and Thrust with THRUST_DEVICE_SYSTEM_OMP
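I recently discovered that Thrust, besides its classic CUDA functionality, can also handle automatic OMP and TBB parallelization.
Although I was able to use this extremely versatile feature in a simple example, my CMake configuration produces compilation errors. Maybe I am using FindCUDA.cmake the wrong way, or maybe this module cannot be used with Thrust in this way?
Here is my Test.cu file: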
#include <thrust/device_vector.h>
#include <cstdio>
struct cuda_hello
{
__host__ __device__
void operator()(int x)
{
printf("Hello, world from Cuda!\n");
}
};
int main()
{
thrust::device_vector<int> cuda_vec(1, 0);
thrust::for_each(cuda_vec.begin(),cuda_vec.end(),cuda_hello());
}
And here is the compilation line that works:
nvcc Test.cu -lgomp -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP -Xcompiler -fopenmp -gencode arch=compute_30,code=sm_30 -o Executable.exe
Now, here is the CMake file that fails to generate a Makefile that compiles correctly under Linux when THRUST_DEVICE_SYSTEM_OMP is used (compiling for a CC 3.0 device):
PROJECT(ExecutableCmake)
set (OUTPUT_NAME ExecutableCmake)
cmake_minimum_required (VERSION 2.8)
#test variable
#set(THRUST_DEVICE_SYSTEM THRUST_DEVICE_SYSTEM_CUDA)
set(THRUST_DEVICE_SYSTEM THRUST_DEVICE_SYSTEM_OMP)
#set(THRUST_DEVICE_SYSTEM THRUST_DEVICE_SYSTEM_TBB)
########################################
#### Cuda Part ####
########################################
find_package(CUDA REQUIRED)
list( APPEND CUDA_NVCC_FLAGS -gencode arch=compute_30,code=sm_30 -DTHRUST_DEVICE_SYSTEM=${THRUST_DEVICE_SYSTEM} )
set (sources_gpu_cuda
Test.cu
)
########################################
#### /Cuda Part ####
########################################
########################################
#### OMP Part ####
########################################
set(omp_deps gomp)
########################################
#### /OMP Part ####
########################################
set (sources
#cuda source files
${sources_gpu_cuda}
)
cuda_add_executable(${OUTPUT_NAME} ${sources} ${headers})
target_link_libraries (${OUTPUT_NAME} ${omp_deps})
The compilation errors look like this:
/usr/local/cuda/include/thrust/system/omp/detail/for_each.inl(53): error: incomplete type is not allowed
detected during:
instantiation of "RandomAccessIterator thrust::system::omp::detail::for_each_n(thrust::system::omp::detail::execution_policy<DerivedPolicy> &, RandomAccessIterator, Size, UnaryFunction) [with DerivedPolicy=thrust::system::omp::detail::tag, RandomAccessIterator=thrust::device_ptr<int>, Size=unsigned long, UnaryFunction=thrust::detail::host_generate_functor<thrust::detail::fill_functor<int>>]"
/usr/local/cuda/include/thrust/detail/for_each.inl(69): here
instantiation of "InputIterator thrust::for_each_n(const thrust::detail::execution_policy_base<DerivedPolicy> &, InputIterator, Size, UnaryFunction) [with DerivedPolicy=thrust::system::omp::detail::tag, InputIterator=thrust::device_ptr<int>, Size=unsigned long, UnaryFunction=thrust::detail::host_generate_functor<thrust::detail::fill_functor<int>>]"
/usr/local/cuda/include/thrust/system/detail/generic/generate.inl(52): here
instantiation of "OutputIterator thrust::system::detail::generic::generate_n(thrust::execution_policy<ExecutionPolicy> &, OutputIterator, Size, Generator) [with ExecutionPolicy=thrust::system::omp::detail::tag, OutputIterator=thrust::device_ptr<int>, Size=unsigned long, Generator=thrust::detail::fill_functor<int>]"
/usr/local/cuda/include/thrust/detail/generate.inl(56): here
instantiation of "OutputIterator thrust::generate_n(const thrust::detail::execution_policy_base<DerivedPolicy> &, OutputIterator, Size, Generator) [with DerivedPolicy=thrust::system::omp::detail::tag, OutputIterator=thrust::device_ptr<int>, Size=unsigned long, Generator=thrust::detail::fill_functor<int>]"
/usr/local/cuda/include/thrust/system/detail/generic/fill.h(45): here
instantiation of "OutputIterator thrust::system::detail::generic::fill_n(thrust::execution_policy<DerivedPolicy> &, OutputIterator, Size, const T &) [with DerivedPolicy=thrust::system::omp::detail::tag, OutputIterator=thrust::device_ptr<int>, Size=unsigned long, T=int]"
/usr/local/cuda/include/thrust/detail/fill.inl(50): here
[ 6 instantiation contexts not shown ]
instantiation of "void thrust::detail::contiguous_storage<T, Alloc>::uninitialized_fill_n(thrust::detail::contiguous_storage<T, Alloc>::iterator, thrust::detail::contiguous_storage<T, Alloc>::size_type, const thrust::detail::contiguous_storage<T, Alloc>::value_type &) [with T=int, Alloc=thrust::device_malloc_allocator<int>]"
/usr/local/cuda/include/thrust/detail/vector_base.inl(164): here
instantiation of "void thrust::detail::vector_base<T, Alloc>::fill_init(thrust::detail::vector_base<T, Alloc>::size_type, const T &) [with T=int, Alloc=thrust::device_malloc_allocator<int>]"
/usr/local/cuda/include/thrust/detail/vector_base.inl(139): here
instantiation of "void thrust::detail::vector_base<T, Alloc>::init_dispatch(IteratorOrIntegralType, IteratorOrIntegralType, thrust::detail::true_type) [with T=int, Alloc=thrust::device_malloc_allocator<int>, IteratorOrIntegralType=int]"
/usr/local/cuda/include/thrust/detail/vector_base.inl(224): here
instantiation of "thrust::detail::vector_base<T, Alloc>::vector_base(InputIterator, InputIterator) [with T=int, Alloc=thrust::device_malloc_allocator<int>, InputIterator=int]"
/usr/local/cuda/include/thrust/device_vector.h(148): here
instantiation of "thrust::device_vector<T, Alloc>::device_vector(InputIterator, InputIterator) [with T=int, Alloc=thrust::device_malloc_allocator<int>, InputIterator=int]"
/usr/local/cuda/include/thrust/system/omp/detail/for_each.inl(53): error: incomplete type is not allowed
detected during:
instantiation of "RandomAccessIterator thrust::system::omp::detail::for_each_n(thrust::system::omp::detail::execution_policy<DerivedPolicy> &, RandomAccessIterator, Size, UnaryFunction) [with DerivedPolicy=thrust::system::omp::detail::tag, RandomAccessIterator=thrust::detail::normal_iterator<thrust::device_ptr<int>>, Size=long, UnaryFunction=cuda_hello]"
(89): here
instantiation of "RandomAccessIterator thrust::system::omp::detail::for_each(thrust::system::omp::detail::execution_policy<DerivedPolicy> &, RandomAccessIterator, RandomAccessIterator, UnaryFunction) [with DerivedPolicy=thrust::system::omp::detail::tag, RandomAccessIterator=thrust::detail::normal_iterator<thrust::device_ptr<int>>, UnaryFunction=cuda_hello]"
/usr/local/cuda/include/thrust/detail/for_each.inl(43): here
instantiation of "InputIterator thrust::for_each(const thrust::detail::execution_policy_base<DerivedPolicy> &, InputIterator, InputIterator, UnaryFunction) [with DerivedPolicy=thrust::system::omp::detail::tag, InputIterator=thrust::detail::normal_iterator<thrust::device_ptr<int>>, UnaryFunction=cuda_hello]"
/usr/local/cuda/include/thrust/detail/for_each.inl(57): here
instantiation of "InputIterator thrust::for_each(InputIterator, InputIterator, UnaryFunction) [with InputIterator=thrust::detail::normal_iterator<thrust::device_ptr<int>>, UnaryFunction=cuda_hello]"
2 errors detected in the compilation of "/tmp/tmpxft_00002d3a_00000000-6_Test.cpp1.ii".
CMake Error at ExecutableCmake_generated_Test.cu.o.cmake:264 (message):
Error generating file
make[2]: *** [CMakeFiles/ExecutableCmake.dir/./ExecutableCmake_generated_Test.cu.o] Erreur 1
make[1]: *** [CMakeFiles/ExecutableCmake.dir/all] Erreur 2
make: *** [all] Erreur 2
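These errors look exactly like the ones I get when I put CUDA code in a non-.cu file, but I do not know CMake well enough to understand why this problem occurs.
Thanks in advance for your help.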
You seem to be missing some nvcc flags. Adding this worked for me:
list(APPEND CUDA_NVCC_FLAGS -Xcompiler -fopenmp)
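For context, the manual nvcc command in the question passes `-Xcompiler -fopenmp`, while the `CUDA_NVCC_FLAGS` list in the CMake file does not, so the host compiler is most likely never told to enable OpenMP. As a rough sketch of where the extra line could go, here is a trimmed version of the question's CMakeLists.txt. It assumes a GCC host compiler with libgomp, as in the question. The layout is only illustrative, not a definitive configuration.

```cmake
cmake_minimum_required (VERSION 2.8)
project (ExecutableCmake)
set (OUTPUT_NAME ExecutableCmake)

# Thrust device backend, as selected in the question
set (THRUST_DEVICE_SYSTEM THRUST_DEVICE_SYSTEM_OMP)

find_package (CUDA REQUIRED)

# Flags consumed by nvcc itself (unchanged from the question)
list (APPEND CUDA_NVCC_FLAGS
      -gencode arch=compute_30,code=sm_30
      -DTHRUST_DEVICE_SYSTEM=${THRUST_DEVICE_SYSTEM})

# The missing piece: forward -fopenmp to the host compiler through nvcc,
# mirroring the working command line (assumes GCC)
list (APPEND CUDA_NVCC_FLAGS -Xcompiler -fopenmp)

cuda_add_executable (${OUTPUT_NAME} Test.cu)

# Link the GNU OpenMP runtime, as the question already does
target_link_libraries (${OUTPUT_NAME} gomp)
```

After changing the flags, it is probably safest to re-run cmake in a clean build directory so that the generated `.cu.o.cmake` scripts pick up the new `CUDA_NVCC_FLAGS`.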