How to specify the OpenMP execution policy in CUDA Thrust calls on Windows?

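While porting code from Linux to Windows with Visual Studio C++ 2015 Community, I ran into a compilation error that I cannot understand. Below is a sample program that reproduces the error: it builds a vector of doubles and then sorts it with CUDA Thrust using the OpenMP execution policy.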

#include <thrust/sort.h>
#include <thrust/system/omp/execution_policy.h>
#include <chrono>
#include <random>
#include <vector>

double unit_random()
{
  static std::default_random_engine generator(std::chrono::system_clock::now().time_since_epoch().count());
  static std::uniform_real_distribution<double> distribution(double(0), double(1));
  return distribution(generator);
}

int main(int argc, char* argv[])
{
  constexpr size_t input_size = 100000;
  std::vector< double > input(input_size, 0);
  for ( size_t i = 0; i < input_size; ++i)
    input[i] = unit_random() * 1000;

  thrust::sort(thrust::omp::par, input.begin(), input.end());
  return 0;
}

Here is the error as shown in the Visual Studio console (file names have been shortened):

thrust/system/omp/detail/sort.inl(136): error C2146: syntax error: missing ';' before identifier 'nseg'
thrust/detail/sort.inl(83): note: see reference to function template instantiation 'void thrust::system::omp::detail::stable_sort<thrust::system::omp::detail::par_t,RandomAccessIterator,StrictWeakOrdering>(thrust::system::omp::detail::execution_policy<thrust::system::omp::detail::par_t> &,RandomAccessIterator,RandomAccessIterator,StrictWeakOrdering)' being compiled
  with
  [
    RandomAccessIterator=std::_Vector_iterator<std::_Vector_val<std::_Simple_types<double>>>,
    StrictWeakOrdering=thrust::less<value_type>
  ]
thrust/system/detail/generic/sort.inl(63): note: see reference to function template instantiation 'void thrust::stable_sort<DerivedPolicy,RandomAccessIterator,StrictWeakOrdering>(const thrust::detail::execution_policy_base<DerivedPolicy> &,RandomAccessIterator,RandomAccessIterator,StrictWeakOrdering)' being compiled
  with
  [
      DerivedPolicy=thrust::system::omp::detail::par_t,
      RandomAccessIterator=std::_Vector_iterator<std::_Vector_val<std::_Simple_types<double>>>,
      StrictWeakOrdering=thrust::less<value_type>
  ]
thrust/detail/sort.inl(56): note: see reference to function template instantiation 'void thrust::system::detail::generic::sort<Derived,RandomAccessIterator,StrictWeakOrdering>(thrust::execution_policy<Derived> &,RandomAccessIterator,RandomAccessIterator,StrictWeakOrdering)' being compiled
  with
  [
      Derived=thrust::system::omp::detail::par_t,
      RandomAccessIterator=std::_Vector_iterator<std::_Vector_val<std::_Simple_types<double>>>,
      StrictWeakOrdering=thrust::less<value_type>
  ]
thrust/system/detail/generic/sort.inl(49): note: see reference to function template instantiation 'void thrust::sort<DerivedPolicy,RandomAccessIterator,thrust::less<value_type>>(const thrust::detail::execution_policy_base<DerivedPolicy> &,RandomAccessIterator,RandomAccessIterator,StrictWeakOrdering)' being compiled
  with
  [
      DerivedPolicy=thrust::system::omp::detail::par_t,
      RandomAccessIterator=std::_Vector_iterator<std::_Vector_val<std::_Simple_types<double>>>,
      StrictWeakOrdering=thrust::less<value_type>
  ]
thrust/detail/sort.inl(41): note: see reference to function template instantiation 'void thrust::system::detail::generic::sort<Derived,RandomAccessIterator>(thrust::execution_policy<Derived> &,RandomAccessIterator,RandomAccessIterator)' being compiled
  with
  [
      Derived=thrust::system::omp::detail::par_t,
      RandomAccessIterator=std::_Vector_iterator<std::_Vector_val<std::_Simple_types<double>>>
  ]
windows_cuda_thrust_error.cc(24): note: see reference to function template instantiation 'void thrust::sort<DerivedPolicy,std::_Vector_iterator<std::_Vector_val<std::_Simple_types<double>>>>(const thrust::detail::execution_policy_base<DerivedPolicy> &,RandomAccessIterator,RandomAccessIterator)' being compiled
  with
  [
      DerivedPolicy=thrust::system::omp::detail::par_t,
      RandomAccessIterator=std::_Vector_iterator<std::_Vector_val<std::_Simple_types<double>>>
  ]
thrust/system/omp/detail/sort.inl(136): error C2275: 'IndexType': illegal use of this type as an expression
thrust/system/omp/detail/sort.inl(113): note: see declaration of 'IndexType'
thrust/system/omp/detail/sort.inl(136): error C2065: 'nseg': undeclared identifier
thrust/system/omp/detail/sort.inl(142): error C2065: 'nseg': undeclared identifier
thrust/system/omp/detail/sort.inl(159): error C2065: 'nseg': undeclared identifier
========== Build: 0 succeeded, 1 failed, 1 up-to-date, 0 skipped ==========

The same code works fine on Linux.

How should the OpenMP execution policy be specified in CUDA Thrust calls on Windows? Or, what am I doing wrong in this particular context?

The Thrust version used is 1.8.1. Here is an excerpt of the Thrust function, from the file thrust/system/omp/detail/sort.inl, that triggers the compilation error:

template<typename DerivedPolicy,
         typename RandomAccessIterator,
         typename StrictWeakOrdering>
void stable_sort(execution_policy<DerivedPolicy> &exec,
                 RandomAccessIterator first,
                 RandomAccessIterator last,
                 StrictWeakOrdering comp)
{
  // ...
  typedef typename thrust::iterator_difference<RandomAccessIterator>::type IndexType;

  if(first == last)
    return;

  #pragma omp parallel
  {
    thrust::system::detail::internal::uniform_decomposition<IndexType> decomp(last - first, 1, omp_get_num_threads());

    // process id
    IndexType p_i = omp_get_thread_num();

    // every thread sorts its own tile
    if(p_i < decomp.size())
    {
      thrust::stable_sort(thrust::seq,
                          first + decomp[p_i].begin(),
                          first + decomp[p_i].end(),
                          comp);
    }

    #pragma omp barrier

    IndexType nseg = decomp.size(); // line 136
    // ...
  }
}

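Following @kangshiyin's suggestion, I filed an issue on GitHub (see issue #817), and the Thrust developers found a workaround. The problem comes from the way MSVC currently handles OpenMP code, so the code given in the question is perfectly fine.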

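If you hit a similar problem, first try updating to the latest version of Thrust. You can also try applying the same workaround: simply add a semicolon before the line that raises the error.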
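
For illustration, here is a minimal sketch of what that workaround might look like, outside of Thrust. The function count_after_barrier is a hypothetical stand-in, not part of Thrust; it only reproduces the layout from the excerpt above (a declaration right after "#pragma omp barrier") and places an empty statement in between. I am assuming this matches the shape of the fix; check the actual Thrust sources for the exact change.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Thrust function): mimics the structure of the
// excerpt from sort.inl, where a declaration directly follows a barrier pragma.
std::size_t count_after_barrier(const std::vector<double>& v)
{
  std::size_t total = 0;

  #pragma omp parallel
  {
    #pragma omp barrier

    // Workaround: an empty statement between the pragma and the declaration,
    // so that the declaration is not the first thing after the barrier.
    ;

    std::size_t n = v.size(); // plays the role of "IndexType nseg = ..."

    // Only one thread writes the shared result, to avoid a data race.
    #pragma omp single
    total = n;
  }

  return total;
}
```

The empty statement has no effect on behavior, so the same source should still build unchanged on Linux. Without OpenMP enabled, the pragmas are ignored and the function simply runs serially.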