Setting each host vector's data element of type int array on device vector

Question

I am trying to implement the following C++ function with CUDA Thrust:

void setFragment( vector< Atom * > &vStruct, vector< Fragment * > &vFragment ) {
    Fragment *frag;

    int n = vStruct.size();

    for( int i = 0 ; i < n-2 ; i++ ){
        frag = new Fragment();
        frag->index[0] = i;
        frag->index[1] = i+1;   
        frag->index[2] = i+2;   

        vFragment.push_back( frag );    
    }
}

To do so, I created a functor that sets the indices of each element of the Fragment vector, as follows:

struct setFragment_functor
{
    const int n;

    setFragment_functor(int _n) : n(_n) {}

    __host__ __device__
    void operator() (Fragment *frag) {
        frag->index[0] = n;
        frag->index[1] = n+1;
        frag->index[2] = n+2;       
    }
};

void setFragment( vector< Atom * > &vStruct, vector< Fragment * > &vFragment ) {
    int n = vStruct.size();
    thrust::device_vector<Fragment *> d_vFragment(n-2);

    thrust::transform( d_vFragment.begin(), d_vFragment.end(), setFragment_functor( thrust::counting_iterator<int>(0) ) );

    thrust::copy(d_vFragment.begin(), d_vFragment.end(), vFragment.begin());        
}

However, the transform call I wrote produces the following errors:

1) error: no instance of constructor "setFragment_functor::setFragment_functor" matches the argument list
            argument types are: (thrust::counting_iterator<int, thrust::use_default, thrust::use_default, thrust::use_default>) 
2) error: no instance of overloaded function "thrust::transform" matches the argument list
        argument types are: (thrust::detail::normal_iterator<thrust::device_ptr<Fragment *>>, thrust::detail::normal_iterator<thrust::device_ptr<Fragment *>>, <error-type>)

I am new to CUDA. I would appreciate it if someone could help me implement this C++ function in CUDA.

Answer 1

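To be blunt, the code you have written has several glaring problems and will never work the way you imagine. Beyond that, I am guessing that the reason for wanting to run a function like this on a GPU in the first place is that profiling shows it is very slow. It is slow because it is very poorly designed: it calls new and push_back potentially millions of times for a decent-sized input array. There is no way to accelerate those operations on the GPU. They are slower there, not faster. And the idea of using the GPU to build up this type of array of structures only to copy it back to the host is as illogical as trying to use thrust to accelerate file I/O. There is literally no hardware or problem size for which doing what you propose would be faster than running the original host code. The latency of the GPU and the bandwidth of the interconnect between the GPU and the host guarantee it.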
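If the real goal is simply a faster host function, the cost of the repeated new and push_back calls can be avoided without a GPU. The following is only a minimal sketch under assumptions that are not in the question: that Fragment can be stored by value instead of behind a pointer, and that only the number of atoms is needed. The name makeFragments is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Assumed layout, matching the three-element index array used in the question.
struct Fragment
{
    int index[3];
};

// Hypothetical helper: builds all fragments for nAtoms atoms with a single
// allocation, instead of one `new` plus one `push_back` per fragment.
std::vector<Fragment> makeFragments(std::size_t nAtoms)
{
    // Fewer than three atoms yields no fragments (the original loop body
    // would never run); this guard also avoids unsigned underflow below.
    if (nAtoms < 3) {
        return {};
    }

    std::vector<Fragment> fragments(nAtoms - 2);
    for (std::size_t i = 0; i < fragments.size(); ++i) {
        const int first = static_cast<int>(i);
        fragments[i] = Fragment{{first, first + 1, first + 2}};
    }
    return fragments;
}
```

A caller would write something like `std::vector<Fragment> fragments = makeFragments(vStruct.size());`. Whether this is fast enough for a given input size is something only profiling can tell.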

Initializing the elements of an array of structures in GPU memory using thrust is trivial. The tabulate transformation can be used with a functor like this:

#include <thrust/device_vector.h>
#include <thrust/tabulate.h>
#include <iostream>

struct Fragment
{
   int index[3];
   Fragment() = default;
};

struct functor
{
    __device__ __host__
    Fragment operator() (const int &i) const { 
        Fragment f; 
        f.index[0] = i; f.index[1] = i+1; f.index[2] = i+2; 
        return f;
    }
};


int main()
{
    const int N = 10;
    thrust::device_vector<Fragment> dvFragment(N);
    thrust::tabulate(dvFragment.begin(), dvFragment.end(), functor());

    for(auto p : dvFragment) {
        Fragment f = p;
        std::cout << f.index[0] << " " << f.index[1] << " " << f.index[2] << std::endl;
    }

    return 0;
}

which runs like this:

$ nvcc -arch=sm_52 -std=c++14 -ccbin=g++-7 -o mobasher Mobasher.cu 
$ cuda-memcheck ./mobasher 
========= CUDA-MEMCHECK
0 1 2
1 2 3
2 3 4
3 4 5
4 5 6
5 6 7
6 7 8
7 8 9
8 9 10
9 10 11
========= ERROR SUMMARY: 0 errors

But this is not a direct translation of the original host code in your question.
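One difference is that the example above stores Fragment objects by value, while the original function fills a vector of Fragment pointers. If the pointer-based interface has to be kept for existing callers, one possible bridge is to keep the fragments in a contiguous host vector and hand out pointers into it. This is a sketch under that assumption only; pointersInto is a hypothetical name, and nothing in the question says this is how the surrounding code is organised.

```cpp
#include <vector>

// Assumed layout, matching the three-element index array used above.
struct Fragment
{
    int index[3];
};

// Hypothetical adapter: returns one pointer per element of `storage`.
// The pointers do not own anything. They stay valid only while `storage`
// is alive and is not resized, and they must never be passed to `delete`.
std::vector<Fragment *> pointersInto(std::vector<Fragment> &storage)
{
    std::vector<Fragment *> pointers;
    pointers.reserve(storage.size());   // one allocation for the pointer array
    for (Fragment &fragment : storage) {
        pointers.push_back(&fragment);
    }
    return pointers;
}
```

The trade-off is ownership: code that previously called delete on each element of vFragment would have to change, because the storage vector now owns every Fragment.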
