Thrust: how to do this kind of selective copy

I am using Thrust to generate some random permutations on the GPU, as follows:

// Compute a random list on CPU
int iterations = 500;
int src_size = 2048;
thrust::host_vector<int> cpu_rand_list (iterations * 4);

for(size_t i = 0; i < cpu_rand_list.size(); ++i) {
    cpu_rand_list[i] = src_size * (rand()/(1.0 + RAND_MAX));
}

// Copy the random list to GPU
thrust::device_vector<int> gpu_rand_list = cpu_rand_list;

Now gpu_rand_list contains integer indices, and I have another array, like this:

thrust::device_vector<float> values(2048);
// These are now filled with some values
...

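What I want to do is create another list that contains only those entries of gpu_rand_list whose corresponding entry in values is not equal to -1. On the CPU the code would look something like this: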

std::vector<int> refined;
for (int i = 0; i < gpu_rand_list.size(); ++i) {
    if (values[gpu_rand_list[i]] != -1)
        refined.push_back(gpu_rand_list[i]);
}

Is there a way to do this in Thrust? I tried to use the copy_if construct, but could not make it work with these multiple arrays.

thrust::copy_if (specifically the stencil version, I would think) is a reasonable starting point. The only other complexity I see is the indexing "through" gpu_rand_list: the stencil for element i has to be values[gpu_rand_list[i]], not values[i]. This can be accomplished with a permutation iterator.
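To make the two pieces concrete, here is a minimal host-only sketch in plain C++ (no Thrust involved). `PermutedView`, `copy_if_stencil`, `NotMinusOne` and `refine` are hypothetical names made up for this illustration; they only model, sequentially, what the stencil form of copy_if does when its stencil is read through a permutation iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for a permutation iterator:
// reading element i yields values[indices[i]].
struct PermutedView {
  const std::vector<float>& values;
  const std::vector<int>& indices;
  float operator[](std::size_t i) const {
    // Every index must point inside values.
    assert(indices[i] >= 0 &&
           static_cast<std::size_t>(indices[i]) < values.size());
    return values[indices[i]];
  }
};

// Hypothetical host-side stand-in for the stencil form of copy_if:
// input[i] is kept when pred(stencil[i]) is true.
template <typename Stencil, typename Pred>
std::vector<int> copy_if_stencil(const std::vector<int>& input,
                                 const Stencil& stencil, Pred pred) {
  std::vector<int> out;
  for (std::size_t i = 0; i < input.size(); ++i)
    if (pred(stencil[i])) out.push_back(input[i]);
  return out;
}

// Predicate from the question: keep entries whose value is not -1.
struct NotMinusOne {
  bool operator()(float v) const { return v != -1.0f; }
};

// The question's CPU loop, expressed as the two pieces above.
std::vector<int> refine(const std::vector<int>& indices,
                        const std::vector<float>& values) {
  PermutedView stencil = {values, indices};
  return copy_if_stencil(indices, stencil, NotMinusOne());
}
```

The input sequence and the index sequence of the view are the same vector here, which mirrors passing gpu_rand_list both as the copy_if input and as the index range of the permutation iterator.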

(Aside: using a stencil array of float when you want to do an exact comparison against -1 seems a bit odd to me, but perhaps it makes sense.)

Something like this may work for you:

$ cat t881.cu
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/iterator/permutation_iterator.h>
#include <stdlib.h>
#include <vector>
#include <iostream>

using namespace thrust::placeholders;

int main(){

// Compute a random list on CPU
  int iterations = 500;
  int src_size = 2048;
  thrust::host_vector<int> cpu_rand_list (iterations * 4);

  for(size_t i = 0; i < cpu_rand_list.size(); ++i) {
    cpu_rand_list[i] = src_size * (rand()/(1.0 + RAND_MAX));
    }

// Copy the random list to GPU
  thrust::device_vector<int> gpu_rand_list = cpu_rand_list;
  thrust::device_vector<float> values(src_size, -1.0f);
// pick some values to copy
  values[2] = 0;  values[3] = 0; values[5] = 0;
  thrust::device_vector<int> result(iterations * 4);

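  // Copy gpu_rand_list[i] to result when values[gpu_rand_list[i]] != -1;
  // the permutation iterator supplies values[gpu_rand_list[i]] as the stencil.
  thrust::copy_if(gpu_rand_list.begin(), gpu_rand_list.end(),
                  thrust::make_permutation_iterator(values.begin(), gpu_rand_list.begin()),
                  result.begin(), _1 != -1.0f);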
  std::vector<float> h_values(src_size);
  thrust::copy(values.begin(), values.end(), h_values.begin());
  thrust::host_vector<int> h_result = result;
  std::vector<int> refined;
  // Build the reference result on the host, using the host copies only
  for (size_t i = 0; i < cpu_rand_list.size(); ++i) {
    if (h_values[cpu_rand_list[i]] != -1)
        refined.push_back(cpu_rand_list[i]);
    }
  // Compare the GPU result against the host reference
  for (size_t i = 0; i < refined.size(); i++)
    if (refined[i] != h_result[i]) { std::cout << "mismatch at: " << i << " was: " << h_result[i] << " should be: " << refined[i] << std::endl; return 1;}
    else std::cout << refined[i] << std::endl;
  return 0;

}
$ nvcc -o t881 t881.cu
$ ./t881
2
5
5
$

(I'm using thrust placeholders so that I don't have to create an explicit functor for the copy_if operation.)
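For comparison, the explicit functor that the placeholder expression avoids could look roughly like the sketch below. The names `not_minus_one` and `HOST_DEVICE` are hypothetical and not part of Thrust; the macro is only there so that the same code compiles both under nvcc and with a plain C++ compiler.

```cpp
// nvcc defines __CUDACC__ when compiling CUDA source, so the functor is
// callable on host and device there. In a plain C++ build the qualifiers
// are dropped and the same code still compiles.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical named functor equivalent to the placeholder expression
// _1 != -1.0f: true for every value other than the -1 marker.
struct not_minus_one {
  HOST_DEVICE bool operator()(float v) const { return v != -1.0f; }
};

// Intended use, replacing the last argument of the copy_if call:
//   thrust::copy_if(gpu_rand_list.begin(), gpu_rand_list.end(),
//                   thrust::make_permutation_iterator(values.begin(),
//                                                     gpu_rand_list.begin()),
//                   result.begin(), not_minus_one());
```

Passing `not_minus_one()` in place of `_1 != -1.0f` should select the same elements; the placeholder form is simply shorter for a one-line predicate like this.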