thrust::reduce_by_key() doesn't seem to work

Question

Here is my code:

#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/pair.h>
#include <glm/glm.hpp>

//initialize the device_vector
int size = N;
thrust::device_vector<glm::vec3> value(size);
thrust::device_vector<int> key(size);
//output vectors for reduce_by_key (there are at most size groups)
thrust::device_vector<int> output_key(size);
thrust::device_vector<glm::vec3> output_value(size);
//get the device pointer of the device_vector
//so that I can write data to the device_vector in a CUDA kernel
glm::vec3 * dv_value_ptr = thrust::raw_pointer_cast(&value[0]);
int* dv_key_ptr = thrust::raw_pointer_cast(&key[0]);
//run the kernel function
dim3 threads(16, 16);
dim3 blocks(iDivUp(m_width, threads.x), iDivUp(m_height, threads.y));
//the size of value and key is packed in dev_data
compute_one_i_all_j <<<blocks, threads >>>(dev_data, dv_key_ptr, dv_value_ptr);
//Finally, reduce the vector by its keys.
thrust::pair<thrust::device_vector<int>::iterator,
      thrust::device_vector<glm::vec3>::iterator> new_last;
new_last = thrust::reduce_by_key(key.begin(), key.end(), value.begin(), output_key.begin(), output_value.begin());
//get the reduced vector size
int new_size = new_last.first - output_key.begin();

After all of this code runs, I write output_key to a file, and the file contains many duplicate keys.

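So reduce_by_key() doesn't seem to work. P.S. The CUDA kernel writes only part of key and value, so after the kernel some elements of key and value keep their initial values (probably 0).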

Answer 1

As stated in the documentation:

For each group of consecutive keys in the range [keys_first, keys_last) that are equal, reduce_by_key copies the first element of the group to the keys_output. The corresponding values in the range are reduced using the plus and the result copied to values_output.

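In other words, only groups of consecutive equal keys are reduced. Equal keys that are not adjacent end up in separate output groups, which is why the same key shows up several times in output_key.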

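So you first have to rearrange all the keys and values so that all elements with the same key are adjacent. The easiest way to do this is with thrust::sort_by_key: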

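#include <thrust/sort.h>
//sort the keys and permute the values along with them, so equal keys are adjacent
thrust::sort_by_key(key.begin(), key.end(), value.begin());
new_last = thrust::reduce_by_key(key.begin(), key.end(), value.begin(), output_key.begin(), output_value.begin());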

