Retain Duplicates with Set Intersection in CUDA

Question

I am using CUDA and Thrust to perform pairwise set operations. However, I would like to retain duplicates. For example:

int keys[7] = {1, 1, 1, 3, 4, 5, 5};
int vals[7] = {1, 2, 3, 4, 5, 6, 7};
int comp[2] = {1, 5};

thrust::set_intersection_by_key(keys, keys + 7, comp, comp + 2, vals, rk, rv);

Desired result

rk[1, 1, 1, 5, 5]
rv[1, 2, 3, 6, 7]

Actual result

rk[1, 5]
rv[5, 7]

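I want all of the vals where the corresponding key is contained in comp.

Is there any way to achieve this with thrust, or do I have to write my own kernel or thrust function?

I am using this function: set_intersection_by_key.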

Answer 1

Quoting from the thrust documentation:

The generalization is that if an element appears m times in [keys_first1, keys_last1) and n times in [keys_first2, keys_last2) (where m may be zero), then it appears min(m,n) times in the keys output range

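Since comp contains each key only once, n = 1 and therefore min(m, 1) = 1: every matching key appears exactly once in the output, no matter how often it occurs in keys.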
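The min(m, n) rule can be reproduced on the host with std::set_intersection, which follows the same multiset rule. The following is only a minimal sketch using the C++ standard library, not Thrust; the helper name intersect_sorted is hypothetical and both inputs are assumed to be sorted.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: multiset intersection of two sorted ranges.
// An element that occurs m times in a and n times in b is emitted min(m, n) times.
std::vector<int> intersect_sorted(const std::vector<int>& a, const std::vector<int>& b)
{
    std::vector<int> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}
```

Calling intersect_sorted({1, 1, 1, 3, 4, 5, 5}, {1, 5}) yields {1, 5}, because each key occurs only once in the second range. Duplicating entries in the second range raises n and therefore the number of copies kept, but that is rarely a practical way to retain duplicates.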

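To get "all of the vals where the corresponding key is contained in comp", you can use the approach from my answer to a similar problem.

As in that answer, the example code performs the following steps:

Get the largest element of d_comp. This assumes that d_comp is already sorted.
Create a vector d_map of size largest_element+1. Copy a 1 to every position in d_map that is listed in d_comp.
Copy every entry of d_vals whose key has a 1 entry in d_map into d_result.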

#include <thrust/device_vector.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/functional.h>
#include <thrust/copy.h>
#include <thrust/scatter.h>
#include <iostream>
#include <iterator>


#define PRINTER(name) print(#name, (name))
void print(const char* name, const thrust::device_vector<int>& v)
{
    std::cout << name << ":\t";
    thrust::copy(v.begin(), v.end(), std::ostream_iterator<int>(std::cout, "\t"));
    std::cout << std::endl;
}

int main()
{
    int keys[] = {1, 1, 1, 3, 4, 5, 5};
    int vals[] = {1, 2, 3, 4, 5, 6, 7};
    int comp[] = {1, 5};

    const int size_data = sizeof(keys)/sizeof(keys[0]);
    const int size_comp = sizeof(comp)/sizeof(comp[0]);

    // copy data to GPU
    thrust::device_vector<int> d_keys (keys, keys+size_data);
    thrust::device_vector<int> d_vals (vals, vals+size_data);
    thrust::device_vector<int> d_comp (comp, comp+size_comp);

    PRINTER(d_keys);
    PRINTER(d_vals);
    PRINTER(d_comp);

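    // d_comp is assumed to be sorted, so its last element is the largest key
    int largest_element = d_comp.back();

    // lookup table, zero-initialized: d_map[k] becomes 1 if k is listed in d_comp
    thrust::device_vector<int> d_map(largest_element+1);

    thrust::constant_iterator<int> one(1);
    thrust::scatter(one, one+size_comp, d_comp.begin(), d_map.begin());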
    PRINTER(d_map);

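    // keep vals[i] where d_map[d_keys[i]] is nonzero; the permutation iterator
    // performs the lookup, so every key must lie in [0, largest_element]
    thrust::device_vector<int> d_result(size_data);
    using namespace thrust::placeholders;
    int final_size = thrust::copy_if(d_vals.begin(),
                                    d_vals.end(),
                                    thrust::make_permutation_iterator(d_map.begin(), d_keys.begin()),
                                    d_result.begin(),
                                    _1
                                    ) - d_result.begin();
    d_result.resize(final_size);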

    PRINTER(d_result);

    return 0;
}

Output:

d_keys:     1   1   1   3   4   5   5   
d_vals:     1   2   3   4   5   6   7   
d_comp:     1   5   
d_map:      0   1   0   0   0   1   
d_result:   1   2   3   6   7
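
The lookup table needs largest_element+1 entries and only works for non-negative keys that do not exceed the largest element of d_comp. For large, sparse, or negative keys, a membership test by binary search on the sorted comp is one possible alternative. The following is only a host-side sketch with the C++ standard library, not Thrust, and it is a different technique from the lookup table used above; the helper name filter_vals_by_keys is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: keep vals[i] whenever keys[i] occurs in sorted_comp.
// Assumes sorted_comp is sorted ascending and keys/vals have equal length.
std::vector<int> filter_vals_by_keys(const std::vector<int>& keys,
                                     const std::vector<int>& vals,
                                     const std::vector<int>& sorted_comp)
{
    std::vector<int> result;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        // O(log n) membership test; no table sized by the largest key is needed
        if (std::binary_search(sorted_comp.begin(), sorted_comp.end(), keys[i])) {
            result.push_back(vals[i]);
        }
    }
    return result;
}
```

For the data from the question, filter_vals_by_keys({1, 1, 1, 3, 4, 5, 5}, {1, 2, 3, 4, 5, 6, 7}, {1, 5}) returns {1, 2, 3, 6, 7}. Thrust also offers a vectorized thrust::binary_search, which could presumably fill a stencil for thrust::copy_if on the device; that variant is not shown or tested here.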
