Removing items during a large array transform in CUDA

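Given a large array A, its values are transformed into an array B, so B = Transform(A). A and B are different types, Transform() is fairly expensive, and the elements of B are larger than those of A. The results also need to be filtered by a predicate Keep(B).

Is there a decent way to do this without first writing out the full B array and then pruning it down to the entries to keep?

I started by trying: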

typedef int A;
struct B { int a, b, c; };


struct FTransform : thrust::unary_function<A, B>
{
    __device__ B operator()(A a) const { return B{ a, a, a }; }
};

struct FKeep : thrust::unary_function<B, bool>
{
    __device__ bool operator()(B b) const { return (b.a & 1) == 0; }
};


thrust::device_vector<B> outputs(8);
thrust::device_vector<A> inputs(8);

std::generate(inputs.begin(), inputs.end(), rand);

auto first = thrust::make_transform_iterator(inputs.begin(), FTransform());
auto last = thrust::make_transform_iterator(inputs.end(), FTransform());

auto end = thrust::copy_if(first, last, outputs, FKeep());

But this produces compile errors (CUDA 9.2):

thrust/iterator/iterator_traits.h(49): error : class "thrust::device_vector<B, thrust::device_malloc_allocator<B>>" has no member "iterator_category"

thrust/detail/copy_if.inl(78): error : incomplete type is not allowed

thrust/detail/copy_if.inl(80): error : no instance of overloaded function "select_system" matches the argument list

thrust/detail/copy_if.inl(80): error : no instance of overloaded function "thrust::copy_if" matches the argument list

Here:

auto end = thrust::copy_if(first, last, outputs, FKeep());
                                        ^^^^^^^

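outputs is not an iterator; it is the thrust::device_vector itself, which is why the first error complains that device_vector has no member "iterator_category". You should pass outputs.begin() there.

With that change, your code compiles for me.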
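
For reference, here is a minimal sketch of the corrected version wrapped in a helper. The function name transform_and_filter and the host-side std::vector interface are my own assumptions for illustration, not part of the original code. I also swapped thrust::unary_function for a plain result_type typedef on the transform functor, since that should be enough for transform_iterator to deduce its value type. The returned iterator from thrust::copy_if marks the end of the kept elements, so it can be used to shrink the output.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/iterator/transform_iterator.h>
#include <vector>

typedef int A;
struct B { int a, b, c; };

struct FTransform
{
    typedef B result_type;  // assumption: used instead of thrust::unary_function
    __host__ __device__ B operator()(A a) const { return B{ a, a, a }; }
};

struct FKeep
{
    // Keep only entries whose first field is even, as in the question.
    __host__ __device__ bool operator()(const B& b) const { return (b.a & 1) == 0; }
};

// Hypothetical helper: copies the inputs to the device, transforms and
// filters them in a single copy_if call, and returns the kept B values.
std::vector<B> transform_and_filter(const std::vector<A>& host_inputs)
{
    thrust::device_vector<A> inputs(host_inputs.begin(), host_inputs.end());

    // Sized for the worst case, where every element passes the predicate.
    thrust::device_vector<B> outputs(inputs.size());

    auto first = thrust::make_transform_iterator(inputs.begin(), FTransform());
    auto last  = thrust::make_transform_iterator(inputs.end(),   FTransform());

    // Note outputs.begin(), not outputs: copy_if expects an output iterator.
    auto end = thrust::copy_if(first, last, outputs.begin(), FKeep());

    // Drop the unused tail so outputs holds only the kept entries.
    outputs.resize(end - outputs.begin());

    std::vector<B> result(outputs.size());
    thrust::copy(outputs.begin(), outputs.end(), result.begin());
    return result;
}
```

Calling transform_and_filter on a std::vector<A> returns only the transformed entries that pass FKeep, in their original order. Only the kept B values are written, although in this sketch the output buffer is still allocated for the worst case.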