Batchwise evaluation of piecewise polynomials on GPU

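I am trying to evaluate a large piecewise polynomial, obtained from a cubic spline, at a set of points. I am trying to do this on the GPU, but I am running into memory limits.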

Therefore, I would like to evaluate the piecewise polynomial in batches.

Original code:

Y = some_matrix_of_data_values ;
X = some_vector_of_data_sites ;
pp = spline(X, Y) ; % get the piecewise polynomial form of the cubic spline. The resulting structure is very large.

for t = 1: big_number
    hcurrent = ppval(pp,t); %evaluate the piecewise polynomial at t
    y(t) = sum(x(t:t+M-1).*hcurrent,1) ; % do some operation on the interpolated value (x is assumed to be a column vector). Most likely not relevant to this question.
end

Vectorized, hopefully on the way to batch processing on the GPU:

Y = some_matrix_of_data_values ;
X = some_vector_of_data_sites ;
pp = spline(X, Y) ; % get the piecewise polynomial form of the cubic spline. Resulting structure is very large.
batchSize = 1024 ;

y = zeros(big_number, 1) ; % preallocate the result

for tt = 1:batchSize:big_number
    n = min(batchSize, big_number - tt + 1) ; % size of this batch; the last batch picks up whatever sites remain
    hcurrent = ppval(pp, tt:tt+n-1) ;  % evaluate pp at n data sites at once (M-by-n)

    ind = bsxfun(@plus, 1:M, (tt-1:tt+n-2).') ; % n-by-M index matrix, row r is (tt+r-1):(tt+r+M-2). Most likely not relevant to this question.
    y(tt:tt+n-1) = sum( x(ind).*hcurrent.' , 2 ) ; % do some calculation, but now we have done it in batches!
end
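In a batched loop like the one above, the last batch must contain big_number - tt + 1 sites. Setting its size to big_number - tt instead loses one site: with big_number = 2048 and batchSize = 1024, the second batch starts at 1025, gets length 1023, and site 2048 is never evaluated. The short script below is a sanity check with hypothetical values (big_number = 2500, batchSize = 1024). It computes the start and length of every batch and confirms that each site 1..big_number is visited exactly once.

```matlab
% Hypothetical sizes, chosen so that the last batch is only partly filled.
big_number = 2500 ;
batchSize  = 1024 ;

starts  = 1:batchSize:big_number ;                   % first site of each batch
lengths = min(batchSize, big_number - starts + 1) ;  % the last batch takes whatever remains

covered = zeros(1, big_number) ;                     % how often each site is evaluated
for b = 1:numel(starts)
    sites = starts(b) : starts(b) + lengths(b) - 1 ;
    covered(sites) = covered(sites) + 1 ;
end

disp([starts ; lengths])   % row 1: batch starts, row 2: batch lengths
```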

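In the revised code, the piecewise polynomial is evaluated at several data sites at once, so we are at least moving in the right direction. However, the piecewise polynomial pp is too large to store on the GPU. Is there a way to break it up for batch processing?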
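One possible way to break pp up is sketched here. It assumes that each batch covers a contiguous range [a, b] that overlaps only a small fraction of the spline's pieces. Evaluating a point only requires the piece whose interval contains it, so the pieces for one batch can be cut out of pp with unmkpp and reassembled with mkpp. X, Y, a and b below are synthetic stand-ins, not the question's data. The script checks that the slice reproduces the full pp on [a, b].

```matlab
% Synthetic stand-ins for the question's data (hypothetical sizes).
X  = 0:0.5:50 ;                       % data sites, sorted
Y  = [sin(X) ; cos(X) ; X.^2/100] ;   % d = 3 curves, one per row
pp = spline(X, Y) ;                   % full vector-valued piecewise polynomial

a = 10 ; b = 17.3 ;                   % evaluation range of one batch (assumed)

[breaks, coefs, nPieces, ~, d] = unmkpp(pp) ;
% Piece i covers [breaks(i), breaks(i+1)); clamp so that points outside the
% breaks use the first/last piece, as ppval does.
i1 = min(max(sum(breaks <= a), 1), nPieces) ;
i2 = min(max(sum(breaks <= b), 1), nPieces) ;
% Rows (i-1)*d+1 : i*d of coefs hold the d components of piece i.
rows   = (i1-1)*d + 1 : i2*d ;
pp_sub = mkpp(breaks(i1:i2+1), coefs(rows, :), d) ;

% Compare on points inside [a, b]; reshape makes the result d-by-numel(t)
% whatever shape ppval returns for vector-valued pp.
t = linspace(a, b, 200) ;
v_full = reshape(ppval(pp, t), d, []) ;
v_sub  = reshape(ppval(pp_sub, t), d, []) ;
fprintf('%d of %d pieces kept, max difference %g\n', ...
        i2 - i1 + 1, nPieces, max(abs(v_full(:) - v_sub(:))))
```

In the batched loop, a and b would be tt and tt+n-1, so only the coefficients of pp_sub, not all of pp, would be needed for that batch. Outside [breaks(i1), breaks(i2+1)] the slice extrapolates from its end pieces, so it should only be evaluated inside the batch's own range.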

A useful thread discusses parallelizing piecewise polynomial evaluation; that scheme could be ported to the GPU for batch processing.
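Piecewise polynomial evaluation parallelizes naturally because each point needs only three things: the index of its piece, its local coordinate relative to that piece's left break, and Horner's rule on that piece's coefficients. Below is one possible evaluation written without ppval, using only indexing, bsxfun and elementwise arithmetic. As far as we know, these operations accept gpuArray inputs in the Parallel Computing Toolbox, but this sketch was written with ordinary CPU arrays in mind and is not the method from the linked thread. The data are synthetic. The piece lookup builds a (number of breaks)-by-(number of points) matrix, so for a large pp a binary search such as histc would be preferable.

```matlab
% Synthetic stand-ins for the question's data: d = 2 curves sampled at X.
X  = linspace(0, 10, 41) ;
Y  = [sin(X) ; exp(-X/5)] ;
pp = spline(X, Y) ;
t  = [0 0.37 2.5 9.99 10] ;            % evaluation points, row vector

[breaks, coefs, nPieces, k, d] = unmkpp(pp) ;
breaks = breaks(:).' ;                  % force a row vector

% Piece of each point: piece i covers [breaks(i), breaks(i+1)), clamped to
% 1..nPieces so that points at or beyond the ends use the end pieces.
idx = min(max(sum(bsxfun(@le, breaks.', t), 1), 1), nPieces) ;   % 1-by-n
s   = t - breaks(idx) ;                                            % local coordinate, 1-by-n

% Row (idx-1)*d + j of coefs holds component j of piece idx.
rows = bsxfun(@plus, (idx - 1) * d, (1:d).') ;                     % d-by-n

% Horner's rule; pp coefficients are stored highest power first.
v = reshape(coefs(rows(:), 1), d, []) ;
for c = 2:k
    v = bsxfun(@times, v, s) + reshape(coefs(rows(:), c), d, []) ;
end

v_ref = reshape(ppval(pp, t), d, []) ;  % built-in evaluation, for comparison
fprintf('max difference from ppval: %g\n', max(abs(v(:) - v_ref(:))))
```

Combined with the slicing idea, each batch would only need its own slice of coefs and breaks, together with its evaluation points.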