Bilateral Grid Generator class using Enhanced Generator

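I am trying to reimplement the bilateral grid example using the enhanced Generator class (i.e. using schedule() and generate()), but I get an error when I try to compile the code.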

g++ -std=c++11 -I ../../include/ -I ../../tools/ -I ../../apps/support/ -g -fno-rtti bilateral_grid_generator.cpp ../../lib/libHalide.a ../../tools/GenGen.cpp -o bin/bilateral_grid_exec  -ldl -lpthread -lz
bin/bilateral_grid_exec -o ./bin  target=host 
Generator bilateral_grid has base_path ./bin/bilateral_grid
Internal error at /home/xxx/Projects/Halide/src/Generator.cpp:966 triggered by user code at /usr/include/c++/4.8/functional:2057:
Condition failed: generator
make: *** [bin/bilateral_grid.a] Aborted (core dumped)

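It seems I have not put the definitions of the RDom and the GeneratorParam in the right place. Since r.x and r.y are used in both schedule() and generate(), I think r should be a class member. What should I do to fix this?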

Here is the code I wrote.

class BilateralGrid : public Halide::Generator<BilateralGrid> {
public:
GeneratorParam<int>   s_sigma{"s_sigma", 8};

//ImageParam            input{Float(32), 2, "input"};
//Param<float>          r_sigma{"r_sigma"};

Input<Buffer<float>>  input{"input", 2};
Input<float>          r_sigma{"r_sigma"};

Output<Buffer<float>> output{"output", 2};

// Algorithm Description
void generate() {
    //int s_sigma = 8;
    // Add a boundary condition
    clamped(x,y) = BoundaryConditions::repeat_edge(input)(x,y);

    // Construct the bilateral grid
    Expr val = clamped(x * s_sigma + r.x - s_sigma/2, y * s_sigma + r.y - s_sigma/2);
    val = clamp(val, 0.0f, 1.0f);

    Expr zi = cast<int>(val * (1.0f/r_sigma) + 0.5f);

    // Histogram
    histogram(x, y, z, c) = 0.0f;
    histogram(x, y, zi, c) += select(c == 0, val, 1.0f);

    // Blur the grid using a five-tap filter
    blurz(x, y, z, c) = (histogram(x, y, z-2, c) +
                         histogram(x, y, z-1, c)*4 +
                         histogram(x, y, z  , c)*6 +
                         histogram(x, y, z+1, c)*4 +
                         histogram(x, y, z+2, c));
    blurx(x, y, z, c) = (blurz(x-2, y, z, c) +
                         blurz(x-1, y, z, c)*4 +
                         blurz(x  , y, z, c)*6 +
                         blurz(x+1, y, z, c)*4 +
                         blurz(x+2, y, z, c));
    blury(x, y, z, c) = (blurx(x, y-2, z, c) +
                         blurx(x, y-1, z, c)*4 +
                         blurx(x, y  , z, c)*6 +
                         blurx(x, y+1, z, c)*4 +
                         blurx(x, y+2, z, c));

    // Take trilinear samples to compute the output
    val     = clamp(input(x, y), 0.0f, 1.0f);
    Expr zv = val * (1.0f/r_sigma);
    zi      = cast<int>(zv);
    Expr zf = zv - zi;
    Expr xf = cast<float>(x % s_sigma) / s_sigma;
    Expr yf = cast<float>(y % s_sigma) / s_sigma;
    Expr xi = x/s_sigma;
    Expr yi = y/s_sigma;

    interpolated(x, y, c) =
        lerp(lerp(lerp(blury(xi, yi, zi, c), blury(xi+1, yi, zi, c), xf),
                  lerp(blury(xi, yi+1, zi, c), blury(xi+1, yi+1, zi, c), xf), yf),
             lerp(lerp(blury(xi, yi, zi+1, c), blury(xi+1, yi, zi+1, c), xf),
                  lerp(blury(xi, yi+1, zi+1, c), blury(xi+1, yi+1, zi+1, c), xf), yf), zf);

    // Normalize and return the output.
    bilateral_grid(x, y) = interpolated(x, y, 0)/interpolated(x, y, 1);
    output(x,y)          = bilateral_grid(x,y);

}

// Scheduling
void schedule() { 
    // int s_sigma = 8;
    if (get_target().has_gpu_feature()) {
        // The GPU schedule
        Var xi{"xi"}, yi{"yi"}, zi{"zi"};

        // Schedule blurz in 8x8 tiles. This is a tile in
        // grid-space, which means it represents something like
        // 64x64 pixels in the input (if s_sigma is 8).
        blurz.compute_root().reorder(c, z, x, y).gpu_tile(x, y, xi, yi, 8, 8);

        // Schedule histogram to happen per-tile of blurz, with
        // intermediate results in shared memory. This means histogram
        // and blurz makes a three-stage kernel:
        // 1) Zero out the 8x8 set of histograms
        // 2) Compute those histogram by iterating over lots of the input image
        // 3) Blur the set of histograms in z
        histogram.reorder(c, z, x, y).compute_at(blurz, x).gpu_threads(x, y);
        histogram.update().reorder(c, r.x, r.y, x, y).gpu_threads(x, y).unroll(c);

        // An alternative schedule for histogram that doesn't use shared memory:
        // histogram.compute_root().reorder(c, z, x, y).gpu_tile(x, y, xi, yi, 8, 8);
        // histogram.update().reorder(c, r.x, r.y, x, y).gpu_tile(x, y, xi, yi, 8, 8).unroll(c);

        // Schedule the remaining blurs and the sampling at the end similarly.
        blurx.compute_root().gpu_tile(x, y, z, xi, yi, zi, 8, 8, 1);
        blury.compute_root().gpu_tile(x, y, z, xi, yi, zi, 8, 8, 1);
        bilateral_grid.compute_root().gpu_tile(x, y, xi, yi, s_sigma, s_sigma);
    } else {
        // The CPU schedule.
        blurz.compute_root().reorder(c, z, x, y).parallel(y).vectorize(x, 8).unroll(c);
        histogram.compute_at(blurz, y);
        histogram.update().reorder(c, r.x, r.y, x, y).unroll(c);
        blurx.compute_root().reorder(c, x, y, z).parallel(z).vectorize(x, 8).unroll(c);
        blury.compute_root().reorder(c, x, y, z).parallel(z).vectorize(x, 8).unroll(c);
        bilateral_grid.compute_root().parallel(y).vectorize(x, 8);
    }
}

Func clamped{"clamped"}, histogram{"histogram"};
Func bilateral_grid{"bilateral_grid"};
Func blurx{"blurx"}, blury{"blury"}, blurz{"blurz"}, interpolated{"interpolated"};
Var x{"x"}, y{"y"}, z{"z"}, c{"c"};
RDom r{0, s_sigma, 0, s_sigma};

};

//Halide::RegisterGenerator<BilateralGrid> register_me{"bilateral_grid"};
HALIDE_REGISTER_GENERATOR(BilateralGrid, "bilateral_grid");

}  // namespace

The error here is subtle, and unfortunately the current assertion-failure message is unhelpful.

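The problem is that this code uses a GeneratorParam (s_sigma) to initialize a member variable, the RDom (r), but the GeneratorParam may not have its final value at that point. In general, accessing a GeneratorParam (or ScheduleParam) before the generate() method is called will produce an assertion like this.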

Why is this? Let's look at how a Generator is created and initialized in a typical build system:

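  1. GenGen.cpp creates an instance of the Generator's C++ class; naturally, this executes its C++ constructor and the C++ constructors of all its member variables, in the order they are declared.
  2. GenGen.cpp uses the arguments supplied on the command line to override the default values of the GeneratorParams. For example, if you invoked the generator with bin/bilateral_grid_exec -o ./bin target=host s_sigma=7, the default value (8) stored in s_sigma would be replaced by 7.
  3. GenGen.cpp calls generate(), then schedule(), then compiles the result into a .o (or .a, etc.).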

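So why are you seeing the assertion? What happens in this code is that the constructor for r runs in step 1 above, but the argument to r's constructor reads the current value of s_sigma, which at that point holds the default value (8) and not necessarily the value specified by the build file. If we allowed this read to happen without asserting, you could end up with inconsistent values of s_sigma in different parts of your Generator.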

You can fix this by deferring the initialization of the RDom to the generate() method:

class BilateralGrid : public Halide::Generator<BilateralGrid> {
public:
    GeneratorParam<int> s_sigma{"s_sigma", 8};
    ...
    void generate() {
        r = RDom(0, s_sigma, 0, s_sigma);
        ...
    }
    ...
private:
    RDom r;
};
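To make the ordering issue concrete without pulling in Halide, here is a minimal sketch in plain C++. The names Param, Domain, EagerGen, DeferredGen and run_with_override are hypothetical stand-ins invented for this illustration; they are not Halide types, and the toy deliberately omits the assertion that Halide raises. It only mimics the construct, override, generate sequence described above.

```cpp
#include <cassert>

// Hypothetical stand-in for a build-time parameter (NOT Halide's GeneratorParam).
struct Param {
    int value;
    explicit Param(int default_value) : value(default_value) {}
    void set(int v) { value = v; }
};

// Hypothetical stand-in for a reduction domain (NOT Halide's RDom).
// It captures its extent once, at construction time.
struct Domain {
    int extent = 0;
    Domain() = default;
    explicit Domain(int e) : extent(e) {}
};

// Eager version: r is built from s_sigma during member initialization,
// so it captures the default value before any override can happen.
struct EagerGen {
    Param s_sigma{8};
    Domain r{s_sigma.value};
    int generate() { return r.extent; }
};

// Deferred version: r is default-constructed and only given its extent
// inside generate(), after the override has been applied.
struct DeferredGen {
    Param s_sigma{8};
    Domain r;
    int generate() {
        r = Domain(s_sigma.value);
        return r.extent;
    }
};

// Mimics the three steps: construct, override the parameter, then generate.
template <typename Gen>
int run_with_override(int override_value) {
    Gen g;                          // step 1: members are constructed in declaration order
    g.s_sigma.set(override_value);  // step 2: command-line value replaces the default
    return g.generate();            // step 3: generate() runs last
}
```

Calling run_with_override<EagerGen>(7) returns the stale default 8, while run_with_override<DeferredGen>(7) returns 7. The real Generator turns the stale read into an assertion failure instead of silently using the wrong value.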

(Obviously, the assertion failure needs a more helpful error message; I will modify the code to do this.)