Resizing single 1 pixel wide bitmap strip - faster than this example? (for Raycaster algorithm)

Question

I have attached an example image and my current code.

My question is: can I make resizing/stretching/interpolating a single vertical bitmap strip faster than using another for loop?

The current code looks well optimized:

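For the current strip's size on screen, iterate from the start height to the end height. Fetch the corresponding pixel from the texture and write it to the output buffer. Add the step to get to the next pixel.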

Here is the essential part of my code:

inline void RC_Raycast_Walls()
{
    // casting a ray for every screen column
    for (u_int16 rx = 0; rx < RC_render_width_i; ++rx)
    {
        // ..
        // traversing thru map of grid
        // finding intersecting point
        // calculating height of strip in screen
        // ..


        // step size to the next pixel in the texture
        float32 tex_step_y = RC_texture_size_f / (float32)pp_wall_height;

        // starting texture coordinate
        float32 tex_y = (float32)(pp_wall_start - RC_player_pitch - player_z_div_wall_distance - RC_render_height_d2_i + pp_wall_height_d2) * tex_step_y;

        // drawing walls into buffer <- ENTERING ANOTHER LOOP only for SINGLE STRIP
        for (int16 ry = pp_wall_start; ry < pp_wall_end; ++ry)
        {
            // cast the texture coordinate to integer, and mask with (texHeight - 1) in case of overflow
            u_int16 tex_y_safe = (u_int16)tex_y & RC_texture_size_m1_i;
            tex_y += tex_step_y;

            u_int32 texture_current_pixel = texture_pixels[RC_texture_size_i * tex_y_safe + tex_x];
            u_int32 output_pixel_index = rx + ry * RC_render_width_i;

            output_buffer[output_pixel_index] =
                                                (((texture_current_pixel >> 16 & 0x0ff) * intensity_value) >> 8) << 16 |
                                                (((texture_current_pixel >> 8 & 0x0ff) * intensity_value) >> 8) << 8 |
                                                (((texture_current_pixel & 0x0ff) * intensity_value) >> 8);
        }
    }
}

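Maybe some bigger step, like 2 instead of 1, so that every other row is left empty? But adding another line of code to fill the empty space would give the same performance. I don't want to double the pixels and interpolate between two of them; I think that would take even longer.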

Thanks in advance!

PS: it is based on the Lodev raycaster algorithm: https://lodev.org/cgtutor/raycasting.html

Answer 1

1. You do not need floats at all

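You can use DDA on integers, without multiplication or division. Floating point is not as slow nowadays as it used to be, but your conversions between float and int might be ... See these QAs (both use this kind of DDA):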
- DDA line with subpixel
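
As a minimal sketch of the integer DDA idea (my own illustration, not the code from the linked QAs): the hypothetical helper `dda_strip_rows` maps `dst_height` screen rows onto `src_size` texture rows using only additions and one compare inside the loop. It still does one division and one modulo per strip as setup, and it ignores the clipped start offset that the question folds into `tex_y`; both are simplifying assumptions.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: returns, for each of dst_height screen rows,
// the texture row floor(ry * src_size / dst_height), computed by integer DDA.
std::vector<std::uint16_t> dda_strip_rows(int src_size, int dst_height)
{
    std::vector<std::uint16_t> rows;
    if (src_size <= 0 || dst_height <= 0) return rows;
    rows.reserve(static_cast<std::size_t>(dst_height));

    const int whole = src_size / dst_height; // integer part of the step (once per strip)
    const int frac  = src_size % dst_height; // fractional part, in units of 1/dst_height

    int tex_y = 0; // current texture row
    int err   = 0; // accumulated fraction, always < dst_height
    for (int ry = 0; ry < dst_height; ++ry)
    {
        rows.push_back(static_cast<std::uint16_t>(tex_y));
        tex_y += whole;
        err   += frac;
        if (err >= dst_height) // fraction overflowed: carry one texture row
        {
            err -= dst_height;
            ++tex_y;
        }
    }
    return rows;
}
```

In a real renderer you would read the texel inside this loop instead of storing the row indices; the vector is only there to make the result easy to inspect.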
2. Use a LUT to apply the intensity

It looks like each color channel c is 8 bits and the intensity i is fixed point in the range <0,1>, so you can precompute every combination like this:
```
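u_int8 LUT[256][256];             // LUT[channel][intensity], 64 KiB in total
for (int c=0;c<256;c++)           // every 8-bit channel value
 for (int i=0;i<256;i++)          // every 8-bit fixed-point intensity
  LUT[c][i]=(u_int8)((c*i)>>8);   // same scaling as the original per-pixel formula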
```

3. Use pointers or a union to access the RGB channels instead of bit operations

My favorite is a union:

union color
   {
   u_int32 dd;    // 1x 32bit RGBA
   u_int16 dw[2]; // 2x 16bit
   u_int8 db[4];  // 4x 8bit (individual channels)
   };

4. Texture coordinates

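It looks like you are doing too many operations. For example, in [RC_texture_size_i * tex_y_safe + tex_x], if your texture size is 128 you can shift left by 7 bits instead of multiplying. Yes, on modern CPUs this is not a problem, but the whole thing can be replaced with a simple LUT. You can store a pointer to each horizontal scanline of the texture and rewrite the access as [tex_y_safe][tex_x].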
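
A minimal sketch of such a scanline table, assuming a square texture stored row by row as in the question. The helper name `build_row_table` is hypothetical; the table is built once per texture, not once per frame.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: one pointer per horizontal scanline of a
// size x size texture stored row by row in `pixels`.
std::vector<const std::uint32_t*> build_row_table(const std::uint32_t* pixels,
                                                  std::size_t size)
{
    std::vector<const std::uint32_t*> rows(size);
    for (std::size_t y = 0; y < size; ++y)
        rows[y] = pixels + y * size; // the only multiplication, done once per row
    return rows;
}
```

With `rows` built from `texture_pixels`, the inner loop can read `rows[tex_y_safe][tex_x]` instead of `texture_pixels[RC_texture_size_i * tex_y_safe + tex_x]`. Whether this beats the multiplication depends on the platform, so measure it.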

So, based on #2 and #3, rewrite your color computation as:

color c;
c.dd=texture_current_pixel;
c.db[0]=LUT[c.db[0]][intensity_value];
c.db[1]=LUT[c.db[1]][intensity_value];
c.db[2]=LUT[c.db[2]][intensity_value];
output_buffer[output_pixel_index]=c.dd;

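As you can see, it is just a handful of memory transfers instead of multiple bit shifts, bit masks and bitwise OR operations. You can also use pointers to color in place of texture_current_pixel and output_buffer[output_pixel_index] to speed it up further.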
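
For comparison, here is a self-contained sketch that puts the LUT and the byte-wise channel access together and keeps the original shift-based formula next to it for checking. It uses `std::memcpy` into a byte array instead of the union, because reading an inactive union member is not guaranteed by the C++ standard (many compilers allow it as an extension). The function names are hypothetical, and the byte order assumes a little-endian machine with pixels laid out as 0x00RRGGBB.

```cpp
#include <cstdint>
#include <cstring>

// intensity_lut[c][i] == (c * i) >> 8 for every channel value c and intensity i
static std::uint8_t intensity_lut[256][256];

void build_intensity_lut()
{
    for (int c = 0; c < 256; ++c)
        for (int i = 0; i < 256; ++i)
            intensity_lut[c][i] = static_cast<std::uint8_t>((c * i) >> 8);
}

// LUT version: three table reads, no shifts or masks.
// Assumes little-endian: bytes[0..2] are B, G, R; bytes[3] is left untouched.
std::uint32_t shade_pixel_lut(std::uint32_t texel, std::uint8_t intensity)
{
    std::uint8_t bytes[4];
    std::memcpy(bytes, &texel, sizeof bytes);
    bytes[0] = intensity_lut[bytes[0]][intensity];
    bytes[1] = intensity_lut[bytes[1]][intensity];
    bytes[2] = intensity_lut[bytes[2]][intensity];
    std::uint32_t out;
    std::memcpy(&out, bytes, sizeof out);
    return out;
}

// Reference: the shift/mask formula from the question.
std::uint32_t shade_pixel_shift(std::uint32_t texel, std::uint32_t intensity)
{
    return ((((texel >> 16) & 0xFFu) * intensity) >> 8) << 16 |
           ((((texel >> 8)  & 0xFFu) * intensity) >> 8) << 8  |
           (((texel & 0xFFu) * intensity) >> 8);
}
```

Note one difference: the question's formula always clears the top byte, while the LUT version leaves it as it was, so the two agree only for texels whose top byte is zero.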

Finally, see my own version of ray casting, which uses VCL.

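Now, before changing anything, measure the performance you have right now by timing how long rendering takes. Then, after each change to the code, measure whether it actually improved performance. If it did not, go back to the old version of the code, because it is sometimes hard to predict what will be faster on today's platforms.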
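
One simple way to take such measurements is `std::chrono::steady_clock` from the standard library. The sketch below averages the time of several calls; `average_ms` is a hypothetical helper, and `frames` is assumed to be greater than zero.

```cpp
#include <chrono>

// Hypothetical helper: calls render_frame() `frames` times and returns
// the average wall-clock time per call in milliseconds.
template <typename Fn>
double average_ms(Fn&& render_frame, int frames)
{
    using clock = std::chrono::steady_clock; // monotonic, suited to interval timing
    const auto start = clock::now();
    for (int i = 0; i < frames; ++i)
        render_frame();
    const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count() / frames;
}
```

For example, `average_ms([] { RC_Raycast_Walls(); }, 200)` before and after a change, with the same scene and an optimized build, gives two numbers you can compare directly.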

Also, for resizing, using mipmaps gives much better visual results ... it usually eliminates the weird noise while moving.
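
A minimal sketch of building one mipmap level, assuming a square power-of-two texture with four 8-bit channels per pixel. The helper `downsample_half` is hypothetical and uses a plain 2x2 box filter with rounding; other filters are possible.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the next mip level (size/2 x size/2) of a
// size x size texture by averaging each 2x2 block per 8-bit channel.
std::vector<std::uint32_t> downsample_half(const std::vector<std::uint32_t>& src,
                                           std::size_t size)
{
    const std::size_t half = size / 2;
    std::vector<std::uint32_t> dst(half * half);
    for (std::size_t y = 0; y < half; ++y)
        for (std::size_t x = 0; x < half; ++x)
        {
            const std::uint32_t p[4] = {
                src[(2 * y) * size + 2 * x],     src[(2 * y) * size + 2 * x + 1],
                src[(2 * y + 1) * size + 2 * x], src[(2 * y + 1) * size + 2 * x + 1]};
            std::uint32_t out = 0;
            for (int shift = 0; shift < 32; shift += 8) // one pass per channel
            {
                std::uint32_t sum = 0;
                for (std::uint32_t texel : p)
                    sum += (texel >> shift) & 0xFFu;
                out |= ((sum + 2u) >> 2) << shift;      // rounded average of 4 values
            }
            dst[y * half + x] = out;
        }
    return dst;
}
```

Calling this repeatedly until the size reaches 1 gives the full chain. At render time you would then pick the level whose size is closest to the strip height on screen, so that the step through the texture stays near one texel per pixel.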

