Fastest way to copy data from ReadOnlySpan to output with pixel conversion

Question

I'm having a performance issue when I copy data from the input (ReadOnlySpan) to the output (Span) using loops (like 'for').

There is also Span.CopyTo, which is perfect and very fast, but it's useless here because it doesn't convert the pixels.
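Span.CopyTo still applies in the special case where no conversion is needed, that is, when the source already has the destination's pixel format and only the pitches differ. The sketch below copies each line separately and takes only the visible bytes of each line. Copying the whole buffer in one call, as with data.CopyTo(mapData), is only correct when both pitches are identical. `RowCopy` and `CopyRows` are hypothetical names, and `rowBytes` is assumed to be width × bytes per pixel.

```csharp
using System;

static class RowCopy
{
    // Copies `height` lines of `rowBytes` bytes each from a buffer whose lines are
    // `inputPitch` bytes apart into one whose lines are `outputPitch` bytes apart.
    // Only valid when both buffers already use the same pixel format.
    public static void CopyRows(ReadOnlySpan<byte> input, int inputPitch,
                                Span<byte> output, int outputPitch,
                                int rowBytes, int height)
    {
        for (int y = 0; y < height; y++)
        {
            // slice just the visible part of each line (skipping padding)
            // and let CopyTo do the bulk copy for that line
            input.Slice(y * inputPitch, rowBytes)
                 .CopyTo(output.Slice(y * outputPitch, rowBytes));
        }
    }
}
```

In this question the formats differ (16-bit input, 32-bit output), so a plain copy like this cannot replace the conversion loop.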

Below is the code. I feel there should be a shorter way to replace the current process:

public unsafe void UpdateFromOutput(CanvasDevice device, ReadOnlySpan<byte> data, uint width, uint height, uint pitch)
{
    using (var renderTargetMap = new BitmapMap(device, RenderTarget))
    {
        var inputPitch = (int)pitch;
        var mapPitch = (int)renderTargetMap.PitchBytes;

        var mapData = new Span<byte>(new IntPtr(renderTargetMap.Data).ToPointer(), (int)RenderTarget.Size.Height * mapPitch);

        switch (CurrentPixelFormat)
        {
            case PixelFormats.RGB0555:
                FramebufferConverter.ConvertFrameBufferRGB0555ToXRGB8888(width, height, data, inputPitch, mapData, mapPitch);
                break;

            case PixelFormats.RGB565:
                FramebufferConverter.ConvertFrameBufferRGB565ToXRGB8888(width, height, data, inputPitch, mapData, mapPitch);
                break;
        }
    }
}

Then, inside the conversion functions such as ConvertFrameBufferRGB0555ToXRGB8888, I loop over the width and height like this:

var castInput = MemoryMarshal.Cast<byte, ushort>(input);
var castInputPitch = inputPitch / sizeof(ushort);
var castOutput = MemoryMarshal.Cast<byte, uint>(output);
var castOutputPitch = outputPitch / sizeof(uint);
castOutput.Fill(0);

for (var i = 0; i < height; i++)
{
    var inputLine = castInput.Slice(i * castInputPitch, castInputPitch);
    var outputLine = castOutput.Slice(i * castOutputPitch, castOutputPitch);

    for (var j = 0; j < width; j++)
    {
        outputLine[j] = ConverToRGB888(inputLine[j]);
    }
}

The code above works, but it is slow in some cases.

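Please note: I'm modifying an existing project, so the code above was written by the original developer. I need help because I don't understand how the process works and I'm still very confused, especially by the Slice part.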

As a test, I tried copying the input directly to the output with data.CopyTo(mapData), and (as expected) got this: [screenshot not included]

I hope there is some solution using the Marshal and Span functions.

Thanks a lot.

Update regarding ConverToRGB888

Regarding ConverToRGB888, the original code uses RGB565LookupTable:

private const uint LookupTableSize = ushort.MaxValue + 1;
private static uint[] RGB565LookupTable = new uint[LookupTableSize];

public static void SetRGB0565LookupTable()
{
    uint r565, g565, b565;

    double red = 255.0;
    double green = 255.0;
    double blue = 255.0;

    for (uint i = 0; i < LookupTableSize; i++)
    {
        // RGB565
        r565 = (i >> 11) & 0x1F;
        g565 = (i >> 5) & 0x3F;
        b565 = (i & 0x1F);

        r565 = (uint)Math.Round(r565 * red / 31.0);
        g565 = (uint)Math.Round(g565 * green / 63.0);
        b565 = (uint)Math.Round(b565 * blue / 31.0);

        RGB565LookupTable[i] = (0xFF000000 | r565 << 16 | g565 << 8 | b565);
    }
}

private static uint ConverToRGB888(ushort x)
{
    return RGB565LookupTable[x];
}

SetRGB0565LookupTable() is called only once, to fill in the values.
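The code shown only builds the RGB565 table. If the RGB0555 path needs its own table, a sketch in the same style could look like this. It assumes the common X1R5G5B5 layout (bit 15 unused, red in bits 10-14, green in bits 5-9, blue in bits 0-4); `Rgb555Table` and `Build` are hypothetical names, not part of the original project.

```csharp
using System;

static class Rgb555Table
{
    // Builds a 65536-entry table mapping an X1R5G5B5 value (bit 15 ignored)
    // to 0xFFRRGGBB, using the same rounding as SetRGB0565LookupTable.
    public static uint[] Build()
    {
        var table = new uint[ushort.MaxValue + 1];
        for (uint i = 0; i < table.Length; i++)
        {
            uint r = (i >> 10) & 0x1F; // bits 10-14
            uint g = (i >> 5) & 0x1F;  // bits 5-9
            uint b = i & 0x1F;         // bits 0-4

            // scale each 5-bit channel (0..31) to 8 bits (0..255)
            r = (uint)Math.Round(r * 255.0 / 31.0);
            g = (uint)Math.Round(g * 255.0 / 31.0);
            b = (uint)Math.Round(b * 255.0 / 31.0);

            table[i] = 0xFF000000 | (r << 16) | (g << 8) | b;
        }
        return table;
    }
}
```

Note that green has only 5 bits here (divide by 31), whereas RGB565 green has 6 bits (divide by 63), so reusing the 565 table for 555 input would shift and mix the channels.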

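Conclusion:

Fill(0) isn't needed; it only adds delay.
The unsafe version (the accepted answer) is clear to me, and faster.
Partially avoiding the for loop is even faster: Here [tested]
A precomputed lookup table is very useful for speeding up the conversion.
Memory helpers such as Span.CopyTo and Buffer.MemoryCopy: source available Here, in UnsafeMemory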

Parallel.For
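A minimal sketch of the Parallel.For idea, converting rows in parallel. Assumptions: `ParallelConverter` and `ConvertPixel` are hypothetical names, and `ConvertPixel` computes the same value as the RGB565 lookup table for one pixel (real code would read the table instead). Because a Span<T> cannot be captured by a lambda, this sketch takes arrays and builds the row spans inside each iteration; with the question's mapData, which wraps a native pointer, you would pass the pointer into the lambda instead. Whether this helps depends on the frame size, since scheduling overhead can outweigh the gain for small images, so measure it.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading.Tasks;

static class ParallelConverter
{
    // Stand-in for ConverToRGB888: RGB565 -> 0xFFRRGGBB with the table's rounding.
    static uint ConvertPixel(ushort p)
    {
        uint r = (uint)Math.Round(((p >> 11) & 0x1F) * 255.0 / 31.0);
        uint g = (uint)Math.Round(((p >> 5) & 0x3F) * 255.0 / 63.0);
        uint b = (uint)Math.Round((p & 0x1F) * 255.0 / 31.0);
        return 0xFF000000 | (r << 16) | (g << 8) | b;
    }

    public static void Convert(byte[] input, int inputPitch, byte[] output, int outputPitch, int width, int height)
    {
        Parallel.For(0, height, y =>
        {
            // each iteration owns exactly one row, so threads never write the same
            // bytes (assuming outputPitch >= width * 4)
            ReadOnlySpan<ushort> inRow = MemoryMarshal.Cast<byte, ushort>(input.AsSpan(y * inputPitch, width * sizeof(ushort)));
            Span<uint> outRow = MemoryMarshal.Cast<byte, uint>(output.AsSpan(y * outputPitch, width * sizeof(uint)));

            for (int x = 0; x < width; x++)
                outRow[x] = ConvertPixel(inRow[x]);
        });
    }
}
```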

If your input uses a pixel type that Win2D supports, it may be possible to avoid loops entirely, like this:

byte[] dataBytes = new byte[data.Length];
fixed (byte* inputPointer = &data[0])
Marshal.Copy((IntPtr)inputPointer, dataBytes, 0, data.Length);
RenderTarget = CanvasBitmap.CreateFromBytes(renderPanel, dataBytes, (int)width, (int)height, DirectXPixelFormat.R8G8UIntNormalized, 92, CanvasAlphaMode.Ignore);

But I'm not sure about this last point, because I can't test it with 565/555.

Thanks to DekuDesu, whose explanation and simplified version helped me do more testing.

Answer 1

I'm having performance issue when I copy the data from input (ReadOnlySpan) to output (Span) using (Loops like 'for')

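The code you provided is already quite safe and has the best complexity you can get for a per-pixel operation: every pixel has to be visited once, so the work is O(width × height) no matter how the loops are written. Nested for loops do not, by themselves, indicate a performance problem or extra complexity.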

I need help because I don't understand how the process is working, still very confused.. specially in the Slice part.

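This code appears to convert one bitmap format into another. Bitmaps come in different sizes and formats, so in addition to the width and height they carry one more piece of information: the pitch.

The pitch (also called stride) is the number of bytes from the start of one line of pixels to the start of the next. It can be larger than width × bytes per pixel, because each line may be followed by padding bytes (for example, to keep lines aligned), so the buffer cannot simply be treated as one contiguous run of pixels.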
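A worked example of the arithmetic, assuming a 3-pixel-wide RGB565 image (2 bytes per pixel, so 6 bytes of pixel data per line) whose lines are padded to a pitch of 8 bytes: pixel (x = 2, y = 1) starts at byte 1 × 8 + 2 × 2 = 12. After MemoryMarshal.Cast<byte, ushort>, the pitch becomes 8 / 2 = 4 ushorts, and the same pixel is ushort index 1 × 4 + 2 = 6, which is how Slice(i * castInputPitch, ...) finds the start of line i. `PitchMath` is a hypothetical helper for illustration only.

```csharp
using System;

static class PitchMath
{
    // Byte offset of pixel (x, y) in a buffer whose lines are `pitch` bytes apart.
    public static int ByteOffset(int x, int y, int pitch, int bytesPerPixel)
        => y * pitch + x * bytesPerPixel;

    // The same pixel as an element index after MemoryMarshal.Cast to a pixel-sized
    // type (ushort for RGB565, uint for XRGB8888). Assumes pitch is a multiple of
    // bytesPerPixel, as the question's code does.
    public static int ElementIndex(int x, int y, int pitch, int bytesPerPixel)
        => y * (pitch / bytesPerPixel) + x;
}
```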

With that in mind, I've commented the method in question to help explain what it's doing.

public static void ConvertFrameBufferRGB565ToXRGB8888(uint width, uint height, ReadOnlySpan<byte> input, int inputPitch, Span<byte> output, int outputPitch)
{
    // reinterpret the span of bytes as a span of ushorts (one RGB565 pixel each)
    // so we can use span[i] to read a whole pixel; no data is copied
    var castInput = MemoryMarshal.Cast<byte, ushort>(input);

    // pitch is the number of bytes between the first byte of a line and the first byte of the next line
    // convert the pitch from bytes into ushorts
    var castInputPitch = inputPitch / sizeof(ushort);

    // reinterpret the output span of bytes as a span of uints (one XRGB8888 pixel each)
    // and convert its pitch from bytes into uints
    var castOutput = MemoryMarshal.Cast<byte, uint>(output);
    var castOutputPitch = outputPitch / sizeof(uint);

    for (var i = 0; i < height; i++)
    {
        // get line i from the input
        // castInputPitch is the number of ushorts between lines, so i * castInputPitch is the index
        // of the first pixel of line i; the slice is the whole line including any trailing padding,
        // which is harmless because the inner loop only reads the first `width` pixels
        // (note: if the buffer's last line has no padding, this Slice throws)
        var inputLine = castInput.Slice(i * castInputPitch, castInputPitch);

        // same thing as above but for the output
        var outputLine = castOutput.Slice(i * castOutputPitch, castOutputPitch);

        for (var j = 0; j < width; j++)
        {
            // convert each visible pixel of the line and store it in the output span
            outputLine[j] = ConverToRGB888(inputLine[j]);
        }
    }
}
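A variant of the method above that slices each line to exactly `width` pixels instead of the full pitch. Two possible benefits, both worth measuring rather than assuming: padding is never touched, so a final line without trailing padding no longer makes Slice throw; and bounding the loop by inputLine.Length is a pattern the .NET JIT can typically use to drop the bounds check on inputLine[x] (outputLine[x] is still checked). `WidthSlicedConverter` and `ConvertPixel` are hypothetical names; `ConvertPixel` computes the same value as the RGB565 lookup table for one pixel, and real code would keep the table. Width and height are `int` here to avoid mixed int/uint comparisons.

```csharp
using System;
using System.Runtime.InteropServices;

static class WidthSlicedConverter
{
    // Stand-in for ConverToRGB888: RGB565 -> 0xFFRRGGBB with the table's rounding.
    static uint ConvertPixel(ushort p)
    {
        uint r = (uint)Math.Round(((p >> 11) & 0x1F) * 255.0 / 31.0);
        uint g = (uint)Math.Round(((p >> 5) & 0x3F) * 255.0 / 63.0);
        uint b = (uint)Math.Round((p & 0x1F) * 255.0 / 31.0);
        return 0xFF000000 | (r << 16) | (g << 8) | b;
    }

    public static void Convert(int width, int height, ReadOnlySpan<byte> input, int inputPitch,
                               Span<byte> output, int outputPitch)
    {
        ReadOnlySpan<ushort> castInput = MemoryMarshal.Cast<byte, ushort>(input);
        Span<uint> castOutput = MemoryMarshal.Cast<byte, uint>(output);
        int castInputPitch = inputPitch / sizeof(ushort);
        int castOutputPitch = outputPitch / sizeof(uint);

        for (int y = 0; y < height; y++)
        {
            // take only the `width` visible pixels of each line
            ReadOnlySpan<ushort> inputLine = castInput.Slice(y * castInputPitch, width);
            Span<uint> outputLine = castOutput.Slice(y * castOutputPitch, width);

            // loop bound is inputLine.Length, so the JIT can prove inputLine[x] is in range
            for (int x = 0; x < inputLine.Length; x++)
                outputLine[x] = ConvertPixel(inputLine[x]);
        }
    }
}
```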

Fastest way to copy data from ReadOnlySpan to output with pixel conversion

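Honestly, the method you provided isn't bad; it's safe and fast. Keep in mind that copying data such as bitmaps linearly on the CPU is inherently slow. The biggest performance gain you can hope for is avoiding redundant copies of the data. Unless this needs to be extremely fast, I wouldn't recommend any change other than removing .Fill(0), since it's probably unnecessary, but you'll have to test that.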

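If you absolutely must get more performance out of this, you might consider something like what I've provided below. However, I'll warn you that unsafe code like this is, well... unsafe, and error-prone. It does very little error checking and makes a lot of assumptions, so it's up to you to implement it properly.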

If that still isn't fast enough, consider writing a .dll in C and calling it via interop.

public static unsafe void ConvertExtremelyUnsafe(ulong height, ref byte inputArray, ulong inputLength, ulong inputPitch, ref byte outputArray, ulong outputLength, ulong outputPitch)
{
    // pin the buffers so the GC cannot move them while we hold raw pointers
    fixed (byte* inputPointer = &inputArray, outputPointer = &outputArray)
    {
        // since we have to account for padding we should go line by line
        for (ulong y = 0; y < height; y++)
        {
            // get a pointer for the first byte of the line of the input
            byte* inputLinePointer = inputPointer + (y * inputPitch);

            // get a pointer for the first byte of the line of the output
            byte* outputLinePointer = outputPointer + (y * outputPitch);

            // traverse the input line by ushorts (one RGB565 pixel each);
            // note this walks the whole pitch, so padding bytes are converted too
            for (ulong i = 0; i < (inputPitch / sizeof(ushort)); i++)
            {
                // calculate the byte offset of the i'th ushort within the line
                ulong inputOffset = i * sizeof(ushort);

                // the loop bound keeps us inside the line, but the buffer's last line may be
                // shorter than a full pitch, so check the absolute position in the input
                if ((y * inputPitch) + inputOffset + sizeof(ushort) > inputLength)
                {
                    throw new IndexOutOfRangeException($"{nameof(inputArray)}[{(y * inputPitch) + inputOffset}]");
                }

                // read the i'th ushort
                ushort rgb565Value = *(ushort*)(inputLinePointer + inputOffset);

                // convert the rgb to the other format
                uint rgb888Value = ConverToRGB888(rgb565Value);

                // calculate the byte offset of the i'th uint within the output line
                ulong outputOffset = i * sizeof(uint);

                // raw pointer writes are NOT bounds-checked by the runtime, so compare the
                // absolute position (not just the offset within the line) to the buffer length
                if ((y * outputPitch) + outputOffset + sizeof(uint) > outputLength)
                {
                    throw new IndexOutOfRangeException($"{nameof(outputArray)}[{(y * outputPitch) + outputOffset}]");
                }

                // write the rgb888 value to the output buffer
                *(uint*)(outputLinePointer + outputOffset) = rgb888Value;
            }
        }
    }
}
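ConvertExtremelyUnsafe takes `ref byte` parameters while the question works with spans; MemoryMarshal.GetReference is the usual way to get such a reference from a Span or ReadOnlySpan. The same idea can also skip per-pixel bounds checks without `unsafe` or `fixed`: validate the buffer sizes once, then walk the rows with System.Runtime.CompilerServices.Unsafe. This is a sketch under stated assumptions: `RefConverter` and `ConvertPixel` are hypothetical names, `ConvertPixel` stands in for the RGB565 lookup table, and on older targets such as UWP the Unsafe class may require the System.Runtime.CompilerServices.Unsafe NuGet package. Like the pointer version, nothing is checked inside the loops, so the up-front size check is the only safety net.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

static class RefConverter
{
    // Stand-in for ConverToRGB888: RGB565 -> 0xFFRRGGBB with the table's rounding.
    static uint ConvertPixel(ushort p)
    {
        uint r = (uint)Math.Round(((p >> 11) & 0x1F) * 255.0 / 31.0);
        uint g = (uint)Math.Round(((p >> 5) & 0x3F) * 255.0 / 63.0);
        uint b = (uint)Math.Round((p & 0x1F) * 255.0 / 31.0);
        return 0xFF000000 | (r << 16) | (g << 8) | b;
    }

    public static void Convert(int width, int height, ReadOnlySpan<byte> input, int inputPitch,
                               Span<byte> output, int outputPitch)
    {
        if (width <= 0 || height <= 0) return;

        // one check up front replaces the per-pixel checks that Unsafe.Add skips
        if ((long)(height - 1) * inputPitch + width * 2L > input.Length ||
            (long)(height - 1) * outputPitch + width * 4L > output.Length)
            throw new ArgumentException("Buffers are too small for width/height/pitch.");

        // a `ref byte` to the first element, the kind of argument ConvertExtremelyUnsafe expects
        ref byte inBase = ref MemoryMarshal.GetReference(input);
        ref byte outBase = ref MemoryMarshal.GetReference(output);

        for (int y = 0; y < height; y++)
        {
            ref byte inRow = ref Unsafe.Add(ref inBase, y * inputPitch);
            ref byte outRow = ref Unsafe.Add(ref outBase, y * outputPitch);
            for (int x = 0; x < width; x++)
            {
                ushort px = Unsafe.ReadUnaligned<ushort>(ref Unsafe.Add(ref inRow, x * 2));
                Unsafe.WriteUnaligned(ref Unsafe.Add(ref outRow, x * 4), ConvertPixel(px));
            }
        }
    }
}
```

To call the pointer-based method from UpdateFromOutput, the same trick applies: `ConvertExtremelyUnsafe(height, ref MemoryMarshal.GetReference(data), (ulong)data.Length, pitch, ref MemoryMarshal.GetReference(mapData), (ulong)mapData.Length, (ulong)mapPitch);`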

Disclaimer: I wrote this on my phone.


Tags: c#, memory-management, uwp
