Fastest way to copy data from ReadOnlySpan to output with pixel conversion
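I'm having a performance issue when I copy data from the input (ReadOnlySpan) to the output (Span) using loops such as 'for'.
There is also Span.CopyTo, which is perfect and very fast,
but it is of no use here because it does not convert the pixels.
Below is the code. I feel there must be a shorter way to do this than the current process: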
public unsafe void UpdateFromOutput(CanvasDevice device, ReadOnlySpan<byte> data, uint width, uint height, uint pitch)
{
    using (var renderTargetMap = new BitmapMap(device, RenderTarget))
    {
        var inputPitch = (int)pitch;
        var mapPitch = (int)renderTargetMap.PitchBytes;
        var mapData = new Span<byte>(new IntPtr(renderTargetMap.Data).ToPointer(), (int)RenderTarget.Size.Height * mapPitch);
        switch (CurrentPixelFormat)
        {
            case PixelFormats.RGB0555:
                FramebufferConverter.ConvertFrameBufferRGB0555ToXRGB8888(width, height, data, inputPitch, mapData, mapPitch);
                break;
            case PixelFormats.RGB565:
                FramebufferConverter.ConvertFrameBufferRGB565ToXRGB8888(width, height, data, inputPitch, mapData, mapPitch);
                break;
        }
    }
}
Then, inside a function such as ConvertFrameBufferRGB0555ToXRGB8888,
the code goes through the height and width like below:
var castInput = MemoryMarshal.Cast<byte, ushort>(input);
var castInputPitch = inputPitch / sizeof(ushort);
var castOutput = MemoryMarshal.Cast<byte, uint>(output);
var castOutputPitch = outputPitch / sizeof(uint);
castOutput.Fill(0);
for (var i = 0; i < height; i++)
{
    var inputLine = castInput.Slice(i * castInputPitch, castInputPitch);
    var outputLine = castOutput.Slice(i * castOutputPitch, castOutputPitch);
    for (var j = 0; j < width; j++)
    {
        outputLine[j] = ConverToRGB888(inputLine[j]);
    }
}
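The code above works, but it is slow in some cases.
Please note: I'm modifying an existing project, so the code above was written by the original developer. I need help because I don't understand how the process works and I'm still quite confused, especially about the Slice part.
As a test only, I tried copying the input directly to the output with data.CopyTo(mapData);
and I got this (as expected): [image]
I hope there is some solution using the Marshal and Span functions.
Many thanks.
Update regarding ConverToRGB888
As for ConverToRGB888, the original code contains an RGB565LookupTable: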
private const uint LookupTableSize = ushort.MaxValue + 1;
private static uint[] RGB565LookupTable = new uint[LookupTableSize];

public static void SetRGB0565LookupTable()
{
    uint r565, g565, b565;
    double red = 255.0;
    double green = 255.0;
    double blue = 255.0;
    for (uint i = 0; i < LookupTableSize; i++)
    {
        //RGB565
        r565 = (i >> 11) & 0x1F;
        g565 = (i >> 5) & 0x3F;
        b565 = (i & 0x1F);
        r565 = (uint)Math.Round(r565 * red / 31.0);
        g565 = (uint)Math.Round(g565 * green / 63.0);
        b565 = (uint)Math.Round(b565 * blue / 31.0);
        RGB565LookupTable[i] = (0xFF000000 | r565 << 16 | g565 << 8 | b565);
    }
}

private static uint ConverToRGB888(ushort x)
{
    return RGB565LookupTable[x];
}
SetRGB0565LookupTable() is called only once to fill the values.
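To make the table above easier to follow, here is a minimal sketch of what a single entry holds. It applies the same shifts, masks and rounding to one pixel instead of all 65536 values. The class name Rgb565Example and the method name Expand are made up for this illustration and are not part of the original project.

```csharp
using System;

public static class Rgb565Example
{
    // Expands one RGB565 pixel (RRRRRGGGGGGBBBBB) to 0xFFRRGGBB,
    // using the same arithmetic as SetRGB0565LookupTable.
    public static uint Expand(ushort pixel)
    {
        // Isolate the 5-bit red, 6-bit green and 5-bit blue channels.
        uint r5 = (uint)(pixel >> 11) & 0x1F;
        uint g6 = (uint)(pixel >> 5) & 0x3F;
        uint b5 = (uint)pixel & 0x1F;

        // Rescale each channel from its own range (0..31 or 0..63) to 0..255.
        uint r8 = (uint)Math.Round(r5 * 255.0 / 31.0);
        uint g8 = (uint)Math.Round(g6 * 255.0 / 63.0);
        uint b8 = (uint)Math.Round(b5 * 255.0 / 31.0);

        // The top byte is set to 0xFF, as in the lookup table.
        return 0xFF000000u | (r8 << 16) | (g8 << 8) | b8;
    }
}
```

For example, Expand(0xF800) (pure red) gives 0xFFFF0000, Expand(0x07E0) (pure green) gives 0xFF00FF00 and Expand(0x001F) (pure blue) gives 0xFF0000FF. Precomputing these results once is what lets ConverToRGB888 be a single array read per pixel.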
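Conclusion:
- Fill(0) is not important, and it causes delay
- The unsafe version (accepted answer) is clear to me, and it is faster
- Partially avoiding the for loop is even faster, as in the linked example [tested]
- A precomputed lookup table is very helpful and speeds up the conversion
- Memory helpers such as Span.CopyTo and Buffer.MemoryCopy are worth knowing (source is available via the link)
- Using Parallel.For is faster in some cases, with the help of UnsafeMemory
- If your input uses a pixel type that Win2D supports, it may be possible to avoid the loops entirely, like this: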
byte[] dataBytes = new byte[data.Length];
fixed (byte* inputPointer = &data[0])
    Marshal.Copy((IntPtr)inputPointer, dataBytes, 0, data.Length);
RenderTarget = CanvasBitmap.CreateFromBytes(renderPanel, dataBytes, (int)width, (int)height, DirectXPixelFormat.R8G8UIntNormalized, 92, CanvasAlphaMode.Ignore);
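But I'm not sure about the last point, as I was not able to test it with 565 or 555.
Thanks to DekuDesu: the explanation and the simplified version he provided helped me do more tests.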
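The Parallel.For point in the conclusion can be sketched roughly as follows. This is only an illustration under stated assumptions: it works on plain arrays rather than the UnsafeMemory helper mentioned above, the pitches are given in pixels rather than bytes, and the names ParallelConvertSketch and ConvertRows are hypothetical. Whether it is actually faster depends on the frame size and has to be measured.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelConvertSketch
{
    // Converts each row independently, so rows can be processed in parallel.
    // Arrays are used because a Span<T> cannot be captured by the lambda
    // that is passed to Parallel.For.
    public static void ConvertRows(ushort[] input, int inputPitchPixels,
                                   uint[] output, int outputPitchPixels,
                                   int width, int height, uint[] lookupTable)
    {
        Parallel.For(0, height, y =>
        {
            // Spans created inside the lambda are fine: they never leave this call.
            ReadOnlySpan<ushort> inputLine = input.AsSpan(y * inputPitchPixels, width);
            Span<uint> outputLine = output.AsSpan(y * outputPitchPixels, width);
            for (int x = 0; x < width; x++)
            {
                // One table read per pixel, same as ConverToRGB888.
                outputLine[x] = lookupTable[inputLine[x]];
            }
        });
    }
}
```

Each row writes to its own region of the output, so the iterations do not need any locking. For small frames the scheduling overhead can outweigh the gain.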
I'm having performance issue when I copy the data from input (ReadOnlySpan) to output (Span) using (Loops like 'for')
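The code you've provided is already quite safe and has the best complexity you are going to get for pixel-by-pixel operations. The presence of nested for loops does not necessarily mean there is a performance issue or increased complexity.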
I need help because I don't understand how the process is working, still very confused.. specially in the Slice part.
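This code looks like it is meant to convert one bitmap format to another. Bitmaps come in varying sizes and formats. Because of this, they carry an additional piece of information alongside the width and height: the pitch.
Pitch is the distance in bytes between the start of one line of pixel information and the start of the next. It can be larger than the width multiplied by the bytes per pixel, because lines may be padded; this is used to account for formats that don't include full 32/64-bit color information.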
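As a small illustration of how pitch and Slice fit together, the sketch below picks one row out of a padded buffer. The helper name GetRow and the class name PitchExample are made up for this example; the arithmetic is the same as in the question's code, except that the slice length is the width, so the padding is left out.

```csharp
using System;

public static class PitchExample
{
    // Returns the visible pixels of one row.
    // pitchBytes is the byte distance between the starts of two consecutive rows.
    public static ReadOnlySpan<ushort> GetRow(ReadOnlySpan<ushort> pixels, int row, int width, int pitchBytes)
    {
        // Convert the pitch from bytes to 16-bit pixels.
        int pitchInPixels = pitchBytes / sizeof(ushort);

        // Slice(start, length): start at the first pixel of the row,
        // take only 'width' pixels and leave the padding behind.
        return pixels.Slice(row * pitchInPixels, width);
    }
}
```

With a width of 3 pixels and a pitch of 8 bytes, every row occupies 4 ushorts, the last of which is padding. For the buffer { 1, 2, 3, 0, 4, 5, 6, 0 }, GetRow(buffer, 1, 3, 8) starts at index 4 and returns { 4, 5, 6 }.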
Knowing this, I added comments to the method in question to help explain what it is doing.
public static void ConvertFrameBufferRGB565ToXRGB8888(uint width, uint height, ReadOnlySpan<byte> input, int inputPitch, Span<byte> output, int outputPitch)
{
    // convert the span of bytes into a span of ushorts
    // so we can use span[i] to get a ushort
    var castInput = MemoryMarshal.Cast<byte, ushort>(input);
    // pitch is the number of bytes between the first byte of a line and the first byte of the next line
    // convert the pitch from bytes into ushort pitch
    var castInputPitch = inputPitch / sizeof(ushort);
    // convert the span of bytes into a span of uints
    // so we can use span[i] to get a uint
    var castOutput = MemoryMarshal.Cast<byte, uint>(output);
    // convert the pitch from bytes into uint pitch
    var castOutputPitch = outputPitch / sizeof(uint);
    for (var i = 0; i < height; i++)
    {
        // get a line from the input
        // remember that castInputPitch is the number of ushorts between lines,
        // so i * pitch here gives us the index of the first ushort of the i'th line
        // the slice is a full pitch long, so it still contains the padding ushorts at the end;
        // they are never read because the inner loop stops at width
        var inputLine = castInput.Slice(i * castInputPitch, castInputPitch);
        // same thing as above but for the output
        var outputLine = castOutput.Slice(i * castOutputPitch, castOutputPitch);
        for (var j = 0; j < width; j++)
        {
            // iterate through the line, converting each pixel and storing it in the output span
            outputLine[j] = ConverToRGB888(inputLine[j]);
        }
    }
}
Fastest way to copy data from ReadOnlySpan to output with pixel conversion
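Honestly, the method you've provided is just fine: it is safe and reasonably fast. Keep in mind that copying data such as bitmaps linearly on the CPU is an inherently slow process. The biggest performance saving you can hope for is to avoid copying data redundantly. Unless this needs absolutely blazing speed, I would not recommend any change other than removing the .Fill(0), since it is probably unnecessary, but you would have to test that.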
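A minimal sketch of that suggestion, staying with safe spans: it drops the Fill(0) and slices each line to the width, so only pixels that are about to be overwritten are touched. The names SafeConvertSketch and ConvertRows and the lookupTable parameter are assumptions made for this illustration (the original code reads a static RGB565LookupTable instead). Note that without Fill(0) the padding bytes of the output keep whatever they contained before.

```csharp
using System;
using System.Runtime.InteropServices;

public static class SafeConvertSketch
{
    // Same row-by-row conversion as in the question, without Fill(0).
    // inputPitch and outputPitch are in bytes; lookupTable needs 65536 entries.
    public static void ConvertRows(int width, int height,
                                   ReadOnlySpan<byte> input, int inputPitch,
                                   Span<byte> output, int outputPitch,
                                   ReadOnlySpan<uint> lookupTable)
    {
        ReadOnlySpan<ushort> castInput = MemoryMarshal.Cast<byte, ushort>(input);
        Span<uint> castOutput = MemoryMarshal.Cast<byte, uint>(output);
        int castInputPitch = inputPitch / sizeof(ushort);
        int castOutputPitch = outputPitch / sizeof(uint);

        for (int y = 0; y < height; y++)
        {
            // Take exactly 'width' pixels of each line; the padding is skipped.
            ReadOnlySpan<ushort> inputLine = castInput.Slice(y * castInputPitch, width);
            Span<uint> outputLine = castOutput.Slice(y * castOutputPitch, width);
            for (int x = 0; x < outputLine.Length; x++)
            {
                outputLine[x] = lookupTable[inputLine[x]];
            }
        }
    }
}
```

Every visible pixel is written exactly once, so clearing the output first only adds a second pass over the buffer. If the padding must be zero for some reason, the Fill(0) has to stay.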
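If you absolutely MUST get more performance out of this, you may want to consider something like what I've provided below. However, I caution you that unsafe code like this is, well, unsafe and prone to errors. It has almost no error checking and makes a lot of assumptions, so that part is up to you to implement.
If it is still not fast enough, consider writing a .dll in C and using interop.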
public static unsafe void ConvertExtremelyUnsafe(ulong height, ref byte inputArray, ulong inputLength, ulong inputPitch, ref byte outputArray, ulong outputLength, ulong outputPitch)
{
    // pin the buffers so the GC cannot move them while we hold raw pointers
    fixed (byte* inputPointer = &inputArray, outputPointer = &outputArray)
    {
        // since we have to account for padding we should go line by line
        for (ulong y = 0; y < height; y++)
        {
            // get a pointer to the first byte of the input line
            byte* inputLinePointer = inputPointer + (y * inputPitch);
            // get a pointer to the first byte of the output line
            byte* outputLinePointer = outputPointer + (y * outputPitch);
            // traverse the input line by ushorts (the whole pitch, padding included)
            for (ulong i = 0; i < (inputPitch / sizeof(ushort)); i++)
            {
                // calculate the byte offset of the i'th ushort;
                // the loop bound comes from the input pitch, so no index check is made here
                // (this assumes inputLength >= height * inputPitch)
                ulong inputOffset = i * sizeof(ushort);
                // get a pointer to the i'th ushort
                ushort* rgb565Pointer = (ushort*)(inputLinePointer + inputOffset);
                ushort rgb565Value = *rgb565Pointer;
                // convert the rgb to the other format
                uint rgb888Value = ConverToRGB888(rgb565Value);
                // calculate the byte offset of the i'th uint within the line
                ulong outputOffset = i * sizeof(uint);
                // at least attempt to avoid overflowing the buffer: pointer writes are not bounds-checked,
                // so compare the absolute end of this write against the buffer length
                ulong absoluteOutputOffset = (y * outputPitch) + outputOffset;
                if (absoluteOutputOffset + sizeof(uint) > outputLength)
                {
                    throw new IndexOutOfRangeException($"{nameof(outputArray)}[{absoluteOutputOffset}]");
                }
                // get a pointer to the i'th uint
                uint* rgb888Pointer = (uint*)(outputLinePointer + outputOffset);
                // write the bytes of the rgb888 to the output array
                *rgb888Pointer = rgb888Value;
            }
        }
    }
}
Disclaimer: I wrote this on my phone.