How to perform an 8-way de-interleave in NEON
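NEON provides four load intrinsics (vld1, vld2, vld3, vld4) that perform 1-way to 4-way de-interleaving. But how can an 8-way de-interleave be done?
For example, given this data:
uint8_t src[64] = {0,1,2,3,4,5,6,7,...,63};
After loading the data into NEON registers and de-interleaving it 8 ways, I want src_reg1 and src_reg2 to hold these values: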
uint8x8x4_t src_reg1;
uint8x8x4_t src_reg2;
src_reg1.val[0] = {0,8, 16,24,32,40,48,56}
src_reg1.val[1] = {1,9, 17,25,...}
src_reg1.val[2] = {2,10,18,26,...}
src_reg1.val[3] = {3,11,19,27,...}
src_reg2.val[0] = {4,12,20,28,...}
src_reg2.val[1] = {5,13,21,29,...}
src_reg2.val[2] = {6,14,22,30,...}
src_reg2.val[3] = {7,15,23,31,39,47,55,63}
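Does anyone know how to do this? Many thanks!
It is as simple as doing two 4-element loads to get two sets of 4-way de-interleaved data, then de-interleaving those sets against each other with one of the register interleaving operations (vuzp in this case), e.g.: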
uint8x8x4_t src_reg1 = vld4_u8(src);
uint8x8x4_t src_reg2 = vld4_u8(src + 32);
for (int i = 0; i < 4; i++) {
    // This is a bit of a faff thanks to the intrinsic datatypes, but
    // compiling at -O3 tidies it all up into sensible code
    uint8x8x2_t tmp = vuzp_u8(src_reg1.val[i], src_reg2.val[i]);
    src_reg1.val[i] = tmp.val[0];  // even lanes: elements 8*j + i
    src_reg2.val[i] = tmp.val[1];  // odd lanes:  elements 8*j + 4 + i
}
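To see why the lanes end up where the question wants them, here is a rough scalar model of the same two steps in plain C. It is only a sketch: model_vld4_u8, model_vuzp_u8 and model_deinterleave8 are hypothetical helper names made up for illustration, not NEON intrinsics. They mimic the lane layout of vld4_u8 and vuzp_u8, so the index arithmetic can be checked on any machine without arm_neon.h.

```c
#include <stdint.h>

/* Scalar stand-in for vld4_u8 (hypothetical helper, not a NEON call):
 * register k receives every 4th byte starting at offset k. */
static void model_vld4_u8(const uint8_t *p, uint8_t out[4][8]) {
    for (int k = 0; k < 4; k++)
        for (int j = 0; j < 8; j++)
            out[k][j] = p[4 * j + k];
}

/* Scalar stand-in for vuzp_u8 (hypothetical helper): treat a followed by b
 * as one 16-lane vector, put the even lanes in lo and the odd lanes in hi. */
static void model_vuzp_u8(const uint8_t a[8], const uint8_t b[8],
                          uint8_t lo[8], uint8_t hi[8]) {
    uint8_t cat[16];
    for (int j = 0; j < 8; j++) {
        cat[j] = a[j];
        cat[8 + j] = b[j];
    }
    for (int j = 0; j < 8; j++) {
        lo[j] = cat[2 * j];
        hi[j] = cat[2 * j + 1];
    }
}

/* Two 4-way loads followed by four unzips.
 * Rows 0..3 of out play the role of src_reg1.val[0..3],
 * rows 4..7 play the role of src_reg2.val[0..3]. */
static void model_deinterleave8(const uint8_t src[64], uint8_t out[8][8]) {
    uint8_t r1[4][8], r2[4][8];
    model_vld4_u8(src, r1);       /* r1[i][j] = src[4*j + i]      */
    model_vld4_u8(src + 32, r2);  /* r2[i][j] = src[32 + 4*j + i] */
    for (int i = 0; i < 4; i++)
        model_vuzp_u8(r1[i], r2[i], out[i], out[4 + i]);
}
```

After the two loads, the 16 lanes of r1[i] followed by r2[i] hold src[4*m + i] for m = 0..15. Taking the even m gives src[8*j + i], and taking the odd m gives src[8*j + 4 + i], which is exactly the 8-way layout asked for.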