8b10b encoder with byte stream output (bits carry): faster bitwise algorithm?

Question

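I wrote an 8b10b encoder that generates a byte stream to be sent to a serial transmitter, which sends the bytes as-is, LSb first.

What I'm doing here is basically laying groups of 10 bits (encoded from the input byte stream) over groups of 8, so a varying number of bits carries over from one output byte to the next, a bit like in music/rhythm.

The program has been tested successfully, but it is about 4-5 times too slow for my application. I think this is because every single bit has to be looked up in an array. My gut tells me we could speed this up with some kind of rolling mask, but I can't yet see how to do it, even by swapping the 3D array of booleans for a 2D array of integers.

Any pointers or other ideas?

Here is the code. Please ignore most of the macros and some of the code related to deciding which byte to write, as this is application-specific.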

Header:

#ifndef TX_BYTESTREAM_GEN_H_INCLUDED
#define TX_BYTESTREAM_GEN_H_INCLUDED

#include <stdint.h> //for standard portable types such as uint16_t

#define MAX_USB_TRANSFER_SIZE               1016 //Bytes, size of the max payload in a USB transaction. Determined using FT4222_GetMaxTransferSize()
#define MAX_USB_PACKET_SIZE                 62 //Bytes, max size of the payload of a single USB packet
#define MANDATORY_TX_PACKET_BLOCK           5 //Bytes, constant - equal to the minimum number of bytes of TX packet necessary to exactly transfer blocks of 10 bits of encoded data (LCM of 8 and 10 bits is 40 bits, i.e. 5 bytes)
#define SYNC_CHARS_MAX_INTERVAL             172 //Target number of payload bytes between sync chars. Max is 188 before desynchronisation

#define ROUND_UP(N, S)                      ((((N) + (S) - 1) / (S)) * (S)) //Macro to round the integer N up to the nearest multiple of the integer S
#define ROUND_DOWN(N, S)                    (((N) / (S)) * (S)) //Same, rounding down

#define N_SYNC_CHAR_PAIRS_IN_PCKT(pcktSz)   (ROUND_UP(((pcktSz)*1000/(SYNC_CHARS_MAX_INTERVAL+2)),1000)/1000) //Number of sync (K28.5) character/byte pairs in a given packet
#define TX_PAYLOAD_SIZE(pcktSz)             (((pcktSz)*4/5)-2*N_SYNC_CHAR_PAIRS_IN_PCKT(pcktSz)) //Size in bytes of the payload data before encoding in a single TX packet

#define MAX_TX_PACKET_SIZE                  (ROUND_DOWN((MAX_USB_TRANSFER_SIZE-MAX_USB_PACKET_SIZE),(MAX_USB_PACKET_SIZE*MANDATORY_TX_PACKET_BLOCK))) //Maximum size in bytes of a TX packet
#define DEFAULT_TX_PACKET_SIZE              (MAX_TX_PACKET_SIZE-MAX_USB_PACKET_SIZE*MANDATORY_TX_PACKET_BLOCK) //Default size in bytes of a TX packet with some margin
#define MAX_TX_PAYLOAD_SIZE                 (TX_PAYLOAD_SIZE(MAX_TX_PACKET_SIZE)) //Maximum size in bytes of the payload in a TX packet
#define DEFAULT_TX_PAYLOAD_SIZE             (TX_PAYLOAD_SIZE(DEFAULT_TX_PACKET_SIZE))//Default size in bytes of the payload in a TX packet with some margin

//See string descriptors below for definitions. Error codes are individual bits so can be combined.
enum ErrCode
{
    NO_ERR = 0,
    INVALID_DIN_SIZE = 1,
    INVALID_DOUT_SIZE = 2,
    NULL_DIN_PTR = 4,
    NULL_DOUT_PTR = 8
};

char const * const ERR_CODE_DESC[] = {
    "No error",
    "Invalid size of input data",
    "Invalid size of output buffer",
    "Input data pointer is NULL",
    "Output buffer pointer is NULL"
};

/** @brief Generates the bytestream to the transmitter by encoding the incoming data using 8b10b encoding
    and inserting K28.5 synchronisation characters to maintain the synchronisation with the demodulator (LVDS passthrough mode)
    @arg din is a pointer to an allocated array of bytes which contains the data to encode
    @arg dinSize is the size of din in bytes. This size must be equal to TX_PAYLOAD_SIZE(doutSize)
    @arg dout is a pointer to an allocated array of bytes which is intended to contain the output bytestream to the transmitter
    @arg doutSize is the size of dout in bytes. This size must meet the conditions at the top of this function's implementation. Use DEFAULT_TX_PACKET_SIZE if in doubt.
    @return error code (c.f. ErrCode) **/
int TX_gen_bytestream(uint8_t *din, uint16_t dinSize, uint8_t *dout, uint16_t doutSize);


#endif // TX_BYTESTREAM_GEN_H_INCLUDED
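To make the packet-size arithmetic in the header easier to follow, here is a small sketch that mirrors the macros as `constexpr` functions and checks the values they produce at compile time. The function and constant names are hypothetical (they are not part of the header); only the numeric constants and formulas come from the macros above.

```cpp
// Hypothetical constexpr mirrors of the header macros, for checking the arithmetic only.
constexpr int kMaxUsbTransferSize   = 1016; // MAX_USB_TRANSFER_SIZE
constexpr int kMaxUsbPacketSize     = 62;   // MAX_USB_PACKET_SIZE
constexpr int kMandatoryBlock       = 5;    // MANDATORY_TX_PACKET_BLOCK
constexpr int kSyncCharsMaxInterval = 172;  // SYNC_CHARS_MAX_INTERVAL

constexpr int roundUp(int n, int s)   { return ((n + s - 1) / s) * s; }
constexpr int roundDown(int n, int s) { return (n / s) * s; }

// Same formula as N_SYNC_CHAR_PAIRS_IN_PCKT: a ceiling division done in fixed point
constexpr int syncPairsInPacket(int pcktSz)
{
    return roundUp(pcktSz * 1000 / (kSyncCharsMaxInterval + 2), 1000) / 1000;
}

// Same formula as TX_PAYLOAD_SIZE: 4/5 of the packet minus two bytes per sync pair
constexpr int txPayloadSize(int pcktSz)
{
    return pcktSz * 4 / 5 - 2 * syncPairsInPacket(pcktSz);
}

constexpr int kMaxTxPacketSize =
    roundDown(kMaxUsbTransferSize - kMaxUsbPacketSize, kMaxUsbPacketSize * kMandatoryBlock);
constexpr int kDefaultTxPacketSize =
    kMaxTxPacketSize - kMaxUsbPacketSize * kMandatoryBlock;

static_assert(kMaxTxPacketSize == 930, "954 rounded down to a multiple of 310");
static_assert(kDefaultTxPacketSize == 620, "930 - 310");
static_assert(txPayloadSize(kDefaultTxPacketSize) == 488, "496 - 2 * 4");
static_assert(txPayloadSize(kMaxTxPacketSize) == 732, "744 - 2 * 6");
```

With the default packet size of 620 bytes, one packet carries 4960 bits, which at 5 MHz lasts 0.000992 s; this matches the `PACKET_DURATION` used in the testbench.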

Source file:

#include "TX_bytestream_gen.h"

#include <cstddef> //NULL

#define N_BYTE_VALUES (256+1) //256 possible data values + 1 special character (only accessible to this module)
#define N_ENCODED_BITS 10 //Number of bits corresponding to the 8b10b encoding of a byte

//Map the current running disparity, the desired value to encode to the array of encoded bits for 8b10b encoding.
//The Last value is the K28.5 sync character, only accessible to this module
//Notation = MSb to LSb
bool const encodedBits[2][N_BYTE_VALUES][N_ENCODED_BITS] =
{
    //Long table (see appendix)
};

//New value of the running disparity after encoding with the specified previous running disparity and requested byte value (c.f. above)
bool const encodingDisparity[2][N_BYTE_VALUES] =
{
    //Long table (see appendix)
};

int TX_gen_bytestream(uint8_t *din, uint16_t dinSize, uint8_t *dout, uint16_t doutSize)
{
    static bool RDp = false; //Running disparity is initially negative
    int ret = 0;

    //If the output buffer size is not a multiple of the mandatory payload block or of the USB packet size, or if it cannot be held in a single USB transaction
    //return an invalid output buffer size error
    if(doutSize == 0 || (doutSize % MANDATORY_TX_PACKET_BLOCK) || (doutSize % MAX_USB_PACKET_SIZE) || (doutSize > MAX_TX_PACKET_SIZE)) //Temp
        ret |= INVALID_DOUT_SIZE;
    //If the input data size is not consistent with the output buffer size, return the appropriate error code
    if(dinSize == 0 || dinSize != TX_PAYLOAD_SIZE(doutSize))
        ret |= INVALID_DIN_SIZE;
    if(din == NULL)
        ret |= NULL_DIN_PTR;
    if(dout == NULL)
        ret |= NULL_DOUT_PTR;

    //If everything checks out, carry on
    if(ret == NO_ERR)
    {
        uint16_t iByteIn = 0; //Index of the byte of input data currently being processed
        uint16_t iByteOut = 0; //Index of the output byte currently being written to
        uint8_t iBitOut = 0; //Starts with LSb
        int16_t nBytesUntilSync = 0; //Countdown of bytes until a sync marker needs to be sent. Cyclic.

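        //For all output bits to generate. Bit counts are compared rather than byte indices: after the last code,
        //iByteOut is still doutSize-1 with iBitOut == 8, and a byte-index test would encode one code too many and write past dout
        while((iByteOut * 8 + iBitOut) < doutSize * 8)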
        {
            bool sync = false; //Initially this byte is not considered a sync byte (in which case the next byte of data will be processed)

            //If the maximum interval between sync characters has been reached, mark the two next bytes as sync bytes and reset the counter
            if(nBytesUntilSync <= 0)
            {
                sync = true;

                if(nBytesUntilSync == -1) //After the second SYNC is written, the counter is reset
                {
                    nBytesUntilSync = SYNC_CHARS_MAX_INTERVAL;
                }
            }

            //Append bit by bit the encoded data of the byte to write to the output bitstream (carried over from byte to byte) - LSb first
            //The byte to write is either the last byte of the encodedBits map (the sync character K28.5) if sync is set, or the next byte of
            //input data if it isn't
            uint16_t const byteToWrite = (sync?(N_BYTE_VALUES-1):din[iByteIn]);
            for(int8_t iEncodedBit = N_ENCODED_BITS-1 ; iEncodedBit >= 0 ; --iEncodedBit, iBitOut++)
            {
                //If the current output byte is complete, reset the bit index and select the next one
                if(iBitOut >= 8)
                {
                    iByteOut++;
                    iBitOut = 0;
                }

                //Effectively sets the iBitOut'th bit of the iByteOut'th byte out to the encoded value of the byte to write
                bool bitToWrite = encodedBits[RDp][byteToWrite][iEncodedBit]; //Temp
                dout[iByteOut] ^= (-bitToWrite ^ dout[iByteOut]) & (1 << iBitOut);
            }
            //The running disparity is also updated as per the standard (to achieve DC balance)
            RDp = encodingDisparity[RDp][byteToWrite]; //Update the running disparity

            //If sync was not set, a byte of the input data has been consumed, so move on to the next one
            if(!sync) {
                iByteIn++;
            }

            //In any case, decrease the synchronisation counter. Even sync characters decrease it (c.f. top of while loop)
            nBytesUntilSync--;
        }
    }

    return ret;
}

Testbench:

#include <iostream>
#include "TX_bytestream_gen.h"

#define PACKET_DURATION 0.000992 //In seconds, time of continuous data stream corresponding to one packet (5MHz output, default packet size)
#define TIME_TO_SIMULATE 10 //In seconds
#define PACKET_SIZE DEFAULT_TX_PACKET_SIZE
#define PAYLOAD_SIZE DEFAULT_TX_PAYLOAD_SIZE

#define N_ITERATIONS (TIME_TO_SIMULATE/PACKET_DURATION)

#include <chrono>

using namespace std;

//Testbench: measure the time taken to simulate TIME_TO_SIMULATE seconds of continuous encoding
int main()
{
    uint8_t toEncode[PAYLOAD_SIZE] = {100}; //Dummy data, doesn't matter
    uint8_t out[PACKET_SIZE] = {0};

    std::chrono::time_point<std::chrono::system_clock> start, end;

    start = std::chrono::system_clock::now();
    for(unsigned int i = 0 ; i < N_ITERATIONS ; i++)
    {
        TX_gen_bytestream(toEncode, PAYLOAD_SIZE, out, PACKET_SIZE);
    }
    end = std::chrono::system_clock::now();

    std::chrono::duration<double> elapsed_seconds = end - start;

    std::cout << "Task execution time: " << elapsed_seconds.count()/TIME_TO_SIMULATE*100 << "% (for " << TIME_TO_SIMULATE << "s simulated)\n";

    return 0;
}

Appendix: lookup tables. I don't have enough characters left to paste them in full, but they look like this:

bool const encodedBits[2][N_BYTE_VALUES][N_ENCODED_BITS] =
{
    //Running disparity = RD-
    {
        {1,0,0,1,1,1,0,1,0,0},
        //...
    },
    //Running disparity = RD+
    {
        {0,1,1,0,0,0,1,0,1,1},
        //...
    }
};

bool const encodingDisparity[2][N_BYTE_VALUES] =
{
    //Previous running disparity was RD-
    {
        0,
        //...
    },
    //Previous running disparity was RD+
    {
        1,
        //...
    }
};

Answer 1

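This will be much faster if you do everything a byte at a time instead of a bit at a time.

First, change the way you store the lookup tables. You should have something like this: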

#include <stddef.h> // size_t
#include <stdint.h> // uint8_t, uint16_t

// conversion from (RD, byte) to (RD, 10-bit code)
// in each word, the lower 10 bits are the code,
// and bit 10 (the 11th bit) is the new RD
// The first 256 values are for RD -1, the next
// 256 for RD 1
static const uint16_t BYTE_TO_CODE[512] = {
...
};
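As a rough illustration of how the question's existing `bool` tables could be converted into words of this layout, the sketch below packs one 10-element row plus the next running disparity into a single `uint16_t`. The helper name `packCode` is hypothetical. It assumes the transmit order of the question's code should be preserved: that code sends element 9 of each row first, and the byte-wise loop sends bit 0 of the code first, so element 0 of the row must end up in bit 9.

```cpp
#include <cstdint>

constexpr int kEncodedBits = 10; // same value as N_ENCODED_BITS in the question

// Hypothetical helper (not from the question or the answer).
// bitsMsbFirst[0] is the MSb of the code, as in the question's "MSb to LSb" notation.
inline uint16_t packCode(const bool (&bitsMsbFirst)[kEncodedBits], bool nextRdPositive)
{
    uint16_t word = 0;
    for (int i = 0; i < kEncodedBits; ++i) {
        // Shift the bits gathered so far up by one and append the next element,
        // so element 0 ends up in bit 9 and element 9 in bit 0
        word = static_cast<uint16_t>((word << 1) | (bitsMsbFirst[i] ? 1u : 0u));
    }
    // Bit 10 carries the running disparity that applies after this code
    if (nextRdPositive) {
        word = static_cast<uint16_t>(word | (1u << 10));
    }
    return word;
}
```

For the first row shown in the question's appendix, `{1,0,0,1,1,1,0,1,0,0}` with a next running disparity of RD-, this gives `0x274`. The K28.5 entry (index 256 in the question's table) does not fit in a 512-entry table and would need to be stored separately.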

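Then you need to change your encoding loop to write a byte at a time. You can use a uint16_t to hold the bits left over after each byte you output.

Something like this (I didn't figure out your sync byte logic, but presumably you can put that in the input or output byte loop):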

// returns next isRD1
bool TX_gen_bytestream(uint8_t *dest, const uint8_t *src, size_t src_len, bool isRD1)
{
    // bits generated, but not yet written, LSB first
    uint16_t bits = 0;

    // number of bits in bits
    unsigned numbits = 0;

    //  current RD, either 0 or 256
    uint16_t rd = isRD1 ? 256 : 0;

    for (const uint8_t *end = src + src_len; src < end; ++src) {

        // lookup code and next rd
        uint16_t code = BYTE_TO_CODE[rd + *src];

        // new rd from code bit 10
        rd = (code>>2) & 256;

        // store bits
        bits |= (code & (uint16_t)0x03FF) << numbits;
        numbits+=10;

        // write out any complete bytes
        while(numbits >= 8) {
            *dest++ = (uint8_t)bits;
            bits >>=8;
            numbits-=8;
        }
    }

    // If src_len isn't divisible by 4, then we have some extra bits
    if (numbits) {
      *dest = (uint8_t)bits;
    }
    
    return !!rd;
}
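One possible way to fold sync insertion into the same byte-wise loop is sketched below. This is an illustration under assumptions, not the question's exact scheme: the function name, the separate two-entry `syncCodes` table for K28.5, and the policy of emitting a sync pair before every `syncInterval` data bytes are all hypothetical, and the question's counter behaves slightly differently. Both tables are assumed to use the word layout described above (code in bits 0 to 9, next RD in bit 10).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch only: names, table parameters and the sync policy are assumptions.
// dataCodes: 512 words laid out like BYTE_TO_CODE (RD- entries first, then RD+).
// syncCodes: the K28.5 word for RD- and for RD+, in the same word layout.
// syncInterval: number of data bytes between sync pairs, must be >= 1.
std::vector<uint8_t> encodeWithSync(const uint8_t *src, size_t srcLen,
                                    const uint16_t *dataCodes,
                                    const uint16_t (&syncCodes)[2],
                                    size_t syncInterval, bool &isRD1)
{
    std::vector<uint8_t> out;
    out.reserve(srcLen * 10 / 8 + 8); // rough guess, sync codes may exceed it

    uint32_t bits = 0;              // generated but unwritten bits, LSb first
    unsigned numbits = 0;           // number of valid bits in `bits`
    uint16_t rd = isRD1 ? 256 : 0;  // current RD as a table offset
    size_t untilSync = 0;           // data bytes left before the next sync pair

    // Appends one table word to the bit accumulator and flushes complete bytes
    auto emit = [&](uint16_t word) {
        rd = (word >> 2) & 256;     // bit 10 of the word becomes 0 or 256
        bits |= static_cast<uint32_t>(word & 0x03FF) << numbits;
        numbits += 10;
        while (numbits >= 8) {
            out.push_back(static_cast<uint8_t>(bits));
            bits >>= 8;
            numbits -= 8;
        }
    };

    for (size_t i = 0; i < srcLen; ++i) {
        if (untilSync == 0) {
            // Two sync codes in a row, each looked up with the RD current at that point
            emit(syncCodes[rd >> 8]);
            emit(syncCodes[rd >> 8]);
            untilSync = syncInterval;
        }
        emit(dataCodes[rd + src[i]]);
        --untilSync;
    }

    // Flush a trailing partial byte if the code count is not a multiple of 4
    if (numbits) {
        out.push_back(static_cast<uint8_t>(bits));
    }
    isRD1 = (rd != 0);
    return out;
}
```

Writing into a `std::vector` keeps the sketch short; for a fixed-size USB buffer as in the question, the same loop could write through a `uint8_t *` as the answer's function does.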
