如何将二进制数据编码为任意文本表示？

Question

我需要一对函数来将二进制数据编码为任意文本表示，并将其解码回来

假设我们有一个任意大小的 ArrayBuffer：

const buffer = new ArrayBuffer(1000)

然后我们定义一个十六进制的“行话”，并用它来编码和解码十六进制字符串：

const lingo = "0123456789abcdef"

const text = encode(buffer, lingo)
const data = decode(text, lingo)

我的目标是定义我自己的 base48“行话”，省略元音以避免脏话：

const lingo = "256789bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ"

const text = encode(buffer, lingo)
const data = decode(text, lingo)

我们如何才能创建在任意表示之间有效转换数据的算法？尽管这让我觉得很基础，但我很难找到资源来帮助我完成这项任务

如果你能想到任何没有元音的似是而非的恶作剧词，加分，我什至把看起来像元音的数字都拿出来了！

我在 javascript 工作，但我也想了解一般原则。谢谢！

Answer 1

流式传输一系列 bytes/digits 并转换为另一个基础的挑战是找到源 bytes/digits 与目标 bytes/digits 的最有效比率。

为了确定最佳比率，下面的算法包含一个名为 mostEfficientChunk() 的函数，它将源编号基数、目标编号基数和最大源块大小作为参数。然后此函数遍历源块大小从 1 到最大块大小，并确定目标数基所需的最小 bytes/digits 数。例如，如果转换为基数 10，Unit8Array 的 1 个字节范围从 0 到 255 的源需要 3 个字节。在这个例子中，效率被测量为 1/3 或 33.33%。然后检查 2 个字节的源，其范围为 0 - 65535，需要以 10 为基数的 5 个字节，效率为 2/5 或 40%。因此，当从基数 256 转换为基数 10 时，2 字节的源块大小比 1 字节的块大小更有效。以此类推，直到找到小于或等于最大源块大小的最佳比率。

下面的代码转储了 mostEfficientChunk() 的评估，使最佳块大小的确定一目了然。

然后，一旦设置了块大小，源数据就被馈送到 'code()' 对源进行排队，然后如果存在足够的数据来形成块，则该函数将块转换为目标根据。请注意，如果源是流式传输，则可以连续调用 code()。流完成后，必须调用 flush() 附加代表 0 的数字，直到满足块大小，然后生成最终目标块。请注意，最后一个块已被填充，因此必须跟踪原始源的长度以 trim 适当解码。

代码中有一些注释和测试用例，有助于理解编码器 class 的运行方式。

class EncodeStream {

  constructor( fromBase, toBase, encode = 'encode', maxChunkSize = 32 ) {
    console.assert( typeof fromBase === 'string' || typeof fromBase === 'number' );
    console.assert( typeof toBase === 'string' || typeof toBase === 'number' );
    console.assert( encode === 'encode' || encode === 'decode' );
    
    this.encode = encode;
    
    if ( typeof fromBase === 'string' ) {
      this.fromBase = fromBase.length;
      this.fromBaseDigits = fromBase;
    } else {
      this.fromBase = fromBase |0;
      this.fromBaseDigits = null;
    }
    console.assert( 2 <= this.fromBase && this.fromBase <= 2**32 );
    
    if ( typeof toBase === 'string' ) {
      this.toBase = toBase.length;
      this.toBaseDigits = toBase;

    } else {
      this.toBase = toBase |0;
      this.toBaseDigits = null;
    }
    console.assert( 2 <= this.toBase && this.toBase <= 2**32 );
    
    if ( encode === 'encode' ) {
      this.chunking = this.mostEfficientChunk( this.fromBase, this.toBase, maxChunkSize );
    } else {
      let temp = this.mostEfficientChunk( this.toBase, this.fromBase, maxChunkSize );
      this.chunking = {
        bestSrcChunk: temp.bestTgtChunk,
        bestTgtChunk: temp.bestSrcChunk
      };
    }
    
    console.log( `Best Source Chunk Size:  ${this.chunking.bestSrcChunk}, Best Target Chunk Size:  ${this.chunking.bestTgtChunk}` );
    this.streamQueue = [];
  }
  
  code( stream ) {
    console.assert( typeof stream === 'string' || Array.isArray( stream ) );
    
    if ( this.fromBaseDigits ) {
      this.streamQueue.push( ...stream.split( '' ).map( digit => this.fromBaseDigits.indexOf( digit ) ) );
    } else {
      this.streamQueue.push( ...stream );
    }
    
    let result = [];
    while ( this.chunking.bestSrcChunk <= this.streamQueue.length ) {
      // Convert the source chunk to a BigInt value.
      let chunk = this.streamQueue.splice( 0, this.chunking.bestSrcChunk );
      let chunkValue = 0n;
      for ( let i = 0; i < chunk.length; i++ ) {
        chunkValue = chunkValue * BigInt( this.fromBase ) + BigInt( chunk[ i ] );
      }

      // And now convert the BigInt value to a target chunk.
      let temp = new Array( this.chunking.bestTgtChunk - 1 );
      for ( let i = 0; i < this.chunking.bestTgtChunk; i++ ) {
        temp[ this.chunking.bestTgtChunk - 1 - i ] = chunkValue % BigInt( this.toBase );
        chunkValue = chunkValue / BigInt( this.toBase );
      }
      
      result.push( ...temp );
    }
    
    // Finally, if the target base is represented by a string of digits, then map
    // the resulting array to the target digits.
    if ( this.toBaseDigits ) {
      result = result.map( digit => this.toBaseDigits[ digit ] ).join( '' );
    }
    return result;
  }
  
  flush() {
    // Simply add zero digits to the stream until we have a complete chunk.
    if ( 0 < this.streamQueue.length ) {
      while ( this.streamQueue.length < this.chunking.bestSrcChunk ) {
        if ( this.fromBaseDigits ) {
          this.streamQueue.push( this.fromBaseDigits[ 0 ] );
        } else {
          this.streamQueue.push( 0 );
        }
      }
    }
    return this.code( this.fromBaseDigits ? '' : [] );
  }
  
  
  mostEfficientChunk( sourceBase, targetBase, maxChunkSize ) {

    console.assert( 2 <= sourceBase && sourceBase <= 2 ** 32 );
    console.assert( 2 <= targetBase && targetBase <= 2 ** 32 );
    console.assert( 1 <= maxChunkSize && maxChunkSize <= 64 );
    
    // Since BigInt does not have a LOG function, let's just brute force
    // determine the maximum number of target digits per chunk size of
    // source digits...
    let sBase = BigInt( sourceBase );
    let tBase = BigInt( targetBase );
    let mSize = BigInt( maxChunkSize );
    let efficiency = 0;
    let result = { bestSrcChunk: 0, bestTgtChunk: 0 };
    
    for ( let chunkSize = 1n; chunkSize <= mSize; chunkSize++ ) {
      let maxSrcValue = sBase ** chunkSize - 1n;
      let maxSrcBits = maxSrcValue.toString( 2 ).length;
      
      let d = 0n;
      let msv = maxSrcValue;
      while ( 0n < msv ) {
        msv = msv / tBase;
        d++;
      }
      
      if ( this.encode === 'encode' ) {
        console.log( `Source Chunk Size: ${chunkSize}, Max Source Value: ${maxSrcValue}\nTarget Chunk Size: ${d}, Max Target Value: ${tBase**d-1n}, Efficiency: ${Number( chunkSize * 10000n / d ) / 100}%` );
      }
      
      if ( efficiency < Number( chunkSize ) / Number( d ) ) {
        efficiency = Number( chunkSize ) / Number( d );
        result.bestSrcChunk = Number( chunkSize );
        result.bestTgtChunk = Number( d );
      }
    }
   
    return result;
  }


}


let source, toBase, encoder, encoderResult, decoder, decoderResult;

source = [255,254,253,252,251];
toBase = '0123456789';

console.log( '\n\n' );
console.log( [ 'Encoding', source.join(','), `to base '${toBase}'` ] );
encoder = new EncodeStream( 256, toBase, 'encode', 2 );
encoderResult = '';
encoderResult += encoder.code( source );
encoderResult += encoder.flush();
console.log( `Encoded result: '${encoderResult}'` );

console.log( [ 'Decoding...' ] );
decoder = new EncodeStream( toBase, 256, 'decode', 2 );
decoderResult = '';
decoderResult += decoder.code( encoderResult );
decoderResult += decoder.flush();
console.log( `Decoded result: '${decoderResult}'` );

console.log( '\n\n' );
console.log( [ 'Encoding', source.join(','), `to base '${toBase}'` ] );
encoder = new EncodeStream( 256, toBase, 'encode', 16 );
encoderResult = '';
encoderResult += encoder.code( source );
encoderResult += encoder.flush();
console.log( `Encoded result: '${encoderResult}'` );

console.log( [ 'Decoding...' ] );
decoder = new EncodeStream( toBase, 256, 'decode', 16 );
decoderResult = '';
decoderResult += decoder.code( encoderResult );
decoderResult += decoder.flush();
console.log( `Decoded result: '${decoderResult}'` );

source = [255,254,253,252,251,250,249,248,247];
toBase = '256789bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ';

console.log( '\n\n' );
console.log( [ 'Encoding', source.join(','), `to base '${toBase}'` ] );
encoder = new EncodeStream( 256, toBase, 'encode', 16 );
encoderResult = '';
encoderResult += encoder.code( source );
encoderResult += encoder.flush();
console.log( `Encoded result: '${encoderResult}'` );

console.log( [ 'Decoding...' ] );
decoder = new EncodeStream( toBase, 256, 'decode', 16 );
decoderResult = '';
decoderResult += decoder.code( encoderResult );
decoderResult += decoder.flush();
console.log( `Decoded result: '${decoderResult}'` );

请注意，您似乎需要打开浏览器调试器才能查看完整的控制台日志结果。

如何将二进制数据编码为任意文本表示？

How to encode binary data as any arbitrary text representation?

javascript

binary

encoding