如何将二进制数据编码为任意文本表示?
How to encode binary data as any arbitrary text representation?
我需要一对函数来将二进制数据编码为任意文本表示,并将其解码回来
假设我们有一个任意大小的 ArrayBuffer:
const buffer = new ArrayBuffer(1000)
然后我们定义一个十六进制的“行话”,并用它来编码和解码十六进制字符串:
const lingo = "0123456789abcdef"
const text = encode(buffer, lingo)
const data = decode(text, lingo)
我的目标是定义我自己的 base48“行话”,省略元音以避免脏话:
const lingo = "256789bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ"
const text = encode(buffer, lingo)
const data = decode(text, lingo)
我们如何才能创建在任意表示之间有效转换数据的算法?尽管这让我觉得很基础,但我很难找到资源来帮助我完成这项任务
如果你能想到任何没有元音的似是而非的恶作剧词,加分,我什至把看起来像元音的数字都拿出来了!
我在 javascript 工作,但我也想了解一般原则。谢谢!
流式传输一系列 bytes/digits 并转换为另一个基础的挑战是找到源 bytes/digits 与目标 bytes/digits 的最有效比率。
为了确定最佳比率,下面的算法包含一个名为 mostEfficientChunk()
的函数,它将源编号基数、目标编号基数和最大源块大小作为参数。然后此函数遍历源块大小从 1 到最大块大小,并确定目标数基所需的最小 bytes/digits 数。例如,如果转换为基数 10,Unit8Array 的 1 个字节范围从 0 到 255 的源需要 3 个字节。在这个例子中,效率被测量为 1/3 或 33.33%。然后检查 2 个字节的源,其范围为 0 - 65535,需要以 10 为基数的 5 个字节,效率为 2/5 或 40%。因此,当从基数 256 转换为基数 10 时,2 字节的源块大小比 1 字节的块大小更有效。以此类推,直到找到小于或等于最大源块大小的最佳比率。
下面的代码转储了 mostEfficientChunk()
的评估,使最佳块大小的确定一目了然。
然后,一旦设置了块大小,源数据就被馈送到 'code()' 对源进行排队,然后如果存在足够的数据来形成块,则该函数将块转换为目标根据。请注意,如果源是流式传输,则可以连续调用 code()
。流完成后,必须调用 flush()
附加代表 0
的数字,直到满足块大小,然后生成最终目标块。请注意,最后一个块已被填充,因此必须跟踪原始源的长度以 trim 适当解码。
代码中有一些注释和测试用例,有助于理解编码器 class 的运行方式。
class EncodeStream {
constructor( fromBase, toBase, encode = 'encode', maxChunkSize = 32 ) {
console.assert( typeof fromBase === 'string' || typeof fromBase === 'number' );
console.assert( typeof toBase === 'string' || typeof toBase === 'number' );
console.assert( encode === 'encode' || encode === 'decode' );
this.encode = encode;
if ( typeof fromBase === 'string' ) {
this.fromBase = fromBase.length;
this.fromBaseDigits = fromBase;
} else {
this.fromBase = fromBase |0;
this.fromBaseDigits = null;
}
console.assert( 2 <= this.fromBase && this.fromBase <= 2**32 );
if ( typeof toBase === 'string' ) {
this.toBase = toBase.length;
this.toBaseDigits = toBase;
} else {
this.toBase = toBase |0;
this.toBaseDigits = null;
}
console.assert( 2 <= this.toBase && this.toBase <= 2**32 );
if ( encode === 'encode' ) {
this.chunking = this.mostEfficientChunk( this.fromBase, this.toBase, maxChunkSize );
} else {
let temp = this.mostEfficientChunk( this.toBase, this.fromBase, maxChunkSize );
this.chunking = {
bestSrcChunk: temp.bestTgtChunk,
bestTgtChunk: temp.bestSrcChunk
};
}
console.log( `Best Source Chunk Size: ${this.chunking.bestSrcChunk}, Best Target Chunk Size: ${this.chunking.bestTgtChunk}` );
this.streamQueue = [];
}
code( stream ) {
console.assert( typeof stream === 'string' || Array.isArray( stream ) );
if ( this.fromBaseDigits ) {
this.streamQueue.push( ...stream.split( '' ).map( digit => this.fromBaseDigits.indexOf( digit ) ) );
} else {
this.streamQueue.push( ...stream );
}
let result = [];
while ( this.chunking.bestSrcChunk <= this.streamQueue.length ) {
// Convert the source chunk to a BigInt value.
let chunk = this.streamQueue.splice( 0, this.chunking.bestSrcChunk );
let chunkValue = 0n;
for ( let i = 0; i < chunk.length; i++ ) {
chunkValue = chunkValue * BigInt( this.fromBase ) + BigInt( chunk[ i ] );
}
// And now convert the BigInt value to a target chunk.
let temp = new Array( this.chunking.bestTgtChunk - 1 );
for ( let i = 0; i < this.chunking.bestTgtChunk; i++ ) {
temp[ this.chunking.bestTgtChunk - 1 - i ] = chunkValue % BigInt( this.toBase );
chunkValue = chunkValue / BigInt( this.toBase );
}
result.push( ...temp );
}
// Finally, if the target base is represented by a string of digits, then map
// the resulting array to the target digits.
if ( this.toBaseDigits ) {
result = result.map( digit => this.toBaseDigits[ digit ] ).join( '' );
}
return result;
}
flush() {
// Simply add zero digits to the stream until we have a complete chunk.
if ( 0 < this.streamQueue.length ) {
while ( this.streamQueue.length < this.chunking.bestSrcChunk ) {
if ( this.fromBaseDigits ) {
this.streamQueue.push( this.fromBaseDigits[ 0 ] );
} else {
this.streamQueue.push( 0 );
}
}
}
return this.code( this.fromBaseDigits ? '' : [] );
}
mostEfficientChunk( sourceBase, targetBase, maxChunkSize ) {
console.assert( 2 <= sourceBase && sourceBase <= 2 ** 32 );
console.assert( 2 <= targetBase && targetBase <= 2 ** 32 );
console.assert( 1 <= maxChunkSize && maxChunkSize <= 64 );
// Since BigInt does not have a LOG function, let's just brute force
// determine the maximum number of target digits per chunk size of
// source digits...
let sBase = BigInt( sourceBase );
let tBase = BigInt( targetBase );
let mSize = BigInt( maxChunkSize );
let efficiency = 0;
let result = { bestSrcChunk: 0, bestTgtChunk: 0 };
for ( let chunkSize = 1n; chunkSize <= mSize; chunkSize++ ) {
let maxSrcValue = sBase ** chunkSize - 1n;
let maxSrcBits = maxSrcValue.toString( 2 ).length;
let d = 0n;
let msv = maxSrcValue;
while ( 0n < msv ) {
msv = msv / tBase;
d++;
}
if ( this.encode === 'encode' ) {
console.log( `Source Chunk Size: ${chunkSize}, Max Source Value: ${maxSrcValue}\nTarget Chunk Size: ${d}, Max Target Value: ${tBase**d-1n}, Efficiency: ${Number( chunkSize * 10000n / d ) / 100}%` );
}
if ( efficiency < Number( chunkSize ) / Number( d ) ) {
efficiency = Number( chunkSize ) / Number( d );
result.bestSrcChunk = Number( chunkSize );
result.bestTgtChunk = Number( d );
}
}
return result;
}
}
let source, toBase, encoder, encoderResult, decoder, decoderResult;
source = [255,254,253,252,251];
toBase = '0123456789';
console.log( '\n\n' );
console.log( [ 'Encoding', source.join(','), `to base '${toBase}'` ] );
encoder = new EncodeStream( 256, toBase, 'encode', 2 );
encoderResult = '';
encoderResult += encoder.code( source );
encoderResult += encoder.flush();
console.log( `Encoded result: '${encoderResult}'` );
console.log( [ 'Decoding...' ] );
decoder = new EncodeStream( toBase, 256, 'decode', 2 );
decoderResult = '';
decoderResult += decoder.code( encoderResult );
decoderResult += decoder.flush();
console.log( `Decoded result: '${decoderResult}'` );
console.log( '\n\n' );
console.log( [ 'Encoding', source.join(','), `to base '${toBase}'` ] );
encoder = new EncodeStream( 256, toBase, 'encode', 16 );
encoderResult = '';
encoderResult += encoder.code( source );
encoderResult += encoder.flush();
console.log( `Encoded result: '${encoderResult}'` );
console.log( [ 'Decoding...' ] );
decoder = new EncodeStream( toBase, 256, 'decode', 16 );
decoderResult = '';
decoderResult += decoder.code( encoderResult );
decoderResult += decoder.flush();
console.log( `Decoded result: '${decoderResult}'` );
source = [255,254,253,252,251,250,249,248,247];
toBase = '256789bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ';
console.log( '\n\n' );
console.log( [ 'Encoding', source.join(','), `to base '${toBase}'` ] );
encoder = new EncodeStream( 256, toBase, 'encode', 16 );
encoderResult = '';
encoderResult += encoder.code( source );
encoderResult += encoder.flush();
console.log( `Encoded result: '${encoderResult}'` );
console.log( [ 'Decoding...' ] );
decoder = new EncodeStream( toBase, 256, 'decode', 16 );
decoderResult = '';
decoderResult += decoder.code( encoderResult );
decoderResult += decoder.flush();
console.log( `Decoded result: '${decoderResult}'` );
请注意,您似乎需要打开浏览器调试器才能查看完整的控制台日志结果。
我需要一对函数来将二进制数据编码为任意文本表示,并将其解码回来
假设我们有一个任意大小的 ArrayBuffer:
const buffer = new ArrayBuffer(1000)
然后我们定义一个十六进制的“行话”,并用它来编码和解码十六进制字符串:
const lingo = "0123456789abcdef"
const text = encode(buffer, lingo)
const data = decode(text, lingo)
我的目标是定义我自己的 base48“行话”,省略元音以避免脏话:
const lingo = "256789bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ"
const text = encode(buffer, lingo)
const data = decode(text, lingo)
我们如何才能创建在任意表示之间有效转换数据的算法?尽管这让我觉得很基础,但我很难找到资源来帮助我完成这项任务
如果你能想到任何没有元音的似是而非的恶作剧词,加分,我什至把看起来像元音的数字都拿出来了!
我在 javascript 工作,但我也想了解一般原则。谢谢!
流式传输一系列 bytes/digits 并转换为另一个基础的挑战是找到源 bytes/digits 与目标 bytes/digits 的最有效比率。
为了确定最佳比率,下面的算法包含一个名为 mostEfficientChunk()
的函数,它将源编号基数、目标编号基数和最大源块大小作为参数。然后此函数遍历源块大小从 1 到最大块大小,并确定目标数基所需的最小 bytes/digits 数。例如,如果转换为基数 10,Unit8Array 的 1 个字节范围从 0 到 255 的源需要 3 个字节。在这个例子中,效率被测量为 1/3 或 33.33%。然后检查 2 个字节的源,其范围为 0 - 65535,需要以 10 为基数的 5 个字节,效率为 2/5 或 40%。因此,当从基数 256 转换为基数 10 时,2 字节的源块大小比 1 字节的块大小更有效。以此类推,直到找到小于或等于最大源块大小的最佳比率。
下面的代码转储了 mostEfficientChunk()
的评估,使最佳块大小的确定一目了然。
然后,一旦设置了块大小,源数据就被馈送到 'code()' 对源进行排队,然后如果存在足够的数据来形成块,则该函数将块转换为目标根据。请注意,如果源是流式传输,则可以连续调用 code()
。流完成后,必须调用 flush()
附加代表 0
的数字,直到满足块大小,然后生成最终目标块。请注意,最后一个块已被填充,因此必须跟踪原始源的长度以 trim 适当解码。
代码中有一些注释和测试用例,有助于理解编码器 class 的运行方式。
class EncodeStream {
constructor( fromBase, toBase, encode = 'encode', maxChunkSize = 32 ) {
console.assert( typeof fromBase === 'string' || typeof fromBase === 'number' );
console.assert( typeof toBase === 'string' || typeof toBase === 'number' );
console.assert( encode === 'encode' || encode === 'decode' );
this.encode = encode;
if ( typeof fromBase === 'string' ) {
this.fromBase = fromBase.length;
this.fromBaseDigits = fromBase;
} else {
this.fromBase = fromBase |0;
this.fromBaseDigits = null;
}
console.assert( 2 <= this.fromBase && this.fromBase <= 2**32 );
if ( typeof toBase === 'string' ) {
this.toBase = toBase.length;
this.toBaseDigits = toBase;
} else {
this.toBase = toBase |0;
this.toBaseDigits = null;
}
console.assert( 2 <= this.toBase && this.toBase <= 2**32 );
if ( encode === 'encode' ) {
this.chunking = this.mostEfficientChunk( this.fromBase, this.toBase, maxChunkSize );
} else {
let temp = this.mostEfficientChunk( this.toBase, this.fromBase, maxChunkSize );
this.chunking = {
bestSrcChunk: temp.bestTgtChunk,
bestTgtChunk: temp.bestSrcChunk
};
}
console.log( `Best Source Chunk Size: ${this.chunking.bestSrcChunk}, Best Target Chunk Size: ${this.chunking.bestTgtChunk}` );
this.streamQueue = [];
}
code( stream ) {
console.assert( typeof stream === 'string' || Array.isArray( stream ) );
if ( this.fromBaseDigits ) {
this.streamQueue.push( ...stream.split( '' ).map( digit => this.fromBaseDigits.indexOf( digit ) ) );
} else {
this.streamQueue.push( ...stream );
}
let result = [];
while ( this.chunking.bestSrcChunk <= this.streamQueue.length ) {
// Convert the source chunk to a BigInt value.
let chunk = this.streamQueue.splice( 0, this.chunking.bestSrcChunk );
let chunkValue = 0n;
for ( let i = 0; i < chunk.length; i++ ) {
chunkValue = chunkValue * BigInt( this.fromBase ) + BigInt( chunk[ i ] );
}
// And now convert the BigInt value to a target chunk.
let temp = new Array( this.chunking.bestTgtChunk - 1 );
for ( let i = 0; i < this.chunking.bestTgtChunk; i++ ) {
temp[ this.chunking.bestTgtChunk - 1 - i ] = chunkValue % BigInt( this.toBase );
chunkValue = chunkValue / BigInt( this.toBase );
}
result.push( ...temp );
}
// Finally, if the target base is represented by a string of digits, then map
// the resulting array to the target digits.
if ( this.toBaseDigits ) {
result = result.map( digit => this.toBaseDigits[ digit ] ).join( '' );
}
return result;
}
flush() {
// Simply add zero digits to the stream until we have a complete chunk.
if ( 0 < this.streamQueue.length ) {
while ( this.streamQueue.length < this.chunking.bestSrcChunk ) {
if ( this.fromBaseDigits ) {
this.streamQueue.push( this.fromBaseDigits[ 0 ] );
} else {
this.streamQueue.push( 0 );
}
}
}
return this.code( this.fromBaseDigits ? '' : [] );
}
mostEfficientChunk( sourceBase, targetBase, maxChunkSize ) {
console.assert( 2 <= sourceBase && sourceBase <= 2 ** 32 );
console.assert( 2 <= targetBase && targetBase <= 2 ** 32 );
console.assert( 1 <= maxChunkSize && maxChunkSize <= 64 );
// Since BigInt does not have a LOG function, let's just brute force
// determine the maximum number of target digits per chunk size of
// source digits...
let sBase = BigInt( sourceBase );
let tBase = BigInt( targetBase );
let mSize = BigInt( maxChunkSize );
let efficiency = 0;
let result = { bestSrcChunk: 0, bestTgtChunk: 0 };
for ( let chunkSize = 1n; chunkSize <= mSize; chunkSize++ ) {
let maxSrcValue = sBase ** chunkSize - 1n;
let maxSrcBits = maxSrcValue.toString( 2 ).length;
let d = 0n;
let msv = maxSrcValue;
while ( 0n < msv ) {
msv = msv / tBase;
d++;
}
if ( this.encode === 'encode' ) {
console.log( `Source Chunk Size: ${chunkSize}, Max Source Value: ${maxSrcValue}\nTarget Chunk Size: ${d}, Max Target Value: ${tBase**d-1n}, Efficiency: ${Number( chunkSize * 10000n / d ) / 100}%` );
}
if ( efficiency < Number( chunkSize ) / Number( d ) ) {
efficiency = Number( chunkSize ) / Number( d );
result.bestSrcChunk = Number( chunkSize );
result.bestTgtChunk = Number( d );
}
}
return result;
}
}
let source, toBase, encoder, encoderResult, decoder, decoderResult;
source = [255,254,253,252,251];
toBase = '0123456789';
console.log( '\n\n' );
console.log( [ 'Encoding', source.join(','), `to base '${toBase}'` ] );
encoder = new EncodeStream( 256, toBase, 'encode', 2 );
encoderResult = '';
encoderResult += encoder.code( source );
encoderResult += encoder.flush();
console.log( `Encoded result: '${encoderResult}'` );
console.log( [ 'Decoding...' ] );
decoder = new EncodeStream( toBase, 256, 'decode', 2 );
decoderResult = '';
decoderResult += decoder.code( encoderResult );
decoderResult += decoder.flush();
console.log( `Decoded result: '${decoderResult}'` );
console.log( '\n\n' );
console.log( [ 'Encoding', source.join(','), `to base '${toBase}'` ] );
encoder = new EncodeStream( 256, toBase, 'encode', 16 );
encoderResult = '';
encoderResult += encoder.code( source );
encoderResult += encoder.flush();
console.log( `Encoded result: '${encoderResult}'` );
console.log( [ 'Decoding...' ] );
decoder = new EncodeStream( toBase, 256, 'decode', 16 );
decoderResult = '';
decoderResult += decoder.code( encoderResult );
decoderResult += decoder.flush();
console.log( `Decoded result: '${decoderResult}'` );
source = [255,254,253,252,251,250,249,248,247];
toBase = '256789bcdfghjklmnpqrstvwxyzBCDFGHJKLMNPQRSTVWXYZ';
console.log( '\n\n' );
console.log( [ 'Encoding', source.join(','), `to base '${toBase}'` ] );
encoder = new EncodeStream( 256, toBase, 'encode', 16 );
encoderResult = '';
encoderResult += encoder.code( source );
encoderResult += encoder.flush();
console.log( `Encoded result: '${encoderResult}'` );
console.log( [ 'Decoding...' ] );
decoder = new EncodeStream( toBase, 256, 'decode', 16 );
decoderResult = '';
decoderResult += decoder.code( encoderResult );
decoderResult += decoder.flush();
console.log( `Decoded result: '${decoderResult}'` );
请注意,您似乎需要打开浏览器调试器才能查看完整的控制台日志结果。