如何使伪随机 BigInt 生成器转换为特定长度字符的字符串?

How to make pseudo-random BigInt generator convert to string of particular length of characters?

沿着 的路线,我走到这一步(parts 是一个包含 32 个 2 字符“符号”的数组):

const parts = `mi
ma
mo
ne
nu
di
da
do
be
bu
ti
te
ta
to
tu
ki
ke
ka
ko
ku
si
sa
so
ze
zu
fi
fa
fo
ve
vu
xe
xu`.trim().split(/\n+/)

const fetch = (x, o) => {
  if (x >= o) {
    return x
  } else {
    const v = (x * x) % o
    return (x <= (o / 2n)) ? v : o - v
  }
}

const fetchLarge = (x) => fetch(x, 41223334444555556666667777777888888889999999997n)

// the last number can be anything.
const buildLarge = (x, o) => fetchLarge((fetchLarge(x) + o) % BigInt(Math.pow(32, 31)) ^ 2030507011013017019023n)

const createArray = (n, mod = 32n) => {
  if (!n) return [0];
  let arr = [];
  while (n) {
    arr.push(Number(n % mod));
    n /= mod;
  }
  return arr;
}

const write = (i) => {
  const x = buildLarge(i++, 272261127249452727280272961627319532734291n)
  return createArray(x).map(x => parts[x]).join('')
}

let i = 1n
while (i < 10000) {
  console.log(write(i))
}

我生成的结果如下:

kitekutefaxunetotuzezumatotamabukidimasoxumoxudofasositinu,6038940986212279582529645303138677298679151
sokiketufikefotekakidotetotesamizununetefokixefitetisovene,5431347628569519336817719657935192515363318
xudamituzesimixuxemixudakedatetutununekobuzexesozuxedinenu,5713969289948157459645315228321450728816863
dazenenemovudadikukatatakibekaxexemovubedivusidatafisasine,5082175912370834928186684152014555456835302
xufotidosokabunudomimibefisimakusimokedamomazexekofomokane,4925740069222414438181195472381794794580863
sodozekadakuzemaxetexukuzumisikitazufitizexekatetotuxusone,5182433137814021540565892366585827483507958
kikokasatudatidatufikizesadimatakakatudisibumofotuzutaze,1019165422643074024784461594259815846823503
dakikinetofonexesimavufafisaxefosafisikofotasanekovetevu,1279315636939618596561544978621478602915302
kinunebebuzukokemidatekobusofokikozukobedodakesisikunuki,659622269329577207976266617866288582888591
sozesifamoxebusitotesisasizekudasomitatavudidizukadimate,480714979099063166920265752208932468511478
xumakikofakumixefotisikunumovudafasofikimozenudafosidaka,749508057657951412178315361964670398839871
dazedokutituzufakebutifokekusobuzutemanesadafadatetitamo,103886097260879003150138325027254855900902
xukemizukozefaxetudizukedimotevubesitekitavukakevutisibe,376136321524704717800574424940622855799327
dozexedivenudifabuvedavebukeketozukumasimakuvetuketomafaxe,42948292938975784099596927092482269526555367
mimasatukidisodifikekutovumazefikefonemofimotesonusazexuxe,43196343143305047528500657761292227037320224
zedafimasobukudizedozefoketuzekisadotufikudadokisakedofoxe,43000150124549846140482724444846720574088407
kisafimosotuvuvuzuzukodibevutemidazusisamokososikomofavuma,2692423943832809210699522830552769656612527
soxutokonebusidaketesomoxemibesonubudibekunumatifokokanemo,2942202721014541374299446744441542204274678
xusikematetemititafafakuxusinekefoketonebetokudonesomosama,2312137916289687577537008913213461971911327

我怎样才能使所有字符串的长度都是 31 个“符号”(因为在这个例子中符号是 2 个字符,所以总共有 62 个字符),像这样:

xusikematetemititafafakuxusinekexusikematetemititafafakuxusine

即:算法中上面的3个bigint大数应该是什么?另外,它们应该是什么才能使分布看起来是随机的?我注意到,与较小的数字相比,在边界附近使用较大的数字会产生更好的明显随机化结果。此外,您不能只在 bigint x 前加上 0,这会导致 mamamamamama...。最后,一个序列中最多可以有 2 对相同的字母,我认为你只能通过跳过不符合该约束的结果来真正解决这个问题(除非有一些数学魔法可以以某种方式判断更多这 32 个“符号”中有两个并排出现)。

关于最后一部分,这些是有效结果:

mamavumamavumama...
nanavumamavuvuma...

这些无效:

mamamavumamavuma...
mavuvuvumamavuma...

因为一排有3对是一样的

总结一下:

  1. 如何使所有字符串的长度都是 62 个字符,而不用零填充?这意味着它必须适合我不太确定的某个 BigInts 范围。
  2. 因此分布看起来非常随机(也就是说,我们不仅看到序列的尾端缓慢变化,而且整个数字似乎完全改变,如示例所示)。
  3. 所以在一个序列中不超过两对是相似的?这部分可以通过跳过我们在伪随机序列中找到的结果来解决,除非有一些我不可能理解的魔法来完成它 :) 例如,也许有一些魔法可以处理类似 5 的倍数-位块或其他东西,我不知道。但是不需要花哨,跳过与正则表达式匹配的结果也可以。

这里我们使用 Base 32(硬编码,但可以是 parts.length)作为 2 (maxRepeat) 个最低有效数字和 Base 31(硬编​​码,但可以是 parts.length-1) 用于剩余的数字。这给出了长度值的最大范围。

0ngetMax()(含)的所有值都可以编码为 31 (minLength) 个符号。

防止重复长度超过 maxRepeat 的魔法是检查第 i 位与 i - maxRepeat 位,如果出现以下情况,则对第 i 位进行调整>=。虽然这会产生有效的编码(遵循规则的编码),但并非所有任意符号序列都是有效的,即使它们遵循规则也是如此。例如,序列 mimami 永远不会生成,也无法解码。

const split = new RegExp(`.{2}`, 'g');
const parts = 'mimamonenudidadobebutitetatotukikekakokusisasozezufifafovevuxexu'.match(split);
const partsMap = Object.fromEntries(parts.map((v,i) => ([v,BigInt(i)])));

const encode = (value, maxRepeat = 2, minLength = 31) => {
  value = BigInt(value);
  const digits = [];
  // convert the value to digits
  // the first least significant `maxRepeat` digits use base 32, the rest use base 31
  while(value > 0) {
    const radix = digits.length < maxRepeat ? 32n : 31n;
    digits.push(value % radix);
    value /= radix;
  }
  // add 0 padding
  while(digits.length < minLength) {
    digits.push(0n);
  }
  // adjust digits to prevent sequences longer than `maxRepeat`
  const symbols = []
  digits.forEach((v,i) => {
    symbols.push((i < maxRepeat || v < symbols[i-maxRepeat]) ? v : v+1n);
  });
  // map to symbols and return string
  const str = symbols.map(v => parts[v]).join('');
  return str;
};

const decode = (str, maxRepeat = 2) => {
  // split string into array of symbols
  const symbols = str.match(split);
  // convert symbols to digits
  const digits = symbols.map(v => partsMap[v]).map((v,i,a) => {
    if(i < maxRepeat || v < a[i-maxRepeat]) return v;
    return v-1n;
  });
  // compute the threshold where we transition from base 31 to base 32
  const threshold = digits.length - maxRepeat;
  // convert digits to BigInt
  const results = digits.reverse().reduce(
    (s,v,i) => (s * (i >= threshold ? 32n : 31n) + v)
  , 0n);
  return results;
};

// compute the maximum value that can be encoded using `minLength` number of symbols
const getMax = (maxRepeat = 2, minLength = 31) => 32n ** BigInt(maxRepeat) * 31n ** BigInt(minLength - maxRepeat) - 1n;

// Consoles will print BigInt but Whosebug's interpreter 
// doesn't understand them yet so we use `.toString()`.
console.log('limit:', getMax().toString());
console.log(encode(getMax()));

const n1 = 6038940986212279582529645303138677298679151n;
console.log(encode(n1)); // 'kitefitifomazekosaxubezutatudofotudimidanemadanumasisivebumimi'
const n2 = 0n;
console.log(encode(n2)); // 'mimimamamimimamamimimamamimimamamimimamamimimamamimimamamimima'

const s1 = 'kitefitifomazekosaxubezutatudofotudimidanemadanumasisivebumimi';
console.log(decode(s1).toString()); // 6038940986212279582529645303138677298679151
const s2 = 'mimimamamimimamamimimamamimimamamimimamamimimamamimimamamimima';
console.log(decode(s2).toString()); // 0

console.log(decode(encode(0n)) == 0n);
console.log(decode(encode(6038940986212279582529645303138677298679151n)) == 6038940986212279582529645303138677298679151n);
.as-console-wrapper { max-height: 100% !important; top: 0; }