如何使伪随机 BigInt 生成器转换为特定长度字符的字符串?
How to make pseudo-random BigInt generator convert to string of particular length of characters?
沿着 的路线,我走到这一步(parts
是一个包含 32 个 2 字符“符号”的数组):
const parts = `mi
ma
mo
ne
nu
di
da
do
be
bu
ti
te
ta
to
tu
ki
ke
ka
ko
ku
si
sa
so
ze
zu
fi
fa
fo
ve
vu
xe
xu`.trim().split(/\n+/)
const fetch = (x, o) => {
if (x >= o) {
return x
} else {
const v = (x * x) % o
return (x <= (o / 2n)) ? v : o - v
}
}
const fetchLarge = (x) => fetch(x, 41223334444555556666667777777888888889999999997n)
// the last number can be anything.
const buildLarge = (x, o) => fetchLarge((fetchLarge(x) + o) % BigInt(Math.pow(32, 31)) ^ 2030507011013017019023n)
const createArray = (n, mod = 32n) => {
if (!n) return [0];
let arr = [];
while (n) {
arr.push(Number(n % mod));
n /= mod;
}
return arr;
}
const write = (i) => {
const x = buildLarge(i++, 272261127249452727280272961627319532734291n)
return createArray(x).map(x => parts[x]).join('')
}
let i = 1n
while (i < 10000) {
console.log(write(i))
}
我生成的结果如下:
kitekutefaxunetotuzezumatotamabukidimasoxumoxudofasositinu,6038940986212279582529645303138677298679151
sokiketufikefotekakidotetotesamizununetefokixefitetisovene,5431347628569519336817719657935192515363318
xudamituzesimixuxemixudakedatetutununekobuzexesozuxedinenu,5713969289948157459645315228321450728816863
dazenenemovudadikukatatakibekaxexemovubedivusidatafisasine,5082175912370834928186684152014555456835302
xufotidosokabunudomimibefisimakusimokedamomazexekofomokane,4925740069222414438181195472381794794580863
sodozekadakuzemaxetexukuzumisikitazufitizexekatetotuxusone,5182433137814021540565892366585827483507958
kikokasatudatidatufikizesadimatakakatudisibumofotuzutaze,1019165422643074024784461594259815846823503
dakikinetofonexesimavufafisaxefosafisikofotasanekovetevu,1279315636939618596561544978621478602915302
kinunebebuzukokemidatekobusofokikozukobedodakesisikunuki,659622269329577207976266617866288582888591
sozesifamoxebusitotesisasizekudasomitatavudidizukadimate,480714979099063166920265752208932468511478
xumakikofakumixefotisikunumovudafasofikimozenudafosidaka,749508057657951412178315361964670398839871
dazedokutituzufakebutifokekusobuzutemanesadafadatetitamo,103886097260879003150138325027254855900902
xukemizukozefaxetudizukedimotevubesitekitavukakevutisibe,376136321524704717800574424940622855799327
dozexedivenudifabuvedavebukeketozukumasimakuvetuketomafaxe,42948292938975784099596927092482269526555367
mimasatukidisodifikekutovumazefikefonemofimotesonusazexuxe,43196343143305047528500657761292227037320224
zedafimasobukudizedozefoketuzekisadotufikudadokisakedofoxe,43000150124549846140482724444846720574088407
kisafimosotuvuvuzuzukodibevutemidazusisamokososikomofavuma,2692423943832809210699522830552769656612527
soxutokonebusidaketesomoxemibesonubudibekunumatifokokanemo,2942202721014541374299446744441542204274678
xusikematetemititafafakuxusinekefoketonebetokudonesomosama,2312137916289687577537008913213461971911327
我怎样才能使所有字符串的长度都是 31 个“符号”(因为在这个例子中符号是 2 个字符,所以总共有 62 个字符),像这样:
xusikematetemititafafakuxusinekexusikematetemititafafakuxusine
即:算法中上面的3个bigint大数应该是什么?另外,它们应该是什么才能使分布看起来是随机的?我注意到,与较小的数字相比,在边界附近使用较大的数字会产生更好的明显随机化结果。此外,您不能只在 bigint x
前加上 0,这会导致 mamamamamama...
。最后,一个序列中最多可以有 2 对相同的字母,我认为你只能通过跳过不符合该约束的结果来真正解决这个问题(除非有一些数学魔法可以以某种方式判断更多这 32 个“符号”中有两个并排出现)。
关于最后一部分,这些是有效结果:
mamavumamavumama...
nanavumamavuvuma...
这些无效:
mamamavumamavuma...
mavuvuvumamavuma...
因为一排有3对是一样的
总结一下:
- 如何使所有字符串的长度都是 62 个字符,而不用零填充?这意味着它必须适合我不太确定的某个 BigInts 范围。
- 因此分布看起来非常随机(也就是说,我们不仅看到序列的尾端缓慢变化,而且整个数字似乎完全改变,如示例所示)。
- 所以在一个序列中不超过两对是相似的?这部分可以通过跳过我们在伪随机序列中找到的结果来解决,除非有一些我不可能理解的魔法来完成它 :) 例如,也许有一些魔法可以处理类似 5 的倍数-位块或其他东西,我不知道。但是不需要花哨,跳过与正则表达式匹配的结果也可以。
这里我们使用 Base 32(硬编码,但可以是 parts.length
)作为 2 (maxRepeat
) 个最低有效数字和 Base 31(硬编码,但可以是 parts.length-1
) 用于剩余的数字。这给出了长度值的最大范围。
从 0n
到 getMax()
(含)的所有值都可以编码为 31 (minLength
) 个符号。
防止重复长度超过 maxRepeat
的魔法是检查第 i
位与 i - maxRepeat
位,如果出现以下情况,则对第 i
位进行调整>=
。虽然这会产生有效的编码(遵循规则的编码),但并非所有任意符号序列都是有效的,即使它们遵循规则也是如此。例如,序列 mimami
永远不会生成,也无法解码。
const split = new RegExp(`.{2}`, 'g');
const parts = 'mimamonenudidadobebutitetatotukikekakokusisasozezufifafovevuxexu'.match(split);
const partsMap = Object.fromEntries(parts.map((v,i) => ([v,BigInt(i)])));
const encode = (value, maxRepeat = 2, minLength = 31) => {
value = BigInt(value);
const digits = [];
// convert the value to digits
// the first least significant `maxRepeat` digits use base 32, the rest use base 31
while(value > 0) {
const radix = digits.length < maxRepeat ? 32n : 31n;
digits.push(value % radix);
value /= radix;
}
// add 0 padding
while(digits.length < minLength) {
digits.push(0n);
}
// adjust digits to prevent sequences longer than `maxRepeat`
const symbols = []
digits.forEach((v,i) => {
symbols.push((i < maxRepeat || v < symbols[i-maxRepeat]) ? v : v+1n);
});
// map to symbols and return string
const str = symbols.map(v => parts[v]).join('');
return str;
};
const decode = (str, maxRepeat = 2) => {
// split string into array of symbols
const symbols = str.match(split);
// convert symbols to digits
const digits = symbols.map(v => partsMap[v]).map((v,i,a) => {
if(i < maxRepeat || v < a[i-maxRepeat]) return v;
return v-1n;
});
// compute the threshold where we transition from base 31 to base 32
const threshold = digits.length - maxRepeat;
// convert digits to BigInt
const results = digits.reverse().reduce(
(s,v,i) => (s * (i >= threshold ? 32n : 31n) + v)
, 0n);
return results;
};
// compute the maximum value that can be encoded using `minLength` number of symbols
const getMax = (maxRepeat = 2, minLength = 31) => 32n ** BigInt(maxRepeat) * 31n ** BigInt(minLength - maxRepeat) - 1n;
// Consoles will print BigInt but Whosebug's interpreter
// doesn't understand them yet so we use `.toString()`.
console.log('limit:', getMax().toString());
console.log(encode(getMax()));
const n1 = 6038940986212279582529645303138677298679151n;
console.log(encode(n1)); // 'kitefitifomazekosaxubezutatudofotudimidanemadanumasisivebumimi'
const n2 = 0n;
console.log(encode(n2)); // 'mimimamamimimamamimimamamimimamamimimamamimimamamimimamamimima'
const s1 = 'kitefitifomazekosaxubezutatudofotudimidanemadanumasisivebumimi';
console.log(decode(s1).toString()); // 6038940986212279582529645303138677298679151
const s2 = 'mimimamamimimamamimimamamimimamamimimamamimimamamimimamamimima';
console.log(decode(s2).toString()); // 0
console.log(decode(encode(0n)) == 0n);
console.log(decode(encode(6038940986212279582529645303138677298679151n)) == 6038940986212279582529645303138677298679151n);
.as-console-wrapper { max-height: 100% !important; top: 0; }
沿着 parts
是一个包含 32 个 2 字符“符号”的数组):
const parts = `mi
ma
mo
ne
nu
di
da
do
be
bu
ti
te
ta
to
tu
ki
ke
ka
ko
ku
si
sa
so
ze
zu
fi
fa
fo
ve
vu
xe
xu`.trim().split(/\n+/)
const fetch = (x, o) => {
if (x >= o) {
return x
} else {
const v = (x * x) % o
return (x <= (o / 2n)) ? v : o - v
}
}
const fetchLarge = (x) => fetch(x, 41223334444555556666667777777888888889999999997n)
// the last number can be anything.
const buildLarge = (x, o) => fetchLarge((fetchLarge(x) + o) % BigInt(Math.pow(32, 31)) ^ 2030507011013017019023n)
const createArray = (n, mod = 32n) => {
if (!n) return [0];
let arr = [];
while (n) {
arr.push(Number(n % mod));
n /= mod;
}
return arr;
}
const write = (i) => {
const x = buildLarge(i++, 272261127249452727280272961627319532734291n)
return createArray(x).map(x => parts[x]).join('')
}
let i = 1n
while (i < 10000) {
console.log(write(i))
}
我生成的结果如下:
kitekutefaxunetotuzezumatotamabukidimasoxumoxudofasositinu,6038940986212279582529645303138677298679151
sokiketufikefotekakidotetotesamizununetefokixefitetisovene,5431347628569519336817719657935192515363318
xudamituzesimixuxemixudakedatetutununekobuzexesozuxedinenu,5713969289948157459645315228321450728816863
dazenenemovudadikukatatakibekaxexemovubedivusidatafisasine,5082175912370834928186684152014555456835302
xufotidosokabunudomimibefisimakusimokedamomazexekofomokane,4925740069222414438181195472381794794580863
sodozekadakuzemaxetexukuzumisikitazufitizexekatetotuxusone,5182433137814021540565892366585827483507958
kikokasatudatidatufikizesadimatakakatudisibumofotuzutaze,1019165422643074024784461594259815846823503
dakikinetofonexesimavufafisaxefosafisikofotasanekovetevu,1279315636939618596561544978621478602915302
kinunebebuzukokemidatekobusofokikozukobedodakesisikunuki,659622269329577207976266617866288582888591
sozesifamoxebusitotesisasizekudasomitatavudidizukadimate,480714979099063166920265752208932468511478
xumakikofakumixefotisikunumovudafasofikimozenudafosidaka,749508057657951412178315361964670398839871
dazedokutituzufakebutifokekusobuzutemanesadafadatetitamo,103886097260879003150138325027254855900902
xukemizukozefaxetudizukedimotevubesitekitavukakevutisibe,376136321524704717800574424940622855799327
dozexedivenudifabuvedavebukeketozukumasimakuvetuketomafaxe,42948292938975784099596927092482269526555367
mimasatukidisodifikekutovumazefikefonemofimotesonusazexuxe,43196343143305047528500657761292227037320224
zedafimasobukudizedozefoketuzekisadotufikudadokisakedofoxe,43000150124549846140482724444846720574088407
kisafimosotuvuvuzuzukodibevutemidazusisamokososikomofavuma,2692423943832809210699522830552769656612527
soxutokonebusidaketesomoxemibesonubudibekunumatifokokanemo,2942202721014541374299446744441542204274678
xusikematetemititafafakuxusinekefoketonebetokudonesomosama,2312137916289687577537008913213461971911327
我怎样才能使所有字符串的长度都是 31 个“符号”(因为在这个例子中符号是 2 个字符,所以总共有 62 个字符),像这样:
xusikematetemititafafakuxusinekexusikematetemititafafakuxusine
即:算法中上面的3个bigint大数应该是什么?另外,它们应该是什么才能使分布看起来是随机的?我注意到,与较小的数字相比,在边界附近使用较大的数字会产生更好的明显随机化结果。此外,您不能只在 bigint x
前加上 0,这会导致 mamamamamama...
。最后,一个序列中最多可以有 2 对相同的字母,我认为你只能通过跳过不符合该约束的结果来真正解决这个问题(除非有一些数学魔法可以以某种方式判断更多这 32 个“符号”中有两个并排出现)。
关于最后一部分,这些是有效结果:
mamavumamavumama...
nanavumamavuvuma...
这些无效:
mamamavumamavuma...
mavuvuvumamavuma...
因为一排有3对是一样的
总结一下:
- 如何使所有字符串的长度都是 62 个字符,而不用零填充?这意味着它必须适合我不太确定的某个 BigInts 范围。
- 因此分布看起来非常随机(也就是说,我们不仅看到序列的尾端缓慢变化,而且整个数字似乎完全改变,如示例所示)。
- 所以在一个序列中不超过两对是相似的?这部分可以通过跳过我们在伪随机序列中找到的结果来解决,除非有一些我不可能理解的魔法来完成它 :) 例如,也许有一些魔法可以处理类似 5 的倍数-位块或其他东西,我不知道。但是不需要花哨,跳过与正则表达式匹配的结果也可以。
这里我们使用 Base 32(硬编码,但可以是 parts.length
)作为 2 (maxRepeat
) 个最低有效数字和 Base 31(硬编码,但可以是 parts.length-1
) 用于剩余的数字。这给出了长度值的最大范围。
从 0n
到 getMax()
(含)的所有值都可以编码为 31 (minLength
) 个符号。
防止重复长度超过 maxRepeat
的魔法是检查第 i
位与 i - maxRepeat
位,如果出现以下情况,则对第 i
位进行调整>=
。虽然这会产生有效的编码(遵循规则的编码),但并非所有任意符号序列都是有效的,即使它们遵循规则也是如此。例如,序列 mimami
永远不会生成,也无法解码。
const split = new RegExp(`.{2}`, 'g');
const parts = 'mimamonenudidadobebutitetatotukikekakokusisasozezufifafovevuxexu'.match(split);
const partsMap = Object.fromEntries(parts.map((v,i) => ([v,BigInt(i)])));
const encode = (value, maxRepeat = 2, minLength = 31) => {
value = BigInt(value);
const digits = [];
// convert the value to digits
// the first least significant `maxRepeat` digits use base 32, the rest use base 31
while(value > 0) {
const radix = digits.length < maxRepeat ? 32n : 31n;
digits.push(value % radix);
value /= radix;
}
// add 0 padding
while(digits.length < minLength) {
digits.push(0n);
}
// adjust digits to prevent sequences longer than `maxRepeat`
const symbols = []
digits.forEach((v,i) => {
symbols.push((i < maxRepeat || v < symbols[i-maxRepeat]) ? v : v+1n);
});
// map to symbols and return string
const str = symbols.map(v => parts[v]).join('');
return str;
};
const decode = (str, maxRepeat = 2) => {
// split string into array of symbols
const symbols = str.match(split);
// convert symbols to digits
const digits = symbols.map(v => partsMap[v]).map((v,i,a) => {
if(i < maxRepeat || v < a[i-maxRepeat]) return v;
return v-1n;
});
// compute the threshold where we transition from base 31 to base 32
const threshold = digits.length - maxRepeat;
// convert digits to BigInt
const results = digits.reverse().reduce(
(s,v,i) => (s * (i >= threshold ? 32n : 31n) + v)
, 0n);
return results;
};
// compute the maximum value that can be encoded using `minLength` number of symbols
const getMax = (maxRepeat = 2, minLength = 31) => 32n ** BigInt(maxRepeat) * 31n ** BigInt(minLength - maxRepeat) - 1n;
// Consoles will print BigInt but Whosebug's interpreter
// doesn't understand them yet so we use `.toString()`.
console.log('limit:', getMax().toString());
console.log(encode(getMax()));
const n1 = 6038940986212279582529645303138677298679151n;
console.log(encode(n1)); // 'kitefitifomazekosaxubezutatudofotudimidanemadanumasisivebumimi'
const n2 = 0n;
console.log(encode(n2)); // 'mimimamamimimamamimimamamimimamamimimamamimimamamimimamamimima'
const s1 = 'kitefitifomazekosaxubezutatudofotudimidanemadanumasisivebumimi';
console.log(decode(s1).toString()); // 6038940986212279582529645303138677298679151
const s2 = 'mimimamamimimamamimimamamimimamamimimamamimimamamimimamamimima';
console.log(decode(s2).toString()); // 0
console.log(decode(encode(0n)) == 0n);
console.log(decode(encode(6038940986212279582529645303138677298679151n)) == 6038940986212279582529645303138677298679151n);
.as-console-wrapper { max-height: 100% !important; top: 0; }