Subset of permutations in JavaScript with up to~6 duplicates per element
I have tried this answer to find all permutations of an array of strings with size K=6, but the array I'm permuting is way too large (~13,000 elements, though I can guarantee most of those will be duplicates), which means I get:
....
re-permuting with 6925/12972
node:internal/console/constructor:257
value: function(streamSymbol, string) {
^
RangeError: Maximum call stack size exceeded
at console.value (node:internal/console/constructor:257:20)
at console.log (node:internal/console/constructor:359:26)
at permute (/home/path/to/code/permutation.js:22:17)
at permute (/home/path/to/code/permutation.js:23:9)
.....
re-permuting with 6924/12972
....
re-permuting with 6918/12972
Then it dies. I'm guessing this is a problem with the recursion.
I know there are at most ~300 unique elements in my input (which is how I know many of the array elements must be duplicates), but I don't know whether that's 10,000 instances of one element with the rest individually unique, or at least K of each unique element. Because of that I can't just feed them into a Set, and there may be fewer than K of a given element, so I can't build a new input by duplicating each unique element K times.
Here is my slightly modified version (for readability and logging only) of the code from the answer linked above (looking at it again, that algorithm is far from optimal):
let permArr = [];
let usedChars = [];
let originalLength;
/**
 * Subsets of permutations of an array
 * @param {any[]} input array of elements to permute
 * @param {number} subsetLength size of the subsets to return
 * @returns {Array[]} an array of arrays
 */
function permute(input, subsetLength = input.length) {
  let index;
  let ch;
  originalLength ??= input.length;
  for (index = 0; index < input.length; index++) {
    ch = input.splice(index, 1)[0];
    usedChars.push(ch);
    if (input.length == 0) {
      let toAdd = usedChars.slice(0, subsetLength);
      // resizing the returned array to size k
      if (!permArr.includes(toAdd)) permArr.push(toAdd);
    }
    console.log(`re-permuting with ${input.length}/${originalLength}`);
    permute(input, subsetLength);
    input.splice(index, 0, ch);
    usedChars.pop();
  }
  return permArr;
}
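Incidentally, the if (!permArr.includes(toAdd)) check above can never deduplicate anything, because includes compares arrays by reference rather than by content, so a freshly sliced array is never found:

```javascript
// Array.prototype.includes compares objects with SameValueZero (reference)
// equality, so two arrays holding the same elements are never "equal":
const seen = [];
seen.push(["a", "b"]);
console.log(seen.includes(["a", "b"]));  // false: a fresh array is a different object
console.log(seen.some(p => JSON.stringify(p) === JSON.stringify(["a", "b"])));  // true: compare by content
```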
I found this answer, but I do not follow it at all, and this other answer is similar but still uses recursion.
How can I make this work without recursion / be more performant so it can handle much larger arrays? I'm using NodeJS, and I'm not averse to using different data types.
I don't know if that's 10,000 instances of one element, and then the rest are individually unique elements, or at least K of each unique element. Because of that I can't just feed them into a Set, and there may be fewer than K of one element, so I can't just create a new input by duplicating the new ones K times.
So just group and count them. That seems simple enough:
function subsetPermutations(arr, size) {
  const counts = {};
  for (const el of arr) {
    counts[el] = (counts[el] ?? 0) + 1;
  }
  const unique = Object.keys(counts);
  const result = Array.from({ length: size });
  function* recurse(depth) {
    if (depth == size) {
      yield result;
    } else {
      for (const el of unique) {
        if (counts[el]) {
          result[depth] = el;
          counts[el]--;
          yield* recurse(depth + 1);
          counts[el]++;
        }
      }
    }
  }
  return recurse(0);
}
for (const perm of subsetPermutations(["a", "b", "b", "c", "c", "c"], 3)) {
  console.log(perm.join('-'));
}
I have tried this answer to find all permutations of size K=6, but the array I'm permuting is way too large (~13,000 elements), however I can guarantee that there are at most ~300 unique elements
That is still roughly 300^6 permutations, which is far too many to fit into an array. The function above is written as a generator so that you can process the current result in the loop body before it gets mutated on the next iteration, avoiding any allocation overhead, but it will still take a very long time to generate all of them.
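Because the generator re-yields the same result array each iteration, anything you want to keep must be copied first. A minimal sketch of the pitfall, using a made-up counterPairs generator in place of subsetPermutations:

```javascript
// A generator that reuses one buffer: every yield hands out the same array object.
function* counterPairs(n) {
  const buf = [0, 0];
  for (let i = 0; i < n; i++) {
    buf[0] = i;
    buf[1] = i * i;
    yield buf; // same array each time
  }
}

const aliased = [...counterPairs(3)];                    // three references to one array
const copied = [];
for (const p of counterPairs(3)) copied.push(p.slice()); // snapshot each iteration

console.log(aliased); // [ [ 2, 4 ], [ 2, 4 ], [ 2, 4 ] ] (all alias the final state)
console.log(copied);  // [ [ 0, 0 ], [ 1, 1 ], [ 2, 4 ] ]
```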
How can I make this work without recursion / be more performant so it can handle much larger arrays? I'm using NodeJS, and I'm not averse to using different data types.
You could use a Map instead of an object for the counts, but with only 300 distinct elements I doubt it would be any faster.
Avoiding recursion is unnecessary, since it goes only 6 levels deep and will not overflow the stack like your inefficient solution did. For performance, though, you might still try this approach of dynamically generating nested loops.
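If you do want to avoid recursion anyway, the same counting idea can be driven by an explicit choice stack. This is only a sketch in that direction (subsetPermutationsIterative and its choice/remaining arrays are my own naming, and it is not the code-generation technique just linked):

```javascript
// Iterative counting-based subset permutations: same output as the recursive
// generator above, but with an explicit choice stack instead of call frames.
function* subsetPermutationsIterative(arr, size) {
  const counts = new Map();
  for (const el of arr) counts.set(el, (counts.get(el) ?? 0) + 1);
  const unique = [...counts.keys()];
  const remaining = unique.map(el => counts.get(el));
  const result = new Array(size);
  if (size === 0) { yield result; return; }
  // choice[d] = index into `unique` currently used at depth d (-1 = none yet)
  const choice = new Array(size).fill(-1);
  let depth = 0;
  while (depth >= 0) {
    if (choice[depth] >= 0) remaining[choice[depth]]++; // undo previous pick here
    let i = choice[depth] + 1;
    while (i < unique.length && remaining[i] === 0) i++; // next element still in stock
    if (i === unique.length) { // exhausted at this depth: backtrack
      choice[depth] = -1;
      depth--;
      continue;
    }
    choice[depth] = i;
    remaining[i]--;
    result[depth] = unique[i];
    if (depth === size - 1) {
      yield result; // caller must copy if it wants to keep this
    } else {
      depth++;
    }
  }
}

for (const perm of subsetPermutationsIterative(["a", "b", "b", "c", "c", "c"], 3)) {
  console.log(perm.join('-'));
}
```

It visits the same choices in the same order as the recursive generator, so the demo prints the identical 19 permutations.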