JavaScript 中的排列子集,每个元素最多重复 6 次

Subset of permutations in JavaScript with up to~6 duplicates per element

我已经尝试 找到字符串数组大小为 K=6 的所有排列,但我要排列的数组太大了(约 13,000 个元素,但我可以保证大部分这些将是重复的),这意味着我得到:

....
re-permuting with 6925/12972
node:internal/console/constructor:257
   value: function(streamSymbol, string) {
                   ^
RangeError: Maximum call stack size exceeded  
    at console.value (node:internal/console/constructor:257:20)  
    at console.log (node:internal/console/constructor:359:26)  
    at permute (/home/path/to/code/permutation.js:22:17)  
    at permute (/home/path/to/code/permutation.js:23:9)  
    .....
re-permuting with 6924/12972
....
re-permuting with 6918/12972

然后它死了。我猜这是递归的问题。

我知道我的输入中最多有 ~300 个唯一元素(这就是我知道许多数组元素必须重复的原因),但我不知道这是否是一个元素的 10,000 个实例,并且那么其余的是单独的独特元素或每个独特元素至少 K。因此,我不能只将它们放入一个集合中,并且一个元素可能少于 K 个,所以我不能通过将新元素复制 K 次来创建新输入。

这是我对上述链接答案中的代码稍作修改(仅出于可读性和日志记录)的版本(再看一遍,该算法远非最佳):

let permArr = [];
let usedChars = [];
let originalLength;
/**
 * Subsets of permutations of an array
 * @param {any[]} input array of elements to permute
 * @param {number} subsetLength size of the subsets to return
 * @returns {Array[]} and array of arrays
 */
function permute(input, subsetLength = input.length) {
    let index
    let ch;
    originalLength ??= input.length;
    for (index = 0; index < input.length; index++) {
        ch = input.splice(index, 1)[0];
        usedChars.push(ch);
        if (input.length == 0) {
            let toAdd = usedChars.slice(0, subsetLength);
            // resizing the returned array to size k
            if (!permArr.includes(toAdd)) permArr.push(toAdd);
        }
        console.log(`re-permuting with ${input.length}/${originalLength}`)
        permute(input, subsetLength);
        input.splice(index, 0, ch);
        usedChars.pop();
    }
    return permArr
};

我发现这个 but I do not follow it at all, and this other answer 很相似,但仍然使用递归。

如何在没有 recursion/more 性能的情况下使其能够处理更大的数组?我正在使用 NodeJs,并且我不反对使用不同的数据类型。

I don't know if that's 10,000 instances of one element, and then the rest are individually unique elements or at least K of each unique element. Because of that I can't just fed them into a Set, and there maybe fewer than K of one element so I can't just create a new input by duplicating the new ones K times

所以只需将它们分组并计数。看起来很简单:

function subsetPermutations(arr, size) {
    const counts = {};
    for (const el of arr) {
        counts[el] = (counts[el] ?? 0) + 1;
    }
    const unique = Object.keys(counts);
    const result = Array.from({length: size});
    function* recurse(depth) {
        if (depth == size) {
            yield result;
        } else {
            for (const el of unique) {
                if (counts[el]) {
                    result[depth] = el;
                    counts[el]--;
                    yield* recurse(depth+1)
                    counts[el]++;
                }
            }
        }
    }
    return recurse(0);
}

for (const perm of subsetPermutations(["a", "b", "b", "c", "c", "c"], 3)) {
    console.log(perm.join('-'));
}

I have tried to this answer to find all permutations of size K=6, but the array I'm permuting is way too large (~13,000 elements), however I can guarantee I know that there's at most ~300 unique

这仍然是粗略的 3006 个排列,这太多了,无法将它们放入一个数组中。上面的函数被设计成一个迭代器,这样你就可以在当前 result 在下一次迭代中发生变异之前循环处理它,以避免任何分配开销,但它仍然需要很长时间才能生成所有他们。

How can I make this without recursion/more performant so it can handle much larger arrays? I'm using NodeJs, and I'm not averse to using different data types.

您可以使用 Map 而不是 counts 的对象,但我怀疑仅 300 个不同的元素会更快。

避免递归是不必要的,因为它只有 6 层深,不会像您的低效解决方案那样出现堆栈溢出。但为了性能,您可能仍会尝试 this approach 动态生成嵌套循环。