求集合中所有连续子串之和的算法

Question

我正在尝试编写一种节省时间的算法，该算法可以找到数组的所有可能连续子串的总和（保留顺序和组合可以是任意长度）

例如：

[1,2,3,4] -> 1 + 2 + 3 + 4 + 12 + 23 + 34 + 123 + 234 + 1234 = 1670

同样重要的是要注意数组可以重复多次

到目前为止我最好的尝试可能是这样的：（n 是一组数字）

k = 3 // number of times the array repeats
length = len(n)
total = 0

for i in range(0, length*k):
    for exp in range(0, length*k-i): 
    //iterate though all of the possible powers of ten a certain number could be in
    // ie. all the different places that number could be in for all combinations 

        total += ((n[i % length] * 10**exp) * (i + 1))
        // ^ turns number from standard from into int. The i + 1 account for
        // the fact the number could be in the same position in more than one combination

return total

但是，对于其中包含超过 10^20 个数字的数组，此算法必须运行，因此我正在寻找更快的算法。

注意所有数字都是个位数，数字可以重复

Answer 1

请注意，您可以：
1) 每 i 次迭代找到 n[i] * (i + 1) 一次
2) 找到 1+10 + 100 + ...10^(length-i-1) 的总和作为等差数列的总和

  s = 10^(length-i) - 1 / 9
  9999..99/9=1111..11

所以你可以获得 O(n) 的复杂度。

3) 更多优化 - 使 1111... 乘法器在每次 i 迭代中具有单个操作（整数将初始值除以 10 或使 m*10+1 用于反向循环）

Answer 2

我们可以通过记下包含该元素的子数组的可能开始（左）和结束（右）位置的数量来计算任何给定元素在任何给定 10 次方出现的次数。

起始位置的数量就是左边元素的数量（+1），结束位置的数量就是右边元素的数量（+1）。例如，在[4,5,6,7]中包含6的子数组将有3个起始位置和2个结束位置：

s s s e e
↓ ↓ ↓ ↓ ↓
[4,5,6,7]

起始位置不会影响出现在 - 456、56 和 6 处的元素的 10 次方显示包含元素 [=14= 的子数组的 3 个不同起始位置]，但对于所有这些 6 都是 10⁰。可能的起始位置的数量将直接乘以一个元素在一个位置出现的频率（3 个起始位置 -> 每个位置可以出现 3 次）。

结束位置影响元素出现的 10 次方，但不影响它出现的次数：5、56 和 567 显示子数组的 3 个结束位置包含元素 5。 5出现在10⁰、10¹和10²，每次只出现一次.我们可以用 5*111.

来总结

将这两件事放在一起，任何元素对总和的影响是：

element * start positions * 111...(end positions times)...11

如上所述，start positions 是 1 + 当前索引（从 0 开始的数组）。并且 end positions 当我们从数组的左侧移动到右侧时减 1（或当我们从右向左移动时增加 1），因此，对于上面最右边的项，我们可以从对 1 乘以 10 并重复加 1。

这导致一些相当简单的 (Java) 代码：

int[] array = {1,2,3,4};
int sum = 0;
int endMultiplier = 0;
for (int i = array.length-1; i >= 0; i--)
{
    endMultiplier = 10*endMultiplier + 1;
    sum += array[i] * (i+1) * endMultiplier;
}
System.out.println(sum); // prints 1670

Live demo.

如果元素可以是多位数字，上面的方法可以概括为，而不是将endMultiplier乘以10，我们根据当前元素（长度 1 为 10，长度 2 为 100，等等）。

int endMultiplier = 1;
int sum = 0;
for (int i = array.length-1; i >= 0; i--)
{
    sum += array[i] * (i+1) * endMultiplier;
    // Direct method of calculating k (with floating points):
    // int k = array[i] == 0 ? 10 : (int)Math.pow(10, 1+(int)Math.log10(array[i]));
    int k = 10;
    for (; k < array[i]; k *= 10)
        {}
    endMultiplier = k*endMultiplier + 1;
}

Live demo.

Answer 3

我终于在 O(n) 中找到了 generalized version，它适用于任意长度的正数：

document.body.innerHTML = solve([1,2,3,4]) + " = 1670<br>";
document.body.innerHTML += solve([22,101,3]) + " = " + (22+101+3+22101+1013+221013);

function solve(arr) {
    let n = arr.length, total = 0, sum = 0;
    for (let i = n - 1; i >= 0; i--) {
        total += arr[i] * (i + 1) * (sum + 1);
        let f = Math.pow(10, len(arr[i]));
        sum = f + sum * f;
    }
    return total;
}

function len(x) {
    if (x < 10)
        return 1;
    return Math.floor(Math.log10(x)) + 1;
}

我通过写出 n-1 到 n-3 的等式得出公式：

n-1: a[n-1] * n
n-2: a[n-2] * (n-1) + a[n-2] * (n-1) * 10^len(a[n-1])
n-3: a[n-3] * (n-2) + a[n-3] * (n-2) * (10^len(a[n-2]) + 10^(len(a[n-1]) + len(a[n-2])))

第一项表示数字出现在最后位置的次数。我们将 10^... 部分的总和表示为 sum 并分解出 a[i] * (i+1)，这就是 +1 来自 *(sum+1) 的地方。下一次迭代的 sum 是 10^len(a[i])+10^len(a[i])*sum。这是因为
10^a + 10^a * (10^b + 10^(b+c) + ...) = 10^a + 10^(a+b) + 10^(a+b+c) + ....

求集合中所有连续子串之和的算法

Algorithm to find the sum of all contiguous substrings of a set

algorithm

math

numbers

mathematical-optimization

discrete-mathematics