Find the subsequence which, once repeated, generates a given sequence

Question

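Given a sequence of size N that is the repetition of an unknown subsequence, how do you efficiently find the size M of that subsequence?

For example: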

input : 6651366513665136651366513 -> output : sequence of length 5 which is 66513
input : 11111111111111111111111111111 -> output : sequence of length 1 which is 1
input : 6651366513665136651366513665 -> output : sequence of length 5 which is 66513

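The elements of the sequence are positive numbers, not just digits.
N is not necessarily a multiple of M, because the last repetition does not have to be complete. For example, 665 could be appended to the first example.

The naive approach is: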

assume the sub-sequence is of size x, test, if not correct increase x and try again or output x
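
The naive approach can be sketched as follows. This is only an illustration, not the asker's actual code: the class and method names are hypothetical, and "test" is taken to mean checking that every element equals the one x positions before it, which also accepts an incomplete last repetition.

```java
// Hypothetical sketch of the naive approach: try x = 1, 2, ... and test each.
final class NaivePeriod {
    // Returns the smallest x such that s[i] == s[i - x] for every i >= x.
    static int period(int[] s) {
        for (int x = 1; x < s.length; x++) {
            boolean ok = true;
            // Test candidate x: each element must repeat the one x positions back.
            for (int i = x; i < s.length && ok; i++) {
                ok = s[i] == s[i - x];
            }
            if (ok) {
                return x;
            }
        }
        // No shorter block works: the sequence is its own block.
        return s.length;
    }
}
```

Each candidate costs up to N comparisons and there can be up to N candidates, which is where the quadratic worst case comes from.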

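I am still trying to come up with another solution that avoids the O(N^2) time complexity of the approach above.

Note: for the curious, I am parsing media files whose index has to be built by analyzing the stream, and I found that the index follows a repeating pattern. Instead of parsing a 2h file, I could parse one minute and guess the index for the remaining 1h59m.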

Answer 1

Given a sequence S, to compute the length of the period you only need to find the second occurrence of S in S+S. For example:

searching for

6651366513665136651366513

in

66513665136651366513665136651366513665136651366513

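shows that the second occurrence of the sequence is at index 5. Given that the original sequence has length 25, you can see that it repeats 5 times.

You can use any substring search algorithm you like; KMP, for example, guarantees O(n) complexity.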
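
A minimal sketch of the S+S search, assuming the sequence is given as a string of digits as in the example. The class and method names are hypothetical. `String.indexOf` is used only for brevity and does not carry KMP's O(n) guarantee. As far as I can tell, the trick recovers the period only when S consists of whole repetitions: with a truncated last repetition, S does not reappear in S+S before index N.

```java
// Hypothetical sketch: find the second occurrence of S inside S+S.
final class DoubledSearch {
    // Returns the index of the first occurrence of s in s+s after index 0.
    // For a non-empty s this is at most s.length(), since s+s ends with s.
    static int period(String s) {
        if (s.isEmpty()) {
            return 0;
        }
        // Start the search at 1 to skip the trivial match at index 0.
        return (s + s).indexOf(s, 1);
    }
}
```

For the 25-character example above this returns 5. For a truncated input such as 6651366513665 it returns the full length 13, so the incomplete-repetition case from the question needs a different test.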

Answer 2

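The idea is to start with a minimum subsequence length of 1 and a current index of 0 within that subsequence, then compare each character of the string in turn. If the current character matches the character at the current index within the candidate subsequence, advance that index (the % resets it to zero once it reaches the end of the candidate subsequence). If they do not match, set the window size to the current position + 1, reset the index within the window to 0, and carry on. This runs in O(N). Note that the jump to the current position + 1 on a mismatch can overshoot: for "abaabaab" the code reports 8, although the shortest repeating block is "aba" with length 3.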

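    // Greedy single-pass estimate of the length of the repeating block.
    public static int getMinSubsequenceLength(String s) {
        int currentMinSubsequence = 1;      // candidate block length
        int currentIndexInSubsequence = 0;  // position inside the candidate block
        for (int i = 1; i < s.length(); i++) {
            if (s.charAt(i) != s.charAt(currentIndexInSubsequence)) {
                // Mismatch: grow the candidate to cover everything seen so far
                // (greedy, so it may overshoot the true period).
                currentMinSubsequence = i + 1;
                currentIndexInSubsequence = 0;
            } else {
                // Match: advance, wrapping at the end of the candidate block.
                currentIndexInSubsequence = (currentIndexInSubsequence + 1) % currentMinSubsequence;
            }
        }
        return currentMinSubsequence;
    }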
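
For comparison, here is a sketch of a different single-pass technique, the KMP prefix function (failure function), which is not what this answer implements. It relies on the standard fact that the smallest period of a string equals its length minus the length of its longest proper border, and it accepts an incomplete last repetition. The class and method names are hypothetical.

```java
// Hypothetical sketch: smallest period via the KMP prefix function.
final class PrefixPeriod {
    static int period(String s) {
        int n = s.length();
        if (n == 0) {
            return 0;
        }
        // pi[i] = length of the longest proper prefix of s[0..i]
        // that is also a suffix of s[0..i].
        int[] pi = new int[n];
        for (int i = 1, k = 0; i < n; i++) {
            // Fall back to shorter borders until one can be extended.
            while (k > 0 && s.charAt(i) != s.charAt(k)) {
                k = pi[k - 1];
            }
            if (s.charAt(i) == s.charAt(k)) {
                k++;
            }
            pi[i] = k;
        }
        // Smallest period = length minus longest border of the whole string.
        return n - pi[n - 1];
    }
}
```

The fallback through `pi` is what lets this version recover after a mismatch without discarding everything matched so far, while keeping the total work linear.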

Answer 3

The Z algorithm suggested by Niklas B is the best match I found for my problem.
It is defined as:

Zi(P) = the length of the longest substring of P that starts at i > 0 and matches a prefix of P.

Given the Z array (with n the length of the input), the length of the subsequence is the index k, if any, that satisfies:

z[k] = n - k
z[k] = max(z[i])

For the input

std::vector<int> v = { 6, 6, 5, 1, 3, 6, 6, 5, 1, 3, 6, 6, 5, 1, 3, 6, 6, 5 };

the Z values are

std::vector<int> z = { 0, 1, 0, 0, 0,13, 1, 0, 0, 0, 8, 1, 0, 0, 0, 3, 1, 0 };

and k = 5.
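
The two conditions can be sketched in Java as follows. This is an illustrative implementation, not the answerer's code: the class and method names are hypothetical, and z[0] is left at 0 to match the values listed above. Because n - k shrinks as k grows, the smallest k with z[k] = n - k is also the one with the largest z[k], so a left-to-right scan is enough.

```java
// Hypothetical sketch: period of an int sequence via the Z algorithm.
final class ZPeriod {
    // z[i] = length of the longest run starting at i that matches a prefix of s.
    static int[] zArray(int[] s) {
        int n = s.length;
        int[] z = new int[n];
        int l = 0, r = 0; // [l, r) is the rightmost segment known to match a prefix
        for (int i = 1; i < n; i++) {
            if (i < r) {
                // Reuse the value computed for the mirrored position.
                z[i] = Math.min(r - i, z[i - l]);
            }
            while (i + z[i] < n && s[z[i]] == s[i + z[i]]) {
                z[i]++;
            }
            if (i + z[i] > r) {
                l = i;
                r = i + z[i];
            }
        }
        return z;
    }

    // Smallest k > 0 with z[k] == n - k, or n when no such k exists.
    static int period(int[] s) {
        int[] z = zArray(s);
        for (int k = 1; k < s.length; k++) {
            if (z[k] == s.length - k) {
                return k;
            }
        }
        return s.length;
    }
}
```

Since the match starting at k only has to reach the end of the input, this handles an incomplete last repetition, as in the 18-element example.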

algorithm

big-o

data-structures