Regex to find recurring number sets in a set of numbers
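Given a set of numbers, is it possible for a regex to find subsets of length N that occur more than once, preferably with N as a loop variable? I currently have something that finds repeats, but the output it returns is too noisy. I would like it to find sets of length N in a loop, decrementing N from large sets to small ones.
The seemingly arbitrary sequence of numbers is a byte array converted to a string of digits, and the sets I am trying to capture are possible keys for an XOR-encoded file.
Given a sufficiently long encoded text, at some point a run of N spaces may be XORed with a key of length N, reproducing the key in roughly plaintext form. I have tested this, for example: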
" " ^ "ThisIsTheKey" produces roughly "tHISiStHEkEY"
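The effect above can be reproduced with a minimal sketch. It assumes an ASCII key and XORs every key character with a space (0x20), which flips the case bit of each letter. The class and method names (XorSpaceDemo, xorWithSpaces) are hypothetical and only serve the illustration.

```java
public class XorSpaceDemo {
    // XOR each character of the key with a space (0x20).
    // For ASCII letters this toggles upper/lower case (assumption: ASCII key).
    static String xorWithSpaces(String key) {
        StringBuilder sb = new StringBuilder(key.length());
        for (int i = 0; i < key.length(); i++) {
            sb.append((char) (key.charAt(i) ^ ' '));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(xorWithSpaces("ThisIsTheKey")); // prints tHISiStHEkEY
    }
}
```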
Current regex (Java engine):
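import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Backslashes must be doubled inside a Java string literal.
String regex = "(\\d+)\\1";
Pattern patt = Pattern.compile(regex);
Matcher matcher = patt.matcher(sToDecode);
while (matcher.find())
{
    // group(1) is the digit run that is immediately followed by a copy of itself
    System.out.println("Repeated substring: " + matcher.group(1));
}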
Given:
737568797372696810068791021116868686873696868657376791001117268681067368686868736865736810169686872687972686568689876796869726874749911010194687265796810111086696511099688368688369868984896876708580849586987885681111109978697865767372737668676968796870797899110101110107736868726569697978736868657394707570661101011101079878991101101026968736879686572100736868766968736879686572100736867681107968657210073686876696873687968657210073686876696873687968101110107981007368687669687368796865721007368687669681006872689968796865721007368687669687368796865721007368687673666910772100736868766968736879686572100736868766810011073687968657210073686876696873687767696868711109911010168657210073686876696873687968657210073686876696873687968657210073681111107368796865721007368687669687368796865721007368687669687299110101686572100736868766968736879686572100681056899687968657210073686876696873687968657210073686876696873687310111010772100736868766968736879686572100736868766968737368102111110736879686572100...
This finds the following recurring subsets:
...
Repeated substring: 736879686572100736868766968
Repeated substring: 1
Repeated substring: 0
Repeated substring: 68
Repeated substring: 6
Repeated substring: 0
Repeated substring: 68
Repeated substring: 686572100736868766968736879
Repeated substring: 1
Repeated substring: 657210073686876696873687968
...
Please let me know whether the regex can be changed so that it only returns:
Repeated substring: 736879686572100736868766968
Repeated substring: 686572100736868766968736879
Repeated substring: 657210073686876696873687968
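Using + matches one or more digits, which is why you get all of those short substrings. To constrain the length, change it to {n,m}, where 0 <= n <= m (the upper bound m may be omitted, as in {n,}).
To get groups of 3 or more recurring digits, use:
(\d{3,})\1
In a Java string literal this is written "(\\d{3,})\\1".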
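The question also asks for a loop that decrements N from large to small. One possible sketch, not necessarily what the asker ended up using, builds the pattern (\d{N})\1 for each N. The class name RepeatedRuns, the helper findRepeats, the sample input, and the loop bounds 10 and 3 are all assumptions chosen for illustration. Note that this only catches a run that is immediately followed by a copy of itself, just like the original pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedRuns {
    // Returns every run of exactly n digits that is immediately
    // followed by an identical run (hypothetical helper).
    static List<String> findRepeats(String input, int n) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile("(\\d{" + n + "})\\1").matcher(input);
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up sample input; substitute the real digit string (sToDecode).
        String sToDecode = "99123451234577";
        // Try long candidate lengths first, then shorter ones.
        for (int n = 10; n >= 3; n--) {
            for (String s : findRepeats(sToDecode, n)) {
                System.out.println("N=" + n + " repeated substring: " + s);
            }
        }
    }
}
```

Starting from the largest N means the long, key-like repeats are reported before the short, noisy ones, and the lower bound of the loop acts as the noise cutoff.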