Identify recurring/duplicate patterns as sub-arrays from a parent array
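I have a typical pattern-search problem: I need to identify where multiple patterns occur within an array and single them out.
For example: ['horse', 'camel', 'horse', 'camel', 'tiger', 'horse', 'camel', 'horse', 'camel']
The function should return
['horse', 'camel'],
['horse', 'camel', 'horse'],
['camel', 'horse', 'camel'],
['horse', 'camel', 'horse', 'camel']
That is, find the recurring patterns within an array that can themselves be sub-arrays, or, put another way, find all sub-arrays that occur more than once in the main array. The resulting arrays should have length > 1.
For example, in [1, 2, 3, 1, 2, 1, 4, 5], both [1,2,3] and [1,4,5] are sub-arrays, but [1,2,3] is the recurring/repeating sub-array, NOT [1,4,5].
I am looking for a suitable, efficient algorithm rather than a brute-force looping solution.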
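This may not be what you want, but I don't know what you have already tried, so it may be useful. Here is my direct approach, which probably falls under your "brute-force looping solutions", but I wanted to give it a try since nobody has posted a full answer.
In Java: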
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecurringPatterns {

    // used to avoid adding duplicates to the list
    static boolean contains(List<String[]> patterns, String[] pattern) {
        for (String[] s : patterns)
            if (Arrays.equals(pattern, s)) return true;
        return false;
    }

    /**
     * @param str   String array containing all elements in your set
     * @param start index of the first element of the subarray
     * @param end   index of the last element of the subarray (inclusive)
     * @return true if the subarray occurs again after index end
     */
    static boolean search(String[] str, int start, int end) {
        // length of pattern
        int len = end - start + 1;
        // how many additional times the pattern must
        // appear in the text
        int n = 1;
        // incremented each time the pattern is matched
        int m = 0;
        // shift the pattern down the array, starting right after the candidate
        for (int i = end + 1; i <= str.length - len; i++) {
            int j;
            for (j = 0; j < len; j++) {
                if (!str[i + j].equals(str[start + j]))
                    break;
            }
            // pattern matched at [i, i + len)
            if (j == len) {
                m++;
                if (m == n) return true;
            }
        }
        return false;
    }

    /**
     * @param str String array containing all elements in your set
     * @return a list of the subarrays of the input that are recurring patterns
     */
    static List<String[]> g(String[] str) {
        // put patterns in here
        List<String[]> patterns = new ArrayList<>();
        // iterate through all possible subarrays of length >= 2 in str
        for (int i = 0; i < str.length - 1; i++) {
            for (int j = i + 1; j < str.length; j++) {
                // if a pattern is found
                if (search(str, i, j)) {
                    int len = j - i + 1;
                    String[] subarray = new String[len];
                    System.arraycopy(str, i, subarray, 0, len);
                    if (!contains(patterns, subarray))
                        patterns.add(subarray);
                }
            }
        }
        return patterns;
    }

    public static void main(String[] args) {
        String[] str = {"horse", "camel", "horse", "camel", "tiger",
                        "horse", "camel", "horse", "camel"};
        // print out
        List<String[]> patterns = g(str);
        for (String[] s : patterns)
            System.out.println(Arrays.toString(s));
    }
}
Output:
[horse, camel]
[horse, camel, horse]
[horse, camel, horse, camel]
[camel, horse]
[camel, horse, camel]
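As mentioned in the comment I posted:
"would [camel, horse] be included in the output?"
My output is consistent with that, since there are 2 instances of [camel, horse], at indices [1-2] and [6-7]. But maybe I have completely misunderstood your question and don't understand the constraints.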
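To make the "instances at indices" reasoning easy to check, here is a minimal sketch that records the start index of every sub-array of length 2 or more and keeps only those that start at two or more positions. The class name RecurringIndex and the method recurring are made up for illustration. Note one assumption that differs from search(...) above: this version also counts overlapping occurrences, which makes no difference for the horse/camel input but could for others.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RecurringIndex {

    // Maps each sub-array of length >= 2 that starts at two or more
    // positions to the list of its (0-based) start indices.
    // Overlapping occurrences are counted.
    static Map<List<String>, List<Integer>> recurring(String[] str) {
        // LinkedHashMap keeps the patterns in order of first appearance
        Map<List<String>, List<Integer>> starts = new LinkedHashMap<>();
        List<String> all = Arrays.asList(str);
        for (int i = 0; i < str.length - 1; i++) {
            for (int j = i + 2; j <= str.length; j++) {
                // copy the subList view so the key is an independent list
                List<String> key = new ArrayList<>(all.subList(i, j));
                starts.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
            }
        }
        // drop every sub-array that was seen at only one position
        starts.values().removeIf(positions -> positions.size() < 2);
        return starts;
    }

    public static void main(String[] args) {
        String[] str = {"horse", "camel", "horse", "camel", "tiger",
                        "horse", "camel", "horse", "camel"};
        recurring(str).forEach((pattern, positions) ->
                System.out.println(pattern + " starts at " + positions));
    }
}
```

For the example input, the entry for [camel, horse] holds the start indices [1, 6], matching the two instances described above. Using List<String> as the map key works because lists compare and hash by content, unlike String[].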
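As for optimization, the search(...) method, for example, is just a naive substring search; there are more optimized ways to do this, such as Knuth–Morris–Pratt. Sorry if this is exactly what you didn't want, but maybe it is of some use.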
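As a rough illustration of the Knuth–Morris–Pratt idea, here is a sketch of a drop-in variant of search(...) that keeps the same contract (does str[start..end] occur again after index end?). The names KmpSearch, failureTable and searchKmp are hypothetical and not part of the answer's code. It only speeds up the inner scan; the outer loops over all candidate sub-arrays in g(...) stay as they are.

```java
import java.util.Arrays;

final class KmpSearch {

    // fail[i] = length of the longest proper prefix of the pattern
    // that is also a suffix of pattern[0..i]
    static int[] failureTable(String[] pattern) {
        int[] fail = new int[pattern.length];
        int k = 0;
        for (int i = 1; i < pattern.length; i++) {
            while (k > 0 && !pattern[i].equals(pattern[k])) k = fail[k - 1];
            if (pattern[i].equals(pattern[k])) k++;
            fail[i] = k;
        }
        return fail;
    }

    // true if str[start..end] (inclusive, start <= end) occurs again
    // somewhere after index end
    static boolean searchKmp(String[] str, int start, int end) {
        String[] pattern = Arrays.copyOfRange(str, start, end + 1);
        int[] fail = failureTable(pattern);
        int k = 0; // number of pattern elements matched so far
        for (int i = end + 1; i < str.length; i++) {
            // on a mismatch, fall back in the pattern instead of rewinding i
            while (k > 0 && !str[i].equals(pattern[k])) k = fail[k - 1];
            if (str[i].equals(pattern[k])) k++;
            if (k == pattern.length) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String[] str = {"horse", "camel", "horse", "camel", "tiger",
                        "horse", "camel", "horse", "camel"};
        System.out.println(searchKmp(str, 0, 1)); // [horse, camel]
        System.out.println(searchKmp(str, 0, 4)); // [horse, ..., tiger]
    }
}
```

Each call scans the remainder of the array once, so a single search costs time linear in the array length plus the pattern length rather than their product.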