Rabin Karp 算法运行速度比 naive 慢
Rabin Karp algorithm runs slower than naive
我实现了 Rabin-Karp 算法,但它 运行 比简单的字符串搜索慢(这不是作业,我只是对其进行编码以更好地理解它)。
Rabin Karp(我使用 128 作为 ascii 的基础):
public long honer(String s){
long value = (int)s.charAt(0);
for(int i = 0; i < s.length() - 1; i++){
value = value*baseASCII + (int)s.charAt(i + 1);
}
return value;
}
public long updateHoner(Character c0, long honerNumber, Character c4, int length){
long newNumber = (honerNumber - (int)c0*(long)Math.pow(baseASCII, length))*baseASCII + (int)c4;
return newNumber;
}
public void rabinKarp(String subString){
long subStringValue = honer(subString);
long sValue = honer(string.substring(0, subString.length()));
if(subStringValue == sValue){
//p.println("match found at index 0");
}
for(int i = 1; i <= string.length() - subString.length(); i++){
sValue = updateHoner(string.charAt(i-1), sValue, string.charAt(i+subString.length()-1), subString.length() - 1);
if(sValue == subStringValue){
//p.println("match found at index " + i);
}
}
}
朴素算法:
public void naiveFinder(String subString){
String found = "";
for(int i = 0; i <= string.length() - subString.length(); i++){
for(int j = 0; j < subString.length(); j++ ){
if(string.charAt(i+j) == subString.charAt(j)){
found += string.charAt(i+j);
}
else{
// reset the string and continue with the next char
found = "";
break;
}
if(found.length() == subString.length()){
found = "";
//p.println(subString + " found at index " + (i));
}
}
}
}
测试(主要):
static void main(){
long init, after;
StringFinder sf = new StringFinder(getString());
init = Calendar.getInstance().getTimeInMillis();
for(int i = 0; i < 10000; i++)
sf.rabinKarp("roll");
after = Calendar.getInstance().getTimeInMillis();
p.println("rabin karp without mod : " + (after - init));
init = Calendar.getInstance().getTimeInMillis();
for(int i = 0; i < 10000; i++)
sf.naiveFinder("roll");
after = Calendar.getInstance().getTimeInMillis();
p.println("naive : " + (after - init));
}
结果是 运行 Rabin-Karp 算法用了 2517 毫秒,而 运行 朴素算法只用了 675 毫秒。我怀疑是 updateHoner()
花了很长时间,但我不确定。任何见解将不胜感激。
谢谢
编辑:拼写错误
在updateHoner()
中:
- 不要重新计算
(long)Math.pow(baseASCII, length))*baseASCII
- 这个值不会改变。计算一次。
- 将参数从
Character
更改为 char
以避免额外的装箱和拆箱
我实现了 Rabin-Karp 算法,但它 运行 比简单的字符串搜索慢(这不是作业,我只是对其进行编码以更好地理解它)。
Rabin Karp(我使用 128 作为 ascii 的基础):
public long honer(String s){
long value = (int)s.charAt(0);
for(int i = 0; i < s.length() - 1; i++){
value = value*baseASCII + (int)s.charAt(i + 1);
}
return value;
}
public long updateHoner(Character c0, long honerNumber, Character c4, int length){
long newNumber = (honerNumber - (int)c0*(long)Math.pow(baseASCII, length))*baseASCII + (int)c4;
return newNumber;
}
public void rabinKarp(String subString){
long subStringValue = honer(subString);
long sValue = honer(string.substring(0, subString.length()));
if(subStringValue == sValue){
//p.println("match found at index 0");
}
for(int i = 1; i <= string.length() - subString.length(); i++){
sValue = updateHoner(string.charAt(i-1), sValue, string.charAt(i+subString.length()-1), subString.length() - 1);
if(sValue == subStringValue){
//p.println("match found at index " + i);
}
}
}
朴素算法:
public void naiveFinder(String subString){
String found = "";
for(int i = 0; i <= string.length() - subString.length(); i++){
for(int j = 0; j < subString.length(); j++ ){
if(string.charAt(i+j) == subString.charAt(j)){
found += string.charAt(i+j);
}
else{
// reset the string and continue with the next char
found = "";
break;
}
if(found.length() == subString.length()){
found = "";
//p.println(subString + " found at index " + (i));
}
}
}
}
测试(主要):
static void main(){
long init, after;
StringFinder sf = new StringFinder(getString());
init = Calendar.getInstance().getTimeInMillis();
for(int i = 0; i < 10000; i++)
sf.rabinKarp("roll");
after = Calendar.getInstance().getTimeInMillis();
p.println("rabin karp without mod : " + (after - init));
init = Calendar.getInstance().getTimeInMillis();
for(int i = 0; i < 10000; i++)
sf.naiveFinder("roll");
after = Calendar.getInstance().getTimeInMillis();
p.println("naive : " + (after - init));
}
结果是 运行 Rabin-Karp 算法用了 2517 毫秒,而 运行 朴素算法只用了 675 毫秒。我怀疑是 updateHoner()
花了很长时间,但我不确定。任何见解将不胜感激。
谢谢
编辑:拼写错误
在updateHoner()
中:
- 不要重新计算
(long)Math.pow(baseASCII, length))*baseASCII
- 这个值不会改变。计算一次。 - 将参数从
Character
更改为char
以避免额外的装箱和拆箱