Recursive Binary Search for Spellcheck is missing some words
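I have a dictionary file containing a list of words, which I read into the ArrayList sArray. I then have a book: I use a string parser to take each word from the book and pass it to the binary search method bSearch. bSearch uses a recursive binary search to determine whether the key is present in sArray, the list holding the dictionary. If the word is not found, it prints that the word is possibly misspelled.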
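A minimal sketch of the lookup described above. It assumes the whole dictionary fits in memory and is sorted with the same case-insensitive ordering the search compares with (String.CASE_INSENSITIVE_ORDER and compareToIgnoreCase agree). The class DictionaryLookup and its contains methods are hypothetical names, and the upper bound is treated as inclusive throughout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: recursive binary search over a sorted word list.
class DictionaryLookup {
    // Returns true if key is in words[low..high]; both bounds are inclusive.
    static boolean contains(List<String> words, String key, int low, int high) {
        if (low > high) {
            return false; // empty range: key is not in the list
        }
        int mid = low + (high - low) / 2; // avoids int overflow on very large lists
        int cmp = words.get(mid).compareToIgnoreCase(key);
        if (cmp == 0) {
            return true;
        } else if (cmp > 0) {
            return contains(words, key, low, mid - 1); // key sorts before words[mid]
        } else {
            return contains(words, key, mid + 1, high); // key sorts after words[mid]
        }
    }

    // Convenience overload: searches the whole list, last valid index is size() - 1.
    static boolean contains(List<String> words, String key) {
        return contains(words, key, 0, words.size() - 1);
    }

    public static void main(String[] args) {
        List<String> dict = new ArrayList<>(Arrays.asList("sraddha", "Phaca", "ebracteate", "holmberry"));
        // Sort with the same ordering the search uses (compareToIgnoreCase).
        Collections.sort(dict, String.CASE_INSENSITIVE_ORDER);
        for (String word : new String[] {"phaca", "holmberry", "zzz"}) {
            System.out.println(word + " -> " + contains(dict, word));
        }
    }
}
```

Sorting with one ordering and searching with another (for example, a case-sensitive sort combined with compareToIgnoreCase) can also produce false misses, so the two should match.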
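My problem is that I am getting output for words that do exist in the dictionary list. I have confirmed that the words are read correctly, so the problem comes down to how bSearch navigates sArray. I'm not sure what is wrong with the code. Some examples of the false positives are listed below.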
Here is a link to a paste dump of my dictionary; you should be able to search it for the words below and find them. https://paste.ee/p/wp3qh
Sample output (every line is still a false positive):
ebracteate is possibly mispelled
Phaca is possibly mispelled
holmberry is possibly mispelled
sraddha is possibly mispelled
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Scanner;

public class Program2 {
    private int mid;
    // Counters used by bSearch and print
    private int correctWords = 0;
    private int incorrectWords = 0;
    private int rec = 0;

    public Program2() {
        mid = 0;
    }

    public static void main(String[] args) throws FileNotFoundException, IOException {
        File inf = new File("dictonary.txt");
        ArrayList<String> sArray = new ArrayList<>();
        Program2 a = new Program2();
        a.readDictonary(sArray);
        Collections.sort(sArray, String.CASE_INSENSITIVE_ORDER);
        int correctRec = 0;
        int incorrectRec = 0;
        int correctW = 0;
        int incorrectW = 0;
        FileInputStream infO = new FileInputStream(new File("oliver.txt"));
        char let;
        String str = "";
        int n = 0;
        while ((n = infO.read()) != -1) {
            let = (char) n;
            if (Character.isLetter(let)) {
                str += Character.toLowerCase(let);
            }
            if ((Character.isWhitespace(let) || let == '-') && !str.isEmpty()) {
                // Look up the completed word in the dictionary
                if (a.bSearch(sArray, str, 0, sArray.size()) >= 0) {
                    correctRec++;
                } else {
                    incorrectRec++;
                }
                str = "";
            }
        }
        infO.close();
        a.print(correctRec, incorrectRec);
    }

    public void print(int correctRec, int incorrectRec) {
        System.out.println("Out of total words " + (incorrectWords + correctWords));
        System.out.println("Correct " + correctWords);
        System.out.println("Incorrect " + incorrectWords);
        System.out.println("Total number of recursive steps is " + (correctRec + incorrectRec));
        System.out.println("The average number of comparisons for a word found = " + correctRec / correctWords);
        System.out.println("The average number of comparisons for a word not found = " + incorrectRec / incorrectWords);
    }

    public void readDictonary(ArrayList<String> sArray) {
        try {
            File f = new File("dictionary.txt");
            Scanner inf = new Scanner(f);
            while (inf.hasNext()) {
                sArray.add(inf.nextLine());
            }
        } catch (FileNotFoundException ex) {
            System.out.println("The dictonary file was not found");
        }
    }

    public int bSearch(ArrayList<String> sArray, String key, int lowIndex, int highIndex) {
        if (lowIndex > highIndex) {
            System.out.println(sArray.get(mid) + " is possibly mispelled");
            incorrectWords++;
            return rec * -1;
        }
        mid = (lowIndex + highIndex) / 2;
        if (sArray.get(mid).compareToIgnoreCase(key) == 0) {
            correctWords++;
            return rec;
        } else if (sArray.get(mid).compareToIgnoreCase(key) > 0) {
            rec++;
            return bSearch(sArray, key, lowIndex, mid - 1);
        } else {
            rec++;
            return bSearch(sArray, key, mid + 1, highIndex);
        }
    }
}
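The problem may not be in your algorithm, which looks mostly fine, but in your error message:
System.out.println(sArray.get(mid) + " is possibly mispelled");
Did you mean
System.out.println(key + " is possibly mispelled");
?
When the search fails, mid still holds the index of the last dictionary entry the search compared against, so the message prints that neighbouring dictionary word instead of the word that was not found. That would explain why every word in your output appears in your dictionary.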
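A small sketch of why the reported words are real dictionary entries, assuming a four-word sample list. MissReport is a hypothetical class that mirrors the shape of bSearch but returns the message instead of printing it, so the two variants of the message can be compared side by side.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo: which word does the "not found" message report?
class MissReport {
    private int mid; // last probed index, kept in a field as in Program2

    // Same recursion shape as bSearch; returns the miss message, or null if found.
    // Assumes a non-empty list when printKey is false (words.get(mid) must be valid).
    String search(List<String> words, String key, int low, int high, boolean printKey) {
        if (low > high) {
            // words.get(mid) is the last dictionary entry compared, not the key.
            String shown = printKey ? key : words.get(mid);
            return shown + " is possibly misspelled";
        }
        mid = (low + high) / 2;
        int cmp = words.get(mid).compareToIgnoreCase(key);
        if (cmp == 0) {
            return null; // found: nothing to report
        } else if (cmp > 0) {
            return search(words, key, low, mid - 1, printKey);
        } else {
            return search(words, key, mid + 1, high, printKey);
        }
    }

    public static void main(String[] args) {
        List<String> dict = Arrays.asList("ebracteate", "holmberry", "Phaca", "sraddha");
        MissReport r = new MissReport();
        // "oliverr" is not in the list; the search runs out of range next to "Phaca".
        System.out.println(r.search(dict, "oliverr", 0, dict.size() - 1, false));
        System.out.println(r.search(dict, "oliverr", 0, dict.size() - 1, true));
    }
}
```

With this sample list, the first call reports Phaca, the last entry probed before the range became empty, while the second reports oliverr, the word that was actually missing.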
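The only other concern I have about your binary search is that your highIndex appears to be inclusive, but when you call the bSearch routine you pass sArray.size(), which is exclusive. I suspect this will cause a crash if you search for a word that sorts lexicographically after every word in the dictionary. When you call the binary search, you need to pass sArray.size() - 1 as highIndex.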
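The off-by-one can be reproduced with a short sketch, assuming the same four-word sample list; BoundsDemo and find are hypothetical names. A key that sorts after every entry keeps pushing lowIndex upward until it reaches highIndex, so an initial highIndex of size() makes the search call get(size()).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo of the inclusive-bound off-by-one in the initial call.
class BoundsDemo {
    // Inclusive-bounds recursive search: returns the index of key, or -1 if absent.
    static int find(List<String> words, String key, int low, int high) {
        if (low > high) {
            return -1;
        }
        int mid = (low + high) / 2;
        int cmp = words.get(mid).compareToIgnoreCase(key);
        if (cmp == 0) {
            return mid;
        }
        return cmp > 0 ? find(words, key, low, mid - 1) : find(words, key, mid + 1, high);
    }

    public static void main(String[] args) {
        List<String> dict = Arrays.asList("ebracteate", "holmberry", "Phaca", "sraddha");
        // Correct: the last valid index is size() - 1; "zymurgy" sorts after every entry.
        System.out.println(find(dict, "zymurgy", 0, dict.size() - 1));
        try {
            // Buggy: size() is one past the end, so the search eventually probes index 4.
            find(dict, "zymurgy", 0, dict.size());
        } catch (IndexOutOfBoundsException e) {
            System.out.println("crashed: " + e.getClass().getSimpleName());
        }
    }
}
```

With size() - 1 the search for zymurgy returns -1; with size() it throws an IndexOutOfBoundsException. A key that sorts before every entry would not trigger the crash, which is why the bug can go unnoticed.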