Word list search with HashMap in Java

Question

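I have a word list containing more than 50,000 words. As you can see below, I read the words and add them to an ArrayList, but after that, looking a word up in the list is very slow. That is why I thought of using a HashMap. I want to read my words, and when I receive a word from user input, I want to check whether it is in the HashMap. Even after doing some research, I could not find out exactly how to do this. How can I do it?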

    // `path` is a field of the enclosing class that holds the file location.
    public ArrayList<String> wordReader() throws FileNotFoundException {
        File txt = new File(path);
        Scanner scanner = new Scanner(txt);
        ArrayList<String> words = new ArrayList<String>();
        while (scanner.hasNextLine()) {
            String data = scanner.nextLine();
            words.add(data);
        }
        scanner.close();
        return words;
    }

Answer 1

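I would use a Set rather than a List, because a Set automatically ignores duplicates when you try to add them. The add method returns true and adds the element if it was not already present; otherwise it returns false.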

public Set<String> wordReader() throws FileNotFoundException {
    File txt = new File(path);
    Scanner scanner = new Scanner(txt);
    Set<String> words = new HashSet<>();
    while (scanner.hasNextLine()) {
        String data = scanner.nextLine();
        // add() returns false when the word is already in the set
        if (!words.add(data)) {
            // duplicate word: handle it here if needed
        }
    }
    scanner.close();
    return words;
}

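Because a HashSet has no defined order, it is not a random-access collection and you cannot retrieve elements by index. If you need that, you can copy the set into a list as follows: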

Set<String> words = wordReader();
List<String> wordList = new ArrayList<>(words);

Now you can retrieve the words by index.

You may also want to make your method more general by passing the file name as a parameter.
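One way to follow that suggestion is sketched below. The class name `WordFileReader`, the parameter name `fileName`, and the placeholder file `words.txt` are illustrative and not from the original post. The sketch also swaps the explicit `scanner.close()` for try-with-resources, so the file is closed even if reading fails.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

public class WordFileReader {

    // Reads one word per line from the given file into a Set; duplicates are dropped.
    public static Set<String> wordReader(String fileName) throws FileNotFoundException {
        Set<String> words = new HashSet<>();
        // try-with-resources closes the Scanner (and the underlying file) automatically.
        try (Scanner scanner = new Scanner(new File(fileName))) {
            while (scanner.hasNextLine()) {
                words.add(scanner.nextLine());
            }
        }
        return words;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // "words.txt" is a placeholder; pass a real path as the first argument.
        String fileName = args.length > 0 ? args[0] : "words.txt";
        Set<String> words = wordReader(fileName);
        System.out.println("Loaded " + words.size() + " distinct words");
    }
}
```

The same method can now be reused for any word file instead of relying on a `path` field.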

Answer 2

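Since you will be checking whether an entered word appears in the word list read from the file, you can use a HashSet<String> instead of an ArrayList<String>.

Your method would become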

public HashSet<String> wordReader() throws FileNotFoundException {
    File txt = new File(path);
    Scanner scanner = new Scanner(txt);
    HashSet<String> words = new HashSet<String>();
    while (scanner.hasNextLine()) {
        String data = scanner.nextLine();
        words.add(data);
    }
    scanner.close();
    return words;
}

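Now, after you read the entered word, you can check whether it is present in the HashSet. This is a much faster operation, because a lookup takes constant time on average.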

public boolean isWordPresent(String word, HashSet<String> words) {
    return words.contains(word);
}

As a side note, HashSet uses a HashMap internally to perform its operations.
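To make that side note concrete, here is a simplified sketch of how a set can be layered on top of a map. `MapBackedSet` is a hypothetical class written for illustration only. It is not the JDK source, although the `java.util.HashSet` documentation does state that the class is backed by a `HashMap` instance.

```java
import java.util.HashMap;
import java.util.Map;

public class MapBackedSet<E> {

    // Shared dummy value: only the keys of the map matter.
    private static final Object PRESENT = new Object();

    private final Map<E, Object> map = new HashMap<>();

    // Returns true if the element was not already present.
    public boolean add(E element) {
        return map.put(element, PRESENT) == null;
    }

    // Membership is simply a key lookup in the backing map.
    public boolean contains(Object element) {
        return map.containsKey(element);
    }

    public int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MapBackedSet<String> words = new MapBackedSet<>();
        System.out.println(words.add("apple"));      // true: newly added
        System.out.println(words.add("apple"));      // false: duplicate ignored
        System.out.println(words.contains("apple")); // true
    }
}
```

Each word is stored as a key with a throwaway value, which is why a set lookup costs the same as a map lookup.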

Answer 3

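If I understand your question correctly, you are running into performance problems when traversing an ArrayList of 50,000 words to check whether a particular word is in the list.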

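This is because finding an element in an unsorted List has O(n) complexity. You could improve performance by using a sorted data structure such as a balanced BST (binary search tree), which brings the search operation down to O(log n) complexity.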
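In Java, a ready-made sorted tree structure is `java.util.TreeSet`, whose documentation guarantees log(n) time cost for `add`, `remove` and `contains`. The sketch below contrasts it with a list lookup; the class name `SortedLookup` and the sample words are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class SortedLookup {

    // Copies the words into a TreeSet, which keeps them sorted in a balanced tree.
    public static TreeSet<String> toSortedSet(List<String> words) {
        return new TreeSet<>(words);
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("pear", "apple", "fig");
        TreeSet<String> sorted = toSortedSet(list);

        // List.contains scans the elements one by one: O(n).
        System.out.println(list.contains("fig"));

        // TreeSet.contains descends the tree: O(log n).
        System.out.println(sorted.contains("fig"));

        // The set is ordered, so the smallest word is available directly.
        System.out.println(sorted.first());
    }
}
```

A `TreeSet` is mainly worth it when the words are also needed in sorted order; for pure membership checks a hash-based set is usually the simpler choice.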

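Also, your idea of using a Map is definitely workable, since a HashMap offers add and get operations with a complexity between O(1) (for a theoretically perfect hash algorithm with no collisions at all among keys) and O(n) (for a poor hash algorithm with a high chance of collisions). In addition, since Java 8 the HashMap implementation includes an optimization for high-collision conditions: when many elements end up in the same bucket, that bucket's data structure is implemented as a balanced tree rather than a list, which gives O(log n) complexity in the worst case.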
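The effect of collisions can be illustrated with a deliberately bad key type. In the sketch below, `CollisionDemo` and `BadKey` are hypothetical names: `BadKey.hashCode` always returns the same value, so every entry lands in a single bucket. Lookups remain correct, they just cannot be O(1) any more. `BadKey` implements `Comparable` because the HashMap documentation notes that comparison order may be used to break ties among colliding keys.

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {

    // A key whose hash code is constant, so all keys collide (illustration only).
    public static final class BadKey implements Comparable<BadKey> {
        private final String word;

        public BadKey(String word) {
            this.word = word;
        }

        @Override
        public int hashCode() {
            return 42; // worst possible hash: identical for every key
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof BadKey && ((BadKey) other).word.equals(word);
        }

        @Override
        public int compareTo(BadKey other) {
            return word.compareTo(other.word);
        }
    }

    // Builds a map of n distinct keys that all share one bucket.
    public static Map<BadKey, Boolean> build(int n) {
        Map<BadKey, Boolean> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey("word" + i), Boolean.TRUE);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Boolean> map = build(1000);
        System.out.println(map.containsKey(new BadKey("word500"))); // true
        System.out.println(map.containsKey(new BadKey("missing"))); // false
    }
}
```

`String` already has a well-distributed `hashCode`, so a word list stored as plain strings should stay close to the O(1) end of that range.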

https://www.logicbig.com/tutorials/core-java-tutorial/java-collections/java-map-cheatsheet.html

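However, using a HashMap for what I assume is a dictionary (only distinct words) is probably unnecessary, since you would be using each word as both the key and the value. As others have pointed out, you can use a Set, or more specifically a HashSet, instead of a HashMap. In fact, a HashSet is implemented with a HashMap instance under the hood, which gives us all the performance and advantages discussed above (that is why I wrote that preamble).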

https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/HashSet.html

Your implementation could look like this:

public Set<String> wordReader(String path) throws FileNotFoundException {
    File txt = new File(path);
    Scanner scanner = new Scanner(txt);
    Set<String> words = new HashSet<>();
    while (scanner.hasNextLine()) {
        String data = scanner.nextLine();
        words.add(data);
    }
    scanner.close();
    return words;
}

public boolean isWordContained(Set<String> set, String word) {
    return set.contains(word);
}
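For the user-input part of the question, a usage sketch might look like the following. The class name `WordLookup` and the small inline word set are assumptions made to keep the example self-contained; in practice the set would come from `wordReader(path)`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

public class WordLookup {

    public static boolean isWordContained(Set<String> set, String word) {
        return set.contains(word);
    }

    public static void main(String[] args) {
        // Stand-in for the set returned by wordReader(path); sample words only.
        Set<String> words = new HashSet<>(Arrays.asList("apple", "banana", "cherry"));

        try (Scanner in = new Scanner(System.in)) {
            System.out.println("Enter a word (empty line to quit):");
            while (in.hasNextLine()) {
                String input = in.nextLine().trim();
                if (input.isEmpty()) {
                    break;
                }
                // Each check is a hash lookup, not a scan over all words.
                System.out.println(isWordContained(words, input) ? "found" : "not found");
            }
        }
    }
}
```

The word file is read once up front, and every later check is a single `contains` call, which is where the speed-up over scanning an ArrayList comes from.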
