Is it worth using a HashMap in this instance for speed?

Question

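I have a function that calls a distance function. The distance function computes the Levenshtein distance between two input strings. I am trying to find the shortest distance between a misspelled input word and an English word, and return that English word (to use it as a spell checker), but I'm not sure whether my HashMap is gaining me anything in speed. wordContainer is an array containing n words; does this leave my lookup time stuck at O(n)?

Here is my code: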

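private static String findClosestMatch(String word) {
    // Keyed by distance, so words with the same distance overwrite each other.
    Map<Integer, String> wordAndDistanceMap = new HashMap<>();
    wordContainer.forEach(s -> wordAndDistanceMap.put(distance(s, word), s));
    // Finding the smallest key is a second linear pass over the key set.
    return wordAndDistanceMap.get(Collections.min(wordAndDistanceMap.keySet()));
}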

Answer 1

While this has reasonable time complexity, it carries a lot of overhead from doing work and creating objects you never need. I suggest a simple loop instead.

private static List<String> findClosestMatch(String word) {
    int min = Integer.MAX_VALUE;
    List<String> minWords = new ArrayList<>();
    for (String s : wordContainer) {
        int dist = distance(s, word);
        if (dist < min) {
            // Found a strictly closer word: discard the previous candidates.
            min = dist;
            minWords.clear();
        }
        if (dist == min) {
            // Keep every word that ties for the smallest distance.
            minWords.add(s);
        }
    }
    return minWords;
}
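To try the loop on its own, here is a minimal, self-contained sketch. The question does not show `distance` or `wordContainer`, so both are assumptions here: `distance` is a standard two-row dynamic-programming Levenshtein implementation, and `wordContainer` is a small hypothetical word list.

```java
import java.util.ArrayList;
import java.util.List;

class ClosestMatchDemo {
    // Hypothetical stand-in for the question's wordContainer.
    static final List<String> wordContainer =
            List.of("spelled", "spilled", "smelled", "misspelled", "apple");

    // Assumed implementation: Levenshtein distance via dynamic programming, keeping two rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // building the first j characters of b from "" takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // reducing the first i characters of a to "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int deletion = prev[j] + 1;
                int insertion = curr[j - 1] + 1;
                curr[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    // Same loop as in the answer: one pass, no intermediate map.
    static List<String> findClosestMatch(String word) {
        int min = Integer.MAX_VALUE;
        List<String> minWords = new ArrayList<>();
        for (String s : wordContainer) {
            int dist = distance(s, word);
            if (dist < min) {
                min = dist;
                minWords.clear();
            }
            if (dist == min) {
                minWords.add(s);
            }
        }
        return minWords;
    }

    public static void main(String[] args) {
        System.out.println(findClosestMatch("spelleed")); // prints [spelled]
    }
}
```

Returning a list rather than a single String keeps all ties, which a map keyed by distance cannot do, since each `put` with the same distance replaces the previous word.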

Answer 2

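You have to compute the Levenshtein distance from word to N other words. Computing the distance N times is O(N).

The only way to improve on O(N) is to devise a way to avoid computing the distance O(N) times.

A HashMap cannot help with that. What you would need to do (and I don't know whether it is possible) is devise a way to avoid checking the distance for words that are "a long way away" from word.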
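One cheap way to skip some "far away" words, offered as a sketch rather than a full answer: the Levenshtein distance between two strings is never smaller than the difference of their lengths, so a word whose length differs from the input by more than the best distance found so far cannot win or tie. The class name, the `distanceCalls` counter, and the word list below are illustrative assumptions, and the worst case is still N distance computations.

```java
import java.util.ArrayList;
import java.util.List;

class PrunedClosestMatch {
    // Illustrative counter, only here to show how many full computations were needed.
    static int distanceCalls = 0;

    // Assumed implementation: two-row dynamic-programming Levenshtein distance.
    static int distance(String a, String b) {
        distanceCalls++;
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                curr[j] = Math.min(substitution, Math.min(prev[j] + 1, curr[j - 1] + 1));
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    static List<String> findClosestMatch(String word, List<String> words) {
        int min = Integer.MAX_VALUE;
        List<String> minWords = new ArrayList<>();
        for (String s : words) {
            // Lower bound: distance(s, word) >= |len(s) - len(word)|.
            // If even the bound exceeds the current best, skip the expensive call.
            if (Math.abs(s.length() - word.length()) > min) {
                continue;
            }
            int dist = distance(s, word);
            if (dist < min) {
                min = dist;
                minWords.clear();
            }
            if (dist == min) {
                minWords.add(s);
            }
        }
        return minWords;
    }

    public static void main(String[] args) {
        List<String> words = List.of("spelled", "a", "extraordinarily", "spilled");
        System.out.println(findClosestMatch("spelleed", words));
        System.out.println("distance calls: " + distanceCalls);
    }
}
```

How much this saves depends on the order of the words and how early a close match turns up; sorting or bucketing the list by length would make the bound bite sooner.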

Answer 3

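Well, if you need something faster than this, you will have to use an indexing mechanism.

I can suggest Apache Lucene. It is an open-source, widely used framework for indexing data. There are also products built on the Lucene core, such as Apache Solr and Elasticsearch. See their documentation for more detail.

After indexing your static list, or the values you compute from it, you can retrieve them in a very short time, which is exactly what you want at the moment.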
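To make the idea of indexing a static word list concrete without depending on any particular library, here is a plain-Java sketch of a BK-tree, a small index structure for edit-distance lookups. This is not Lucene's API or its internal approach; the class and method names are hypothetical, and `distance` is an assumed Levenshtein implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BkTree {
    private static final class Node {
        final String word;
        // Child subtrees keyed by their distance to this node's word.
        final Map<Integer, Node> children = new HashMap<>();
        Node(String word) { this.word = word; }
    }

    private Node root;

    // Build step: done once for the static word list.
    void add(String word) {
        if (root == null) {
            root = new Node(word);
            return;
        }
        Node node = root;
        while (true) {
            int d = distance(word, node.word);
            if (d == 0) {
                return; // word is already in the tree
            }
            Node child = node.children.get(d);
            if (child == null) {
                node.children.put(d, new Node(word));
                return;
            }
            node = child;
        }
    }

    // Returns every indexed word within maxDistance edits of query (order unspecified).
    List<String> search(String query, int maxDistance) {
        List<String> result = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        if (root != null) {
            stack.push(root);
        }
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            int d = distance(query, node.word);
            if (d <= maxDistance) {
                result.add(node.word);
            }
            // Triangle inequality: matches can only sit in subtrees whose key
            // lies in [d - maxDistance, d + maxDistance]; the rest are skipped.
            for (Map.Entry<Integer, Node> e : node.children.entrySet()) {
                if (e.getKey() >= d - maxDistance && e.getKey() <= d + maxDistance) {
                    stack.push(e.getValue());
                }
            }
        }
        return result;
    }

    // Assumed implementation: two-row dynamic-programming Levenshtein distance.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                curr[j] = Math.min(substitution, Math.min(prev[j] + 1, curr[j - 1] + 1));
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        BkTree tree = new BkTree();
        for (String w : List.of("spelled", "spilled", "smelled", "misspelled", "apple")) {
            tree.add(w);
        }
        System.out.println(tree.search("spelleed", 1));
    }
}
```

Here a HashMap does earn its place, but as part of the index (mapping a distance to a subtree), not as a container for results. A spell checker could call `search` with a small `maxDistance` and widen it only when nothing is found; how many distance computations this actually avoids depends on the dictionary and the threshold.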

Hope this helps.


Tags: java, performance, big-o, dictionary, hashmap
