Checking whether two strings are permutations in Java (Efficiency of Hashmap vs Array)

Question

I'm working through a coding book to learn more about Java, and I came across this problem.

The problem asks: "Given two strings, write a method to decide if one is a permutation of the other."

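After thinking about it for about a minute, I decided to go with a HashMap solution. My reasoning was that insertion, removal, and lookup are all O(1), so it should be a fast solution. My code is as follows: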

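```java
import java.util.HashMap;

public static boolean isPermutation(String a, String b) {
    // Strings of different lengths cannot be permutations of each other.
    if(a.length() != b.length()) {
        return false;
    }
    // Count how many times each character occurs in a.
    HashMap<Character, Integer> map = new HashMap<Character, Integer>();
    for(int x = 0; x < a.length(); x++) {
        char letter = a.charAt(x);
        if(!(map.containsKey(letter))) {
            map.put(letter, 1);
        }
        else {
            int val = map.get(letter) + 1;
            map.put(letter, val);
        }
    }
    // Consume one occurrence per character of b. A missing key means b has a
    // character that a lacks, or more copies of it than a has.
    for(int y = 0; y < b.length(); y++) {
        char letter = b.charAt(y);
        if(!(map.containsKey(letter))) {
            return false;
        }
        else {
            int val = map.remove(letter) - 1;
            if(val > 0) {
                map.put(letter, val);
            }
        }
    }
    // Equal lengths and no missing characters, so every count reached zero.
    return true;
}
```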

However, the book uses an array for its answer:

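```java
public boolean permutation(String s, String t) {
    if (s.length() != t.length()) {
        return false;
    }
    // One counter per character code. This assumes every char is in 0..255;
    // any other char makes letters[c] throw ArrayIndexOutOfBoundsException.
    int[] letters = new int[256];
    char[] s_array = s.toCharArray();
    for (char c : s_array) {
        letters[c]++;              // count occurrences in s
    }
    for (int i = 0; i < t.length(); i++) {
        int c = (int) t.charAt(i);
        if (--letters[c] < 0) {    // t has more copies of c than s does
            return false;
        }
    }
    return true;
}
```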

I have three questions.

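First, I'd like to know whether my implementation is less efficient than the book's. If it is, what are the inefficiencies? Can they be corrected so that the HashMap implementation is better than (or at least equal to) the given array implementation?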

Second, I understand that my HashMap relies on autoboxing to convert between char and Character. Is there a significant slowdown that comes with autoboxing?

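Third, in my code I tried to avoid using HashMap's remove() function. My reasoning was as follows. In theory, removal should be O(1). Even so, using put() to replace an existing key's entry (in this case, overwriting the old value) should be more efficient, because a replacement is cheaper than a removal followed by an insertion. Is my logic correct? Is this something I should be doing?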

Thanks very much!

Answer 1

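First, an observation: Big-O notation is not a measure of performance. Instead, it describes how an algorithm scales as a variable (e.g. N) tends to infinity.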

First, I'd like to know whether my implementation is less efficient than the book's ...

Benchmark them! Seriously, it is hard to tell which approach will be faster just by inspecting the code.

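Your benchmark needs to account for the fact that relative performance will depend on the input. For example, take measurements over a range of different string lengths.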
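Below is a crude sketch of such a measurement. It assumes a hand-rolled `System.nanoTime` loop. The class name `PermutationBench`, the random lowercase inputs, the chosen lengths, and the repetition counts are all illustrative assumptions and do not come from the question. A dedicated harness such as JMH (OpenJDK's Java Microbenchmark Harness) handles JIT warm-up and dead-code elimination far more reliably, so treat numbers from this sketch as rough indications only.

```java
import java.util.HashMap;
import java.util.Random;

// Hypothetical harness: times the two approaches from the question over a
// range of input lengths using System.nanoTime. Rough numbers only.
public class PermutationBench {

    // The HashMap approach from the question (same logic, condensed).
    public static boolean isPermutationMap(String a, String b) {
        if (a.length() != b.length()) return false;
        HashMap<Character, Integer> map = new HashMap<>();
        for (int x = 0; x < a.length(); x++) {
            char letter = a.charAt(x);
            if (!map.containsKey(letter)) map.put(letter, 1);
            else map.put(letter, map.get(letter) + 1);
        }
        for (int y = 0; y < b.length(); y++) {
            char letter = b.charAt(y);
            if (!map.containsKey(letter)) return false;
            int val = map.remove(letter) - 1;
            if (val > 0) map.put(letter, val);
        }
        return true;
    }

    // The book's array approach; assumes every char is in 0..255.
    public static boolean isPermutationArray(String s, String t) {
        if (s.length() != t.length()) return false;
        int[] letters = new int[256];
        for (char c : s.toCharArray()) letters[c]++;
        for (int i = 0; i < t.length(); i++) {
            if (--letters[t.charAt(i)] < 0) return false;
        }
        return true;
    }

    // Random lowercase string of length n.
    public static String randomString(Random rnd, int n) {
        char[] cs = new char[n];
        for (int i = 0; i < n; i++) cs[i] = (char) ('a' + rnd.nextInt(26));
        return new String(cs);
    }

    // Fisher-Yates shuffle, so the result is a permutation of s.
    public static String shuffled(Random rnd, String s) {
        char[] cs = s.toCharArray();
        for (int i = cs.length - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            char tmp = cs[i]; cs[i] = cs[j]; cs[j] = tmp;
        }
        return new String(cs);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int sink = 0; // results are accumulated and printed so the calls are less likely to be optimised away
        for (int n : new int[] {4, 64, 1024, 65536}) {
            String a = randomString(rnd, n);
            String b = shuffled(rnd, a);
            int reps = Math.max(10, 2_000_000 / n);
            for (int w = 0; w < reps; w++) { // warm-up, not timed
                if (isPermutationMap(a, b)) sink++;
                if (isPermutationArray(a, b)) sink++;
            }
            long t0 = System.nanoTime();
            for (int r = 0; r < reps; r++) if (isPermutationMap(a, b)) sink++;
            long t1 = System.nanoTime();
            for (int r = 0; r < reps; r++) if (isPermutationArray(a, b)) sink++;
            long t2 = System.nanoTime();
            System.out.printf("n=%6d  map: %10.1f ns/call  array: %10.1f ns/call%n",
                    n, (t1 - t0) / (double) reps, (t2 - t1) / (double) reps);
        }
        System.out.println("sink=" + sink);
    }
}
```

Run it several times on your own JVM. The trend across lengths, and any crossover point, tells you more than the absolute numbers do.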

... and if so, what the inefficiencies are ...

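That is what profiling is for. A profiler will tell you how much time is spent in each method, and some profilers can even measure down to the level of individual lines: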

Java line-by-line Method/Function Profiling - Profiler &/or Eclipse Plugin

... and whether they can be corrected so that the Hashmap implementation is better (or at least equal) to the given array implementation.

That is for you to figure out ... once you have benchmarked and profiled.
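Here is one possible starting point. It is a sketch, not a measured improvement. The map version can be trimmed so that it does fewer hash lookups per character. `HashMap.merge` (Java 8+) replaces the first loop's containsKey/get/put sequence with a single call. The second loop decrements with `get` plus `put` and never calls `remove`. The class and method names (`MergePermutation`, `isPermutationMerge`) are hypothetical. Whether this actually closes the gap with the array version is exactly what a benchmark should tell you.

```java
import java.util.HashMap;
import java.util.Map;

public class MergePermutation {

    // Hypothetical variant of the question's method with fewer map operations.
    public static boolean isPermutationMerge(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        Map<Character, Integer> counts = new HashMap<>();
        for (int i = 0; i < a.length(); i++) {
            // Stores 1 for a new key; otherwise adds 1 to the existing count.
            counts.merge(a.charAt(i), 1, Integer::sum);
        }
        for (int i = 0; i < b.length(); i++) {
            char letter = b.charAt(i);
            Integer left = counts.get(letter); // one lookup instead of containsKey + remove
            if (left == null || left == 0) {
                return false; // b has a character a lacks, or too many copies of it
            }
            counts.put(letter, left - 1);      // overwrite the value; no remove()
        }
        // Lengths are equal and no count went below zero, so all counts are zero.
        return true;
    }
}
```

For example, `isPermutationMerge("listen", "silent")` returns `true` and `isPermutationMerge("aab", "abb")` returns `false`. Keeping zero-count entries in the map, rather than removing them, is also what the third question is asking about.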

Second, I understand that my Hashmap uses autoboxing to convert from Character to char. Is there a significant slowdown that comes with autoboxing?

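There will certainly be some slowdown. It should be measurable (or at least estimable). For example, if you profile your version, you can see how much time is spent in Character methods.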
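To see where that time goes, it can help to write out the conversions that the compiler inserts. The sketch below is the counting loop from the question with the boxing made explicit. In practice, javac emits essentially these `Character.valueOf`, `Integer.valueOf`, and `intValue` calls, one at each call site. Per their Javadoc, `Character.valueOf` always caches '\u0000' to '\u007F', and `Integer.valueOf` always caches -128 to 127. Outside those ranges, a call may allocate a new object. The class name `ExplicitBoxing` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class ExplicitBoxing {

    // Counting loop from the question, with autoboxing written out by hand.
    public static Map<Character, Integer> countChars(String a) {
        HashMap<Character, Integer> map = new HashMap<Character, Integer>();
        for (int x = 0; x < a.length(); x++) {
            char letter = a.charAt(x);
            if (!map.containsKey(Character.valueOf(letter))) {              // char -> Character
                map.put(Character.valueOf(letter), Integer.valueOf(1));     // char -> Character, int -> Integer
            } else {
                int val = map.get(Character.valueOf(letter)).intValue() + 1; // Integer -> int
                map.put(Character.valueOf(letter), Integer.valueOf(val));   // box both again
            }
        }
        return map;
    }
}
```

Every character therefore costs several boxing calls and hash lookups. The array version does a single `int` increment per character.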

Will it be significant? That is hard to predict!

And third, in my code I tried to avoid using the Hashmap's remove() function. ... Is my logic correct?

Yes. But again, you can verify this by benchmarking and/or profiling.

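All that said, my educated guess is that your solution may be faster for small strings, while the book's solution will definitely be faster for sufficiently long strings. It comes down to two things: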

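- The book's solution preallocates an array of 256 ints (1024 bytes). This is probably more expensive than allocating an empty HashMap.
- The per-character costs of autoboxing, map lookups, and map inserts or updates are likely to be significantly higher than the equivalent costs in the book's solution.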
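The allocation point matters more if the input is not restricted to 8-bit characters. The book's 256-entry array throws `ArrayIndexOutOfBoundsException` for any char above 255. Covering every possible char value (0 to `Character.MAX_VALUE`, i.e. 65,536 entries) makes the up-front allocation 65,536 × 4 bytes = 256 KiB. That would presumably tilt short inputs even further toward the HashMap. The following is a sketch under that assumption. The class and method names are hypothetical.

```java
public class FullRangePermutation {

    // Array-counting variant sized for every possible char value
    // (0..Character.MAX_VALUE), so no input can index out of bounds.
    // Like the book's version, it counts UTF-16 code units, not code points.
    public static boolean isPermutationFullRange(String s, String t) {
        if (s.length() != t.length()) {
            return false;
        }
        // 65,536 ints = 256 KiB, allocated and zeroed on every call.
        int[] counts = new int[Character.MAX_VALUE + 1];
        for (int i = 0; i < s.length(); i++) {
            counts[s.charAt(i)]++;
        }
        for (int i = 0; i < t.length(); i++) {
            if (--counts[t.charAt(i)] < 0) {
                return false; // t has more copies of this char than s does
            }
        }
        return true;
    }
}
```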
