TreeSet 给出错误的输出 - Java8

TreeSet giving incorrect output - Java8

在处理树集时,我发现了非常奇怪的行为。

根据我的理解,以下程序应该打印两行相同的内容:

public class TestSet {
    static void test(String... args) {
        Set<String> s = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        s.addAll(Arrays.asList("a", "b"));
        s.removeAll(Arrays.asList(args));
        System.out.println(s);
    }

    public static void main(String[] args) {
        test("A");
        test("A", "C");
    }
}

但奇怪的是它打印:

[b]
[a, b] 

我无法理解 - 为什么树集会这样?

好吧,这让我很惊讶,我不知道我是否正确,但是看看 AbstractSet 中的这个实现:

public boolean removeAll(Collection<?> c) {
  Objects.requireNonNull(c);
  boolean modified = false;

  if (size() > c.size()) {
    for (Iterator<?> i = c.iterator(); i.hasNext(); )
      modified |= remove(i.next());
  } else {
    for (Iterator<?> i = iterator(); i.hasNext(); ) {
      if (c.contains(i.next())) {
        i.remove();
        modified = true;
      }
    }
  }
  return modified;
}

基本上在您的示例中,set 的大小等于您要删除的参数的大小,因此将调用 else 条件。在这种情况下,会检查您的参数集合是否删除 contains 迭代器的当前元素,并且该检查区分大小写,因此它会检查 c.contains("a") 和 returns 是否为假,因为 c 包含 "A",而不是 "a",所以不会删除该元素。请注意,当您将元素添加到集合 s.addAll(Arrays.asList("a", "b", "d")); 时,它可以正常工作,因为 size() > c.size() 现在为真,因此不再需要 contains 检查。

这是因为一个SortedSet的Comparator用于排序,但是removeAll依赖于每个元素的equals方法。来自 SortedSet documentation:

Note that the ordering maintained by a sorted set (whether or not an explicit comparator is provided) must be consistent with equals if the sorted set is to correctly implement the Set interface. (See the Comparable interface or Comparator interface for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a sorted set performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the sorted set, equal. The behavior of a sorted set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.

Comparable documentation:

中定义了“consistent with equals”的解释

The natural ordering for a class C is said to be consistent with equals if and only if e1.compareTo(e2) == 0 has the same boolean value as e1.equals(e2) for every e1 and e2 of class C. Note that null is not an instance of any class, and e.compareTo(null) should throw a NullPointerException even though e.equals(null) returns false.

It is strongly recommended (though not required) that natural orderings be consistent with equals. This is so because sorted sets (and sorted maps) without explicit comparators behave "strangely" when they are used with elements (or keys) whose natural ordering is inconsistent with equals. In particular, such a sorted set (or sorted map) violates the general contract for set (or map), which is defined in terms of the equals method.

总而言之,您的 Set 的比较器的行为与元素的 equals 方法不同,导致异常(尽管可预测)行为。

这很有趣,所以这里有一些输出测试:

static void test(String... args) {
    Set<String> s =new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
    s.addAll(Arrays.asList( "a","b","c"));
    s.removeAll(Arrays.asList(args));
    System.out.println(s);
}

public static void main(String[] args) {
    test("C");          output: [a, b]
    test("C", "A");     output: [b]
    test("C", "A","B"); output: [a, b, c]
    test("B","C","A");  output: [a, b, c]
    test("K","C");      output: [a, b]
    test("C","K","M");  output: [a, b, c] !!
    test("C","K","A");  output: [a, b, c] !!
}

现在没有比较器,它就像排序的一样工作 HashSet<String>():

    static void test(String... args) {
    Set<String> s = new TreeSet<String>();//
    s.addAll(Arrays.asList( "a","b","c"));
    s.removeAll(Arrays.asList(args));
    System.out.println(s);
}

public static void main(String[] args) {
    test("c");          output: [a, b]
    test("c", "a");     output: [b]
    test("c", "a","b"); output: []
    test("b","c","a");  output: []
    test("k","c");      output: [a, b]
    test("c","k","m");  output: [a, b]
    test("c","k","m");  output: [a, b]
}

现在来自文档:

public boolean removeAll(Collection c)

Removes from this set all of its elements that are contained in the specified collection (optional operation). If the specified collection is also a set, this operation effectively modifies this set so that its value is the asymmetric set difference of the two sets.

This implementation determines which is the smaller of this set and the specified collection, by invoking the size method on each. If this set has fewer elements, then the implementation iterates over this set, checking each element returned by the iterator in turn to see if it is contained in the specified collection. If it is so contained, it is removed from this set with the iterator's remove method. If the specified collection has fewer elements, then the implementation iterates over the specified collection, removing from this set each element returned by the iterator, using this set's remove method.

Source

添加一些信息,说明为什么 TreeSetremove 实际上会在您的示例中不区分大小写地删除(并且前提是您遵循 if (size() > c.size()) 路径,如答案中所述@Shadov):

这是TreeSet中的remove方法:

public boolean remove(Object o) {
        return m.remove(o)==PRESENT;
    }

它从其内部调用 remove TreeMap :

public V remove(Object key) {
    Entry<K,V> p = getEntry(key);
    if (p == null)
        return null;

    V oldValue = p.value;
    deleteEntry(p);
    return oldValue;
}

调用 getEntry

 final Entry<K,V> getEntry(Object key) {
        // Offload comparator-based version for sake of performance
        if (comparator != null)
            return getEntryUsingComparator(key);
        if (key == null)
            throw new NullPointerException();
        @SuppressWarnings("unchecked")
            Comparable<? super K> k = (Comparable<? super K>) key;
        Entry<K,V> p = root;
        while (p != null) {
            int cmp = k.compareTo(p.key);
            if (cmp < 0)
                p = p.left;
            else if (cmp > 0)
                p = p.right;
            else
                return p;
        }
        return null;
    }

如果有一个Comparator(如你的例子),条目是根据这个Comparator搜索的(这是由getEntryUsingComparator完成的),这就是为什么它实际上是找到(然后删除),尽管大小写不同。