Java 带有长度比较器错误的 TreeSet?
Java TreeSet with length Comparator bug?
我有下面的代码,它使用基于字符串长度的比较器创建了一个 TreeSet。
public class TreeSetComparator {
public static void main(String[] args) {
SortedSet<String> sortedSet = new TreeSet<>(Comparator.comparing(String::length));
sortedSet.addAll(Arrays.asList("aa", "bb", "aa"));
System.out.println(sortedSet);
}
}
令我惊讶的是,上面的输出是
[aa]
虽然我希望
[aa, bb]
或
[bb, aa]
"bb"部分消失了,这似乎违背了SortedSet的约定。比较器应该只对元素进行排序,而不确定它们的唯一性,通常由equals确定。
另一方面,如果我将比较器增强为始终 return 对于不相等的项目非零,如下所示,只有这样我才能得到正确的结果。
SortedSet<String> sortedSet = new TreeSet<>(Comparator.comparing(String::length).reversed().thenComparing(String::toString));
sortedSet.addAll(Arrays.asList("aa", "bb", "aa"));
System.out.println(sortedSet);
现在的输出是 [aa, bb]
,如我所料。
以上是 TreeSet 实现中的错误吗?
我的环境如下:
mvn --version 21:40:22
Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T19:33:14+01:00)
Maven home: /home/aaaa/.sdkman/candidates/maven/current
Java version: 10.0.2, vendor: Oracle Corporation, runtime: /usr/lib/jvm/java-10-jdk
Default locale: en_GB, platform encoding: UTF-8
OS name: "linux", version: "4.14.60-1-manjaro", arch: "amd64", family: "unix"
更新
这是一个相关的 post 以及有关如何在 Java 的未来版本中解决该问题的建议:https://yesday.github.io/blog/2018/java-gotchas-sorted-set-ignores-the-equals-method.html
这不是错误。至少在 TreeSet
.
中没有
来自 javadoc,我强调:
Note that the ordering maintained by a set (whether or not an explicit
comparator is provided) must be consistent with equals if it is to
correctly implement the Set interface. (See Comparable or Comparator
for a precise definition of consistent with equals.) This is so
because the Set interface is defined in terms of the equals operation,
but a TreeSet instance performs all element comparisons using its
compareTo (or compare) method, so two elements that are deemed equal
by this method are, from the standpoint of the set, equal. The
behavior of a set is well-defined even if its ordering is inconsistent
with equals; it just fails to obey the general contract of the Set
interface.
所以因为 "aa" 和 "bb" 的长度都是 2,所以 compareTo
认为它们相等,因此 TreeSet
.
By definition,与equals一致表示:
The ordering imposed by a comparator c on a set of elements S is said to be consistent with equals if and only if c.compare(e1, e2)==0 has the same boolean value as e1.equals(e2) for every e1 and e2 in S.
看起来他们假设比较器使用与 equals 方法相同的 equals 定义。来自 SortedSet API:
Note that the ordering maintained by a sorted set (whether or not an explicit comparator is provided) must be consistent with equals if the sorted set is to correctly implement the Set interface. (See the Comparable interface or Comparator interface for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a sorted set performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the sorted set, equal. The behavior of a sorted set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.
我有下面的代码,它使用基于字符串长度的比较器创建了一个 TreeSet。
public class TreeSetComparator {
public static void main(String[] args) {
SortedSet<String> sortedSet = new TreeSet<>(Comparator.comparing(String::length));
sortedSet.addAll(Arrays.asList("aa", "bb", "aa"));
System.out.println(sortedSet);
}
}
令我惊讶的是,上面的输出是
[aa]
虽然我希望
[aa, bb]
或
[bb, aa]
"bb"部分消失了,这似乎违背了SortedSet的约定。比较器应该只对元素进行排序,而不确定它们的唯一性,通常由equals确定。
另一方面,如果我将比较器增强为始终 return 对于不相等的项目非零,如下所示,只有这样我才能得到正确的结果。
SortedSet<String> sortedSet = new TreeSet<>(Comparator.comparing(String::length).reversed().thenComparing(String::toString));
sortedSet.addAll(Arrays.asList("aa", "bb", "aa"));
System.out.println(sortedSet);
现在的输出是 [aa, bb]
,如我所料。
以上是 TreeSet 实现中的错误吗?
我的环境如下:
mvn --version 21:40:22
Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T19:33:14+01:00)
Maven home: /home/aaaa/.sdkman/candidates/maven/current
Java version: 10.0.2, vendor: Oracle Corporation, runtime: /usr/lib/jvm/java-10-jdk
Default locale: en_GB, platform encoding: UTF-8
OS name: "linux", version: "4.14.60-1-manjaro", arch: "amd64", family: "unix"
更新
这是一个相关的 post 以及有关如何在 Java 的未来版本中解决该问题的建议:https://yesday.github.io/blog/2018/java-gotchas-sorted-set-ignores-the-equals-method.html
这不是错误。至少在 TreeSet
.
来自 javadoc,我强调:
Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.
所以因为 "aa" 和 "bb" 的长度都是 2,所以 compareTo
认为它们相等,因此 TreeSet
.
By definition,与equals一致表示:
The ordering imposed by a comparator c on a set of elements S is said to be consistent with equals if and only if c.compare(e1, e2)==0 has the same boolean value as e1.equals(e2) for every e1 and e2 in S.
看起来他们假设比较器使用与 equals 方法相同的 equals 定义。来自 SortedSet API:
Note that the ordering maintained by a sorted set (whether or not an explicit comparator is provided) must be consistent with equals if the sorted set is to correctly implement the Set interface. (See the Comparable interface or Comparator interface for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a sorted set performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the sorted set, equal. The behavior of a sorted set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.