我可以使用 identityHashCode 在对象之间根据相同性生成 compareTo 吗？

Question

我想在两个对象之间实现一个简单的比较器，其唯一要求是

它是一个有效的比较器（即定义所有对象的线性顺序）并且
.compare 将 return 0 当且仅当对象相同.

Comparator.comparing(System::identityHashCode) 行得通吗？还有别的办法吗？

动机： 我想构建一个集合，允许我将带时间戳的消息存储在线程安全的集合中，该集合将支持诸如“获取时间戳位于 [a,b) 的所有消息”之类的查询。

似乎 Guava 的 TreeMultimap 使用全局锁（编辑：如果用 synchronizedSortedSetMultimap 包装器包装），并且 ConcurrentSkipListMap 似乎每次只支持一个条目（它是一张地图，而不是多张地图）。所以我想到只使用一组对：

ConcurrentSkipListSet<ImmutablePair<Float,Message>> db,

这些对是按词汇顺序排列的，首先是时间（使用 Float.compareTo），然后是 Comparator.nullsFirst(Comparator.comparing(System::identityHashCode)).

nullsFirst就在那里db.subSet(ImmutablePair.of(a,null), ImmutablePair.of(b,null))查询半开时间间隔[a,b].
你明白我为什么关心比较器保持相同性了：如果消息比较器return对于不同的消息为零，消息可能会被删除。
你也明白了为什么我不需要比较器的其他东西：它就在那里，所以我可以使用 ConcurrentSkipListSet 的存储机制。我当然不想强加给用户（好吧，只有我 :-) 来实现 Message.
的比较器
另一种可能的解决方案是使用 ConcurrentSkipListMap<Float, Set<Message>>（具有线程安全的 Set<> 实例）但在内存方面似乎有点浪费，我需要删除 emptySet 的我自己在删除消息后节省内存。

编辑：正如一些人指出的那样，identityHashCode 可能会产生冲突，事实上我现在已经确认我的设置中存在此类冲突（这大致相当于拥有 4K如上所述的集合，每个集合在每个时间段填充 4K 消息）。这很可能是我看到一些消息丢失的原因。所以我现在比以往任何时候都更有兴趣找到某种方法来拥有一个“不可知论”的比较运算符，真正尊重相同性。实际上，一个 64 位哈希值（而不是 identityHashCode 提供的 32 位值）可能就足够了。

Answer 1

虽然它不是 gua运行teed，但我怀疑这导致问题的可能性微乎其微。

System.identityHashCode return如果不被覆盖，Object.hashCode 将 return 的值，包括此 in the documentation:

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects.

那么“在合理可行的范围内”就足够了吗？虽然它不是 gua运行teed，但如果您运行遇到它导致问题的情况，我会感到非常惊讶。您必须有两条具有完全相同时间戳和的消息，其中 JVM 的 Object.hashCode 实现 return 两条消息的值相同。

如果那个巧合的结果是“核电站爆炸”，那我就不会冒险了。如果这种巧合的结果是“我们未能向客户收费”——甚至“我们向客户收费两次，并可能被起诉”，如果没有更好的替代建议，我可能会接受这个机会。

Answer 2

正如@StuartMarks 在评论中指出的那样，Guava 支持 Ordering.arbitrary()，提供线程安全的碰撞处理。该实现有效地利用了 identityHashCode:

@Override
public int compare(Object left, Object right) {
  if (left == right) {
    return 0;
  } else if (left == null) {
    return -1;
  } else if (right == null) {
    return 1;
  }
  int leftCode = identityHashCode(left);
  int rightCode = identityHashCode(right);
  if (leftCode != rightCode) {
    return leftCode < rightCode ? -1 : 1;
  }

  // identityHashCode collision (rare, but not as rare as you'd think)
  int result = getUid(left).compareTo(getUid(right));
  if (result == 0) {
    throw new AssertionError(); // extremely, extremely unlikely.
  }
  return result;
}

因此只有在发生散列冲突时，才会调用 getUid（使用记忆化的 AtomicInteger 计数器来分配 uid）。

在“一个”行中编写所需的带时间戳的消息容器也很容易（也许不太容易阅读？）：

db = new ConcurrentSkipListSet<>(
                (Ordering.<Float>natural().<ImmutablePair<Float,Message>>onResultOf(x -> x.left))
                        .compound(Ordering.arbitrary().nullsFirst().<ImmutablePair<Float,Message>>onResultOf(x -> x.right)))

Answer 3

Will Comparator.comparing(System::identityHashCode) work? Is there another way?

如前所述，identityHashCode 不是唯一的。

Actually, a 64 bit hash value (instead of the 32bit value provided by identityHashCode) would probably suffice

我认为这只会减少重叠的可能性，而不是消除它们。哈希算法旨在限制重叠，但通常无法保证 none。比如MD5是128位的，还是有重叠的。

使用 AtomicLong 为每条消息分配一个唯一编号怎么样？那么你的比较函数会做：

按时间比较。如果可能的话，我会使用 long 而不是 float。
如果相同时间则按唯一值进行比较。

如果您有多个系统接收这些消息，那么您将需要记录唯一的系统 ID 和消息编号以确保唯一性。

我可以使用 identityHashCode 在对象之间根据相同性生成 compareTo 吗？

Can I use identityHashCode to produce a compareTo between Objects respecting same-ness?

java

java.util.concurrent

comparator

skip-lists

concurrentskiplistmap