Java: Object pooling and hash sets

Let's assume the following class...

class Foo {

  private Bar1 bar1;
  private Bar2 bar2;

  // many other fields

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    Foo foo = (Foo) o;
    if (!bar1.equals(foo.getBar1())) return false;
    if (!bar2.equals(foo.getBar2())) return false;
    // etc...
    return true;
  }

  @Override
  public int hashCode() {
    int result = bar1.hashCode();
    result = 31 * result + bar2.hashCode();
    // etc...
    return result;
  }

  // setters & getters follow...
}
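
For reference, the same equals/hashCode contract can be written more compactly with java.util.Objects (a sketch covering only the two fields shown; the contract is identical, though Objects.hash produces different raw hash values than the manual 31-based version):

import java.util.Objects;

@Override
public boolean equals(Object o) {
  if (this == o) return true;
  if (o == null || getClass() != o.getClass()) return false;
  Foo foo = (Foo) o;
  // Objects.equals is also null-safe, unlike the bare bar1.equals(...) calls
  return Objects.equals(bar1, foo.bar1) && Objects.equals(bar2, foo.bar2);
}

@Override
public int hashCode() {
  return Objects.hash(bar1, bar2); // hashes the fields in order
}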

Thousands of Foo instances are created, processed, and then recycled in a pool every minute. The workflow looks like this:

Set<Foo> foos = new THashSet<>(); // THashSet: GNU Trove hash set
while (there-is-data) {

  String serializedDataFromApi = api.getData();
  Set<Foo> buffer = pool.deserializeAndCreate(serializedDataFromApi);
  foos.addAll(buffer);
}

processor.process(foos);
pool.recycle(foos);

The problem is that there can be duplicate foo objects (with the same values) across different buffers. They are materialized as distinct instances of Foo, but they are considered equal when foos.addAll(buffer) is called.

My questions are:

What happened with those "duplicate" instances? Are they "lost" and garbage collected?

Yes, those become eligible for GC as soon as the current iteration of while (there-is-data) completes.
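
To make the reachability explicit, here is the relevant part of the loop again, annotated (the comments are my reading of the code above, not part of the original):

Set<Foo> buffer = pool.deserializeAndCreate(serializedDataFromApi);
foos.addAll(buffer); // value-equal duplicates are rejected by the set,
                     // but remain reachable through buffer
// when the iteration ends, buffer goes out of scope, so the rejected
// instances lose their last reference and become eligible for GC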

If I wanted to keep those instances available in pool, what would be the most effective way to test for duplicates before inserting using addAll and recycling instances?

Set.add returns true if the element was inserted and false if it was a duplicate, so you can replace addAll with:
for (Foo f : buffer) {
  if (!foos.add(f)) {
    // handle duplicate
  }
}

This will not hurt performance, because addAll does the same thing: it iterates and adds the elements one by one.
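
Putting it together: assuming the pool also offers a single-instance recycle overload (hypothetical; the code above only shows pool.recycle(foos) taking a whole set), the spare instance can be handed back to the pool on the spot:

for (Foo f : buffer) {
  if (!foos.add(f)) {
    // f is value-equal to an element already in the set;
    // return the spare instance to the pool instead of losing it to GC
    pool.recycle(f); // hypothetical single-instance overload
  }
}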