What would be a proper way to replace Roaring64NavigableMap with Chronicle-Map in Java?

I have code that uses Roaring64NavigableMap as part of a Neo4j plugin, storing the long values returned by a node's getId() from the Neo4j API.
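
For context, a minimal sketch of the kind of usage being replaced (the variable names and the literal id are assumptions for illustration, not the actual plugin code):

import org.roaringbitmap.longlong.Roaring64NavigableMap;

// Hypothetical sketch: collect long node ids, as returned by Node.getId() in the Neo4j API
Roaring64NavigableMap nodeIds = new Roaring64NavigableMap();
long nodeId = 42L;                        // stand-in for a value from node.getId()
nodeIds.addLong(nodeId);
boolean seen = nodeIds.contains(nodeId);  // true after the add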

I would like to use Chronicle-Map instead. I saw this example:

import net.openhft.chronicle.set.ChronicleSet;

ChronicleSet<Long> ids =
    ChronicleSet.of(Long.class)
        .name("ids")
        .entries(1_000_000)
        .create();
  1. What if I don't know how many values to anticipate? Does .entries(1_000_000) limit the cache or the maximum number of entries in the DB?
  2. Is there a way to handle a really large amount of data, around a billion entries?
  3. Is there a more efficient way to create a Chronicle-Map?
  4. Can I control the size of the cache it uses?
  5. Can I control the volume on which the data is stored?

What if I don't know how many values to anticipate? Does .entries(1_000_000) limit the cache or the DB max number of entries?

From the Javadoc of the entries() method:

Configures the target number of entries that are going to be inserted into the hash containers created by this builder. If ChronicleHashBuilder.maxBloatFactor(double) is configured to 1.0 (which is the default), this number of entries is also the maximum. If you try to insert more entries than the configured maxBloatFactor multiplied by the given number of entries, an IllegalStateException might be thrown.

This configuration should represent the expected maximum number of entries in a stable state, and maxBloatFactor the maximum bloat-up coefficient during exceptional bursts.

To be more precise: try to configure the entries so that the created hash container serves about 99% of requests while being at or below this number of entries in size.

You shouldn't put an additional margin over the actual target number of entries. This bad practice was popularized by the HashMap.HashMap(int) and HashSet.HashSet(int) constructors, which accept a capacity that should be multiplied by the load factor to obtain the actual maximum expected number of entries. ChronicleMap and ChronicleSet don't have a notion of load factor.

So this is the maximum number of entries, unless you specify maxBloatFactor(2.0) (or 10.0, etc.). Currently, Chronicle Map doesn't support the case of "I really don't know how many entries I will have; maybe 1, maybe 1 billion; but I want to create a Map that grows organically to the required size." This is a known limitation.
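
As an illustration, a sketch of a builder configuration that tolerates growth beyond the configured entries during bursts (the name, entry count, and factor are made-up values, not recommendations):

import net.openhft.chronicle.set.ChronicleSet;

// Sketch: expect roughly 100M ids in a stable state, tolerate up to 2x during bursts
ChronicleSet<Long> ids =
    ChronicleSet.of(Long.class)
        .name("ids")
        .entries(100_000_000)   // expected maximum number of entries in a stable state
        .maxBloatFactor(2.0)    // allow up to 2x the configured entries
        .create();

Inserting more than entries × maxBloatFactor entries can still fail with an IllegalStateException, as the Javadoc above notes.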

Is there a way to handle a really big amount of data, around a billion entries?

Yes, if you have enough memory. Although Chronicle Map is memory-mapped, it is not designed to work efficiently when the amount of data is significantly larger than memory; in that case, use LMDB, RocksDB, or something similar.
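
If the data does fit in memory and you also want control over where it lives on disk (question 5 above), the set can be persisted to a memory-mapped file; a minimal sketch, with an illustrative path and size:

import java.io.File;
import net.openhft.chronicle.set.ChronicleSet;

// Sketch: memory-map the set to a file on a volume of your choice
// (createPersistedTo throws IOException)
ChronicleSet<Long> ids =
    ChronicleSet.of(Long.class)
        .name("ids")
        .entries(1_000_000_000L)
        .createPersistedTo(new File("/data/chronicle/ids.dat")); // illustrative path

The file passed to createPersistedTo() determines which volume the data is stored on.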