迭代 Guava Cache 中的值会丢失数据

Iterating on values from Guava Cache loses data

我开始对在 Guava 缓存中按值查找键的方法进行基准测试,我注意到与并发级别相关的奇怪行为。我不确定这是错误还是未定义的行为,或者甚至可能是预期但未指定。

我的基准测试应该在 Guava Cache 中按值 查找 key,我知道这不是通常的事情。

这是我的完整基准 class:

@Fork(4)
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 1, time = 100, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 4, time = 100, timeUnit = TimeUnit.MILLISECONDS)
public class ValueByKey {

    private Long counter = 0L;

    private final int MAX = 2500;

    private final LoadingCache<String, Long> stringToLong = CacheBuilder.newBuilder()
        .concurrencyLevel(1)
        .maximumSize(MAX + 5)
        .build(new CacheLoader<String, Long>() {
            public Long load(String mString) {
                return generateIdByString(mString);
            }
        });

    private final Map<String, Long> mHashMap = new Hashtable<>(MAX);
    private final Map<String, Long> concurrentHashMap = new ConcurrentHashMap<>(MAX);

    @Setup(Level.Trial)
    public void setup() {
        // Populate guava cache
        for(int i = 0; i <= MAX; i++) {
            try {
                stringToLong.get(UUID.randomUUID().toString());
            } catch (ExecutionException e) {
                e.printStackTrace();
                System.exit(1);
            }
        }
    }

    @Benchmark
    public String stringToIdByIteration() {
        Long randomNum = ThreadLocalRandom.current().nextLong(1L, MAX);

        for(Map.Entry<String, Long> entry : stringToLong.asMap().entrySet()) {
            if(Objects.equals(randomNum, entry.getValue())) {
                return entry.getKey();
            }
        }
        System.out.println("Returning null as value not found " + randomNum);
        return null;
    }

    @Benchmark
    public String stringToIdByIterationHashTable() {
        Long randomNum = ThreadLocalRandom.current().nextLong(1L, MAX);

        for(Map.Entry<String, Long> entry : mHashMap.entrySet()) {
            if(Objects.equals(randomNum, entry.getValue())) {
                return entry.getKey();
            }
        }
        System.out.println("Returning null as value not found " + randomNum);
        return null;
    }

@Benchmark
    public String stringToIdByIterationConcurrentHashMap() {
        Long randomNum = ThreadLocalRandom.current().nextLong(1L, MAX);

        for(Map.Entry<String, Long> entry : concurrentHashMap.entrySet()) {
            if(Objects.equals(randomNum, entry.getValue())) {
                return entry.getKey();
            }
        }
        System.out.println("concurrentHashMap Returning null as value not found " + randomNum);
        return null;
    }

    private Long generateIdByString(final String mString) {
        mHashMap.put(mString, counter++);
        concurrentHashMap.put(mString, counter);
        return counter;
    }

}

我注意到,当我将 .concurrencyLevel(1) 更改为不同于 1 的数字时,我开始丢失数据。以下输出来自并发级别 4:

Iteration   1: Returning null as value not found 107
Returning null as value not found 43
Returning null as value not found 20
Returning null as value not found 77
Returning null as value not found 127
Returning null as value not found 35
Returning null as value not found 83
Returning null as value not found 43
Returning null as value not found 127
Returning null as value not found 107
Returning null as value not found 83
Returning null as value not found 82
Returning null as value not found 40
Returning null as value not found 58
Returning null as value not found 127
Returning null as value not found 114
Returning null as value not found 119
Returning null as value not found 43
Returning null as value not found 114
Returning null as value not found 18
Returning null as value not found 58
66.778 us/op

我注意到在使用 HashMapHashTable 时我从未丢失任何数据,因为使用相同的代码,它的性能也更好:

Benchmark Mode Cnt Score Error Units ValueByKey.stringToIdByIteration avgt 16 58.637 ± 15.094 us/op ValueByKey.stringToIdByIterationConcurrentHashMap avgt 16 16.148 ± 2.046 us/op ValueByKey.stringToIdByIterationHashTable avgt 16 11.705 ± 1.095 us/op

是我的代码有误还是Guava无法正确处理并发级别高于1的分区HashTable?

  • The concurrency level option is used to partition the table internally such that updates can occur without contention.
  • The ideal setting would be the maximum number of threads that could potentially access the cache at one time.

没有缓存始终保证缓存命中

Presence/absence 缓存中的数据由逐出策略决定(并且数据首先加载到缓存中)。

由于您使用了 CacheBuilder.maximumSize(MAX + 5) 您的缓存将使用 基于大小的逐出 并将在 之前 开始删除元素达到预设的最大尺寸。

将并发级别设置为 4,Guava Cache 发挥它的安全性并将逐出阈值设置得低一点,这是有道理的,因为元素可以保持原样到达被驱逐。

这就是您的元素开始 'disappearing' 的原因。

要对此进行测试,请让您的 class 实现 RemovalListener 接口:

public class ValueByKey implements RemovalListener<String, Long> { 
    //...
    @Override
    public void onRemoval(RemovalNotification<String, Long> notification) {
        System.out.println("removed: " + notification.getKey() + " -> " + notification.getValue());
    }
    //...
}

...在 运行 测试期间,您会注意到匹配缺失值的驱逐:

# Warmup Iteration   1: 
removed: 110c0a73-1dc3-40ee-8909-969e6dee0ea0 -> 3
removed: 6417015a-f154-467f-b3bf-3b95831ac5b7 -> 6
removed: 5bc206f9-67ec-49a2-8471-b386ffc03988 -> 14
removed: 3c0a33e1-1fe1-4e42-b262-bf6a3e8c53f7 -> 21
Returning null as value not found 14
Returning null as value not found 14
Returning null as value not found 3
64.778 us/op
Iteration   1: 
Returning null as value not found 21
Returning null as value not found 21
Returning null as value not found 6
37.719 us/op
[...]

我可以想象驱逐的阈值计算可能很复杂,但在我的机器上将最大大小增加 5% (CacheBuilder.maximumSize(Math.round(MAX * 1.05))) 阻止了 ALL 驱逐运行 你的基准。