如何避免在 std::unordered_map 上进行双重搜索并在实现缓存时避免在不需要时调用工厂函数

Question

我一直在实现基于 std::unordered_map 的缓存。如果值已存储，我想避免调用生成该值的工厂函数，但我也想避免运行在地图上搜索两次。

#include <unordered_map>

struct Value {
    int x;
    Value & operator=(Value const &) = delete;
};

using Cache = std::unordered_map<int, Value>;

Value make_value(int i){
    // Imagine that this takes a time ok!
    i = i + 1;
    return Value{i};
}

// This has double search
template <typename F>
Value & insert_a(Cache & cache, int key, F factory)
{
    auto i = cache.find(key);
    if(i==cache.end()){
        auto r = cache.try_emplace(key,factory(key));
        return r.first->second;
    }
    return i->second;
}

// This runs the factory even if it is not required
template <typename F>
Value & insert_b(Cache & cache, int key, F factory)
{
    auto r = cache.try_emplace(key,factory(key));
    return r.first->second;
}

int main(){
    std::unordered_map<int,Value> map;

    insert_a(map,10,make_value);

    insert_b(map,10,make_value);

    return 0;

}

我有两个 insert 的简化实现，演示了如何构建缓存。

insert_a 首先使用 find 来检测项目是否存在，并且仅当它不调用工厂来获取值时。对容器执行了两次搜索。

insert_b 调用 try_emplace 并且只是 returns 存储的值。这显然很糟糕，因为即使值已经存在，也会调用工厂。

我似乎想要一个中间地带，我将工厂函数直接传递给 try_emplace 并且仅在需要时在内部调用它。有没有办法模拟这个？

这不是关于如何构建缓存的笼统问题。我知道多线程问题、常量正确性和可变关键字。我特意问一下如何获得两者

容器的单一搜索
仅在需要时调用工厂

请注意，我特意删除了 Value class 的复制赋值运算符。一个明显的答案是先插入一个默认值，然后覆盖它。并非所有 classes 都是可复制分配的，我想支持这些。

有沙盒可以玩https://godbolt.org/z/Gja3MaGWf

Answer 1

你可以使用惰性工厂。即只在需要时调用实际工厂的一个：

#include <unordered_map>
#include <iostream>
struct Value {
    int x;
    Value & operator=(Value const &) = delete;
};

using Cache = std::unordered_map<int, Value>;

Value make_value(int i){
    // Imagine that this takes a time ok!
    i = i + 1;
    return Value{i};
}

template <typename F>
Value & insert_b(Cache & cache, int key)
{
    auto r = cache.try_emplace(key,F{key});
    return r.first->second;
}

// call the factory when needed    
struct ValueMaker {
    int value;
    operator Value() {
        std::cout << "called\n";
        return make_value(value);
    }
};

int main(){
    std::unordered_map<int,Value> map;
    insert_b<ValueMaker>(map,10);
    insert_b<ValueMaker>(map,10);
    return 0;

}

输出是

called

因为 ValueMaker::operator Value 仅在将元素插入地图时调用一次。在第二次调用时，值生成器（只是一个纤细的包装器）没有转换为 Value，因为 key 已经在映射中。

我尽量少改动您的代码。对于您的实际代码，您可能希望摆脱两个工厂（make_value 和 ValueMaker）并只使用一个。关键点是将一些轻型包装器传递给 try_emplace，它仅在转换为 Value.

时触发实际值的构造

如何避免在 std::unordered_map 上进行双重搜索并在实现缓存时避免在不需要时调用工厂函数

How to avoid double search on std::unordered_map AND avoid calling factory function when not required when implementing a cache

c++

c++17

unordered-map