Is there a way to consistently reference-capture a returned variable and use it in a destructor when the return type is movable?

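I've been working on adding transparent result caching to some computation-heavy code that has multiple return statements. Whenever the function returns, for whatever reason, I want to take the last known value of the working buffer (val in the code below) and put it in the cache so that I don't have to compute it again later. Here is a basic reproduction of what I'm trying: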

#include <iostream>
#include <ranges>
#include <algorithm>
#include <map>
#include <string>
#include <functional>

struct ScopeExit
{
    ScopeExit(std::function<void()> f) : m_f(f) {}
    ~ScopeExit() {m_f();}

private:
    std::function<void()> m_f;
};

std::map<int, std::string> g_cache;

std::string func(const int key)
{
    std::string val;

    if (g_cache.contains(key))
    {
        // returning from the cache like this does NOT work
        return g_cache.at(key);

        // assigning the cached value to 'val' prior to returning DOES work
        //val = g_cache.at(key);
        //return val;
    }

    val = "hello world";

    // add the result to the cache just before we give the result to the caller
    ScopeExit scopeExit1([key, &val](){
        auto [iter, inserted] = g_cache.try_emplace(key, val);
        if (inserted)
            std::cout << "inserted \"" << iter->second << "\" into cache for key = " << iter->first << std::endl;
    });

    // normally a bunch of math happens here but I'm using a string for this example

    std::ranges::fill(val, 'b');

    if (2 == key)
        return val;
    
    std::ranges::for_each(val, [](auto& v){v += 1;});

    return val;
}

int main()
{
    auto a = func(1);
    auto b = func(2);

    // these should pull from cache
    auto c = func(1);
    auto d = func(2);

    std::cout << "key = 1: " << a << std::endl;
    std::cout << "key = 2: " << b << std::endl;
    std::cout << "key = 1 (cached): " << c << std::endl;
    std::cout << "key = 2 (cached): " << d << std::endl;

    return 0;
}

https://godbolt.org/z/Ma314evhb

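I noticed that when the function also returns directly from the cache (e.g. return g_cache.at(key);), all of the cached values are empty, as if they had been moved from before the destructor code ran.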

inserted "" into cache for key = 1
inserted "" into cache for key = 2
key = 1: ccccccccccc
key = 2: bbbbbbbbbbb
key = 1 (cached): 
key = 2 (cached): 

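Further research led me to the "Automatic move from local variables and parameters" section of https://en.cppreference.com/w/cpp/language/return, which I believe states (paraphrasing; correct me if I'm wrong):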

if expression is a non-volatile object type declared in the body, it will treat the return value as an rvalue expression and pick the move constructor if available, otherwise it will treat expression as an lvalue and pick the copy constructor.
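To make the ordering concrete, here is a minimal sketch of the rule paraphrased above. Tracked, Guard, pass_through and demo_implicit_move are hypothetical names invented for this illustration, not part of the original code. It returns a by-value parameter on purpose: copy elision is never allowed for a parameter, while the implicit move is, so the outcome should not depend on what the optimizer decides.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical instrumented type: remembers whether it has been moved from.
struct Tracked
{
    std::string text;
    bool moved_from = false;

    explicit Tracked(std::string s) : text(std::move(s)) {}
    Tracked(const Tracked&) = default;
    Tracked(Tracked&& other) noexcept : text(std::move(other.text))
    {
        other.moved_from = true;
    }
};

// Minimal scope guard, same idea as ScopeExit in the question.
struct Guard
{
    explicit Guard(std::function<void()> f) : f_(std::move(f)) {}
    ~Guard() { f_(); }

private:
    std::function<void()> f_;
};

// 't' is a by-value parameter, so 'return t;' cannot be elided but is
// treated as an rvalue and picks the move constructor.
Tracked pass_through(Tracked t, bool& guard_saw_moved_from)
{
    Guard guard([&] { guard_saw_moved_from = t.moved_from; });
    return t; // the result is move-constructed first, then 'guard' is destroyed
}

// Usage sketch: call this from main().
void demo_implicit_move()
{
    bool saw_moved_from = false;
    Tracked out = pass_through(Tracked{"hello"}, saw_moved_from);
    assert(saw_moved_from);       // the guard ran after 't' had been moved from
    assert(out.text == "hello");  // the caller still receives the full value
}
```

The return value is initialized before the locals of the enclosing block are destroyed, which is why the guard observes the moved-from object.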

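I wrote an instrumented version of the same thing here: https://godbolt.org/z/6xrdTddsM. It confirms that the move constructor is called on return and that the moved-from object is accessed afterwards.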

(0x7ffe1bf1f330) default constructor
(0x7ffe1bf1f410) move constructor (source = 0x7ffe1bf1f330)
(0x210dee8) copy constructor (source = 0x7ffe1bf1f330)
inserted "" into cache for key = 1
(0x7ffe1bf1f330) destructor

(0x7ffe1bf1f330) default constructor
(0x7ffe1bf1f3f0) move constructor (source = 0x7ffe1bf1f330)
(0x210df38) copy constructor (source = 0x7ffe1bf1f330)
inserted "" into cache for key = 2
(0x7ffe1bf1f330) destructor

(0x7ffe1bf1f330) default constructor
(0x7ffe1bf1f3d0) copy constructor (source = 0x210dee8)
(0x7ffe1bf1f330) destructor

(0x7ffe1bf1f330) default constructor
(0x7ffe1bf1f3b0) copy constructor (source = 0x210df38)
(0x7ffe1bf1f330) destructor

key = 1: ccccccccccc
key = 2: bbbbbbbbbbb
key = 1 (cached): 
key = 2 (cached): 
(0x7ffe1bf1f3b0) destructor
(0x7ffe1bf1f3d0) destructor
(0x7ffe1bf1f3f0) destructor
(0x7ffe1bf1f410) destructor
(0x210df38) destructor
(0x210dee8) destructor

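Interestingly, if I replace return g_cache.at(key); with val = g_cache.at(key); return val;, it elides the move, constructs the string in place in the caller's stack frame, and works the way I want. However, since I don't believe this elision is mandatory (I'm not returning prvalues), I think it only works by coincidence, although the undefined behavior sanitizer doesn't seem to mind.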

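Side question: is this actually consistent/guaranteed/well-defined behavior? If so, is there any way to make it less fragile from a maintainability standpoint? (For example, I add a different return variable and, oops, I've unknowingly broken the caching completely.)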

(0x7ffee22bc150) default constructor
(0x1c1dee8) copy constructor (source = 0x7ffee22bc150)
inserted "ccccccccccc" into cache for key = 1

(0x7ffee22bc130) default constructor
(0x1c1df38) copy constructor (source = 0x7ffee22bc130)
inserted "bbbbbbbbbbb" into cache for key = 2

(0x7ffee22bc110) default constructor
(0x7ffee22bc110) copy assign (source = 0x1c1dee8)

(0x7ffee22bc0f0) default constructor
(0x7ffee22bc0f0) copy assign (source = 0x1c1df38)

key = 1: ccccccccccc
key = 2: bbbbbbbbbbb
key = 1 (cached): ccccccccccc
key = 2 (cached): bbbbbbbbbbb
(0x7ffee22bc0f0) destructor
(0x7ffee22bc110) destructor
(0x7ffee22bc130) destructor
(0x7ffee22bc150) destructor
(0x1c1df38) destructor
(0x1c1dee8) destructor

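I also noticed that if I comment out the move constructor and the move assignment operator, it works the way I want, because that lets it fall back to the copy constructor per the return rules quoted above. This approach seems to be a consistent and correct way to get what I want, but it isn't very practical in my real application, where I'm using std::vector<float>.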

Is there any way to ensure that my returned variable will not be moved from so that I can use it in my scope exit destructor?

Yes: make a copy when you return. Turn all of your return a; statements into return std::string(a); (or whatever type you're using).
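As a rough sketch of that suggestion applied to the question's function: the operand of each return is an explicit copy, so val is never moved from and the scope guard still sees the full value. Passing the cache as a parameter and the names compute_and_cache and demo_copy_on_return are assumptions made only to keep the snippet self-contained.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

struct ScopeExit
{
    explicit ScopeExit(std::function<void()> f) : m_f(std::move(f)) {}
    ~ScopeExit() { m_f(); }

private:
    std::function<void()> m_f;
};

// Assumption: the cache is passed in instead of being a global.
std::string compute_and_cache(std::map<int, std::string>& cache, int key)
{
    if (auto it = cache.find(key); it != cache.end())
        return it->second;

    std::string val = "hello world";

    // Runs on every exit path below, after the return value has been built.
    ScopeExit on_exit([&cache, key, &val] { cache.try_emplace(key, val); });

    for (char& c : val) c = 'b';

    if (key == 2)
        return std::string(val); // explicit copy; 'val' is left untouched

    for (char& c : val) c += 1;

    return std::string(val);     // explicit copy; 'val' is left untouched
}

// Usage sketch: call this from main().
void demo_copy_on_return()
{
    std::map<int, std::string> cache;

    const std::string a = compute_and_cache(cache, 1);
    const std::string b = compute_and_cache(cache, 2);
    assert(a == std::string(11, 'c'));
    assert(b == std::string(11, 'b'));

    // The guard stored the intact values, not moved-from strings.
    assert(cache.at(1) == a);
    assert(cache.at(2) == b);
}
```

The cost is one extra copy per computed result, which may matter for large buffers.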

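A simpler solution is to properly separate your concerns. You need one function that computes the value and one function that handles the cache. Your problem comes from trying to do both in the same function.

So... don't do that: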

std::string uncached_func(const int key)
{
    std::string val = "hello world";

    // no cache handling here: this function only computes the value

    // normally a bunch of math happens here but I'm using a string for this example

    std::ranges::fill(val, 'b');

    if (2 == key)
        return val;
    
    std::ranges::for_each(val, [](auto& v){v += 1;});

    return val;
}

std::string cached_func(const int key)
{
    //If you use `contains` followed by `at`, you're doing it wrong.
    if (auto it = g_cache.find(key); it != g_cache.end())
    {
        return it->second;
    }

    auto ret = uncached_func(key);
    //No point in `try_emplace`, because it obviously wasn't cached if we're here.
    g_cache.emplace(key, ret); 
    return ret;
}

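Indeed, you can move cached_func into a generic type that also stores its own cache. That lets you avoid relying on a global variable, which will bite you when someone eventually tries to thread this code: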

template<typename> class CachedAccess;

template<typename Ret, typename Arg>
class CachedAccess<Ret(Arg)>
{
private:
  std::map<Arg, Ret> cache_;
  std::function<Ret(Arg)> callable_;

public:
  explicit CachedAccess(std::function<Ret(Arg)> callable) : callable_(callable)
  {}

  Ret operator()(Arg arg)
  {
    if (auto it = cache_.find(arg); it != cache_.end())
    {
        return it->second;
    }

    auto ret = callable_(arg);
    cache_.emplace(arg, ret); 
    return ret;
  }
};
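A possible usage sketch for a wrapper of this shape follows. The class is restated under the hypothetical name Memoized so the snippet compiles on its own, and the call counter exists only to show that the wrapped callable runs once per key; neither is part of the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

template<typename> class Memoized;

// Hypothetical restatement of the caching wrapper, using its own member cache.
template<typename Ret, typename Arg>
class Memoized<Ret(Arg)>
{
public:
    explicit Memoized(std::function<Ret(Arg)> callable)
        : callable_(std::move(callable))
    {}

    Ret operator()(Arg arg)
    {
        if (auto it = cache_.find(arg); it != cache_.end())
            return it->second;      // cache hit: the callable is not invoked

        auto ret = callable_(arg);  // cache miss: compute once
        cache_.emplace(arg, ret);
        return ret;
    }

private:
    std::map<Arg, Ret> cache_;
    std::function<Ret(Arg)> callable_;
};

// Usage sketch: call this from main().
void demo_memoized()
{
    int calls = 0;
    Memoized<std::string(int)> cached([&calls](int key) {
        ++calls; // stands in for the expensive computation
        return std::string(static_cast<std::size_t>(key), 'x');
    });

    const std::string first = cached(3);
    const std::string second = cached(3);
    assert(first == "xxx");
    assert(second == "xxx");
    assert(calls == 1); // the second call was served from the cache

    cached(5);
    assert(calls == 2);
}
```

Each Memoized object owns its cache, so two wrappers never share state; making a single wrapper safe to call from several threads would still need a lock, which this sketch leaves out.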

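I'd assume that since you are caching values, performance matters to you, so constantly copying a std::string in this situation is probably not what you want.

For example, this line: return g_cache.at(key); will always copy-construct a std::string (given that the function returns by value).

I also find that using ScopeExit adds unnecessary complexity to your situation and forces you to either capture the std::string by value or explicitly copy-construct a new std::string when returning.

Also, you're better off not using a global cache, since it will get you into trouble once you have to deal with multithreaded code.

That said, to stay consistent with your current design (a global cache, with values cached inside func), here is a more efficient way to accomplish what you're after.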

#include <string>
#include <unordered_map>
#include <utility>

static std::unordered_map<int, std::string> cache{ };

static const std::string& get_value( int key ) 
{
    if ( auto it{ cache.find( key ) }; it != std::end( cache ) ) 
    {
        return it->second;
    }
    
    std::string value{ "Some complex computed value" };
    const auto[ it, _ ]{ cache.insert( { key, std::move( value ) } ) };
    return it->second;
}
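Returning a reference into the map is only safe if that reference outlives later insertions. For std::unordered_map it does: rehashing invalidates iterators, but not references or pointers to elements. The sketch below checks that; passing the cache as a parameter and the names compute and demo_reference_stability are assumptions made for the illustration.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the expensive computation.
std::string compute(int key)
{
    return "value " + std::to_string(key);
}

// Assumption: the cache is passed in instead of being a global.
const std::string& get_value(std::unordered_map<int, std::string>& cache, int key)
{
    if (auto it = cache.find(key); it != cache.end())
        return it->second;

    // emplace returns {iterator, inserted}; the element lives inside the map.
    return cache.emplace(key, compute(key)).first->second;
}

// Usage sketch: call this from main().
void demo_reference_stability()
{
    std::unordered_map<int, std::string> cache;

    const std::string& first = get_value(cache, 1);
    const std::string* address = &first;

    // Enough insertions to force the table to rehash several times.
    for (int key = 2; key < 1000; ++key)
        get_value(cache, key);

    assert(&get_value(cache, 1) == address); // same element, same address
    assert(first == "value 1");              // the old reference is still valid
}
```

The reference does become dangling if the entry is erased or the map is destroyed, and concurrent callers would still need synchronization.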