为 boost::deque 保留块分配器?

Preserving block-allocator for boost::deque?

我想让 boost::container::deque 重用释放的块,而不是先释放它们,然后再分配新的块。

boost::container::deque 允许的唯一配置选项是 compile-time specification of the block size(根据项目数或字节数)。

确实,指定块大小是我想要使用的东西,但我也想指定在变得空闲后将保留的块数,并在需要新块时重新使用。但是,如图所示 here,对于 boost::container::deque,此数字为 0,因此一旦块变为空闲,它就会释放!我想创建一个等于 1.

的双端队列

我看到了通过指定自定义分配器实现此目的的机会。想想这个丑陋的:

template < typename Block >
struct PreservingAllocator : std::allocator<Block>
{
  using std::allocator<Block>::allocator;

  Block* allocate(size_t nn)
  {
    if (nn == 1) if (auto oldBlock = m_reserve->exchange(nullptr); !!oldBlock) return oldBlock;
    return std::allocator<Block>::allocate(nn);
  }

  void deallocate(Block* block, size_t nn)
  {
    if (nn == 1) block = m_reserve->exchange(block);
    if (!!block) std::allocator<Block>::deallocate(block, nn);
  }

private:
  static constexpr auto Deleter = [](std::atomic<Block*>* pointer)
  {
    if (!!pointer) if (auto block = pointer->exchange(nullptr); !!block)
      std::allocator<Block>{}.deallocate(block,1);
   delete pointer;
  };

  std::shared_ptr<std::atomic<Block*>> m_reserve = {new std::atomic<Block*>{nullptr},Deleter};
};

所以问题是。

  1. 如何为 boost::container::dequeblocks,而不是 items!)指定块分配器?
  2. 如果有办法,那么这种规范是否支持带状态的分配器?
  3. 如果是,那么上面提到的分配器会用吗?
  4. 毕竟,如果不是以这种方式,我如何才能使双端队列至少不会释放其中一个已释放的块并在以后需要新块时重用它?

这给了我更多使用分配器的借口。我选择了多态分配器——尽管那只是切向相关的¹。

Aside: The relation is that with custom allocators you often want to propagate the allocator to nested types that are allocator-aware. See "Advanced" below

示例元素类型

struct X {
    std::string key, value;
};

它并没有变得更简单,尽管它允许我们尝试共享 稍后使用嵌套字符串的分配器。

跟踪内存资源

让我们创建一个跟踪内存资源。这非常简单,我们将转向标准 new/delete:

namespace pmr = boost::container::pmr;

struct tracing_resource : pmr::memory_resource {
    uint64_t n = 0, total_bytes = 0;

    virtual void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        n += 1;
        total_bytes += bytes;
        return pmr::new_delete_resource()->allocate(bytes, alignment);
    }

    virtual void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        if (p) {
            n -= 1;
            total_bytes -= bytes;
        }
        return pmr::new_delete_resource()->deallocate(p, bytes, alignment);
    }

    virtual bool do_is_equal(const memory_resource& other) const noexcept override {
        return pmr::new_delete_resource()->is_equal(other);
    }
};

我们可以检查 n(分配数量)以及在整个测试代码的各个点分配的总字节数。

一个测试例程

让我们把它放在一起 main,从我们的示踪剂开始:

tracing_resource tracer;

让我们在上面安装一个池化资源:

pmr::unsynchronized_pool_resource res(&tracer);

auto allocations = [&] {
    fmt::print("alloc: #{}, {} bytes, cached = {}\n", tracer.n, tracer.total_bytes, cache_buckets(res));
};
allocations();

这将打印

alloc: #0, 0 bytes, cached = {0, 0, 0, 0, 0, 0, 0, 0, 0}

刚出门。

现在,让我们开始(重新)以各种模式分配一些双端队列:

pmr::deque<X> collection(&res);
auto report = [&] {
    fmt::print("collection = {}\nalloc: #{}, {} bytes, cached = {}\n", collection, tracer.n, tracer.total_bytes, cache_buckets(res));
};

std::vector data1 { X{"1", "eins"}, {"2", "zwei"},  {"3", "drei"}, };
std::vector data2 { X{"4", "vier"},   {"5", "fuenf"}, {"6", "sechs"}, };
std::vector data3 { X{"7", "sieben"}, {"8", "acht"},  {"9", "neun"}, };

auto i = 0;
for (auto const& data : {data1, data2, data3}) {
    for (auto el : data) {
        (i%2)
            ? collection.push_back(el)
            : collection.push_front(el);
    }

    report();

    collection.clear();
    report();
}

这将在容器的不同端附加不同的序列。我们不会 做很多变化,因为当我们也制作字符串时这会变得很有趣 使用池资源)。

现场演示

Live On Compiler Explorer

#include <boost/container/pmr/deque.hpp>
#include <boost/container/pmr/unsynchronized_pool_resource.hpp>

// debug output
#include <range/v3/all.hpp>
#include <fmt/ranges.h>
#include <fmt/ostream.h>
#include <iomanip>

namespace pmr = boost::container::pmr;

struct tracing_resource : pmr::memory_resource {
    uint64_t n = 0, total_bytes = 0;

    virtual void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        n += 1;
        total_bytes += bytes;
        return pmr::new_delete_resource()->allocate(bytes, alignment);
    }

    virtual void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        if (p) {
            n -= 1;
            total_bytes -= bytes;
        }
        return pmr::new_delete_resource()->deallocate(p, bytes, alignment);
    }

    virtual bool do_is_equal(const memory_resource& other) const noexcept override {
        return pmr::new_delete_resource()->is_equal(other);
    }
};

struct X {
    std::string key, value;

    friend std::ostream& operator<<(std::ostream& os, X const& x) {
        return os << "(" << std::quoted(x.key) << ", " << std::quoted(x.value) << ")";
    }
};

auto cache_buckets(pmr::unsynchronized_pool_resource& res) {
    using namespace ::ranges;
    return views::iota(0ull)
        | views::take_exactly(res.pool_count())
        | views::transform([&](auto idx) {
            return res.pool_cached_blocks(idx);
        });
}

int main() {
    tracing_resource tracer;
    {
        pmr::unsynchronized_pool_resource res(&tracer);

        auto allocations = [&] {
            fmt::print("alloc: #{}, {} bytes, cached = {}\n", tracer.n, tracer.total_bytes, cache_buckets(res));
        };
        allocations();

        {
            pmr::deque<X> collection(&res);
            auto report = [&] {
                fmt::print("collection = {}\nalloc: #{}, {} bytes, cached = {}\n", collection, tracer.n, tracer.total_bytes, cache_buckets(res));
            };

            std::vector data1 { X{"1", "eins"}, {"2", "zwei"},  {"3", "drei"}, };
            std::vector data2 { X{"4", "vier"},   {"5", "fuenf"}, {"6", "sechs"}, };
            std::vector data3 { X{"7", "sieben"}, {"8", "acht"},  {"9", "neun"}, };
                            
            auto i = 0;
            for (auto const& data : {data1, data2, data3}) {
                for (auto el : data) {
                    (i%2)
                        ? collection.push_back(el)
                        : collection.push_front(el);
                }

                report();

                collection.clear();
                report();
            }
        }

        allocations();
    }

    fmt::print("alloc: #{}, {} bytes\n", tracer.n, tracer.total_bytes);
}

版画

alloc: #0, 0 bytes, cached = {0, 0, 0, 0, 0, 0, 0, 0, 0}
collection = {("3", "drei"), ("2", "zwei"), ("1", "eins")}
alloc: #4, 1864 bytes, cached = {0, 0, 0, 0, 0, 1, 0, 0, 0}
collection = {}
alloc: #4, 1864 bytes, cached = {0, 0, 0, 0, 0, 2, 0, 0, 0}
collection = {("6", "sechs"), ("5", "fuenf"), ("4", "vier")}
alloc: #4, 1864 bytes, cached = {0, 0, 0, 0, 0, 2, 0, 0, 0}
collection = {}
alloc: #4, 1864 bytes, cached = {0, 0, 0, 0, 0, 2, 0, 0, 0}
collection = {("9", "neun"), ("8", "acht"), ("7", "sieben")}
alloc: #4, 1864 bytes, cached = {0, 0, 0, 0, 0, 2, 0, 0, 0}
collection = {}
alloc: #4, 1864 bytes, cached = {0, 0, 0, 0, 0, 2, 0, 0, 0}
alloc: #4, 1864 bytes, cached = {0, 0, 1, 0, 0, 3, 0, 0, 0}
alloc: #0, 0 bytes

高级

正如承诺的那样,我们可以让元素类型分配器感知并显示传播:

 #include <boost/container/pmr/string.hpp>
 // ...

 struct X {
     using allocator_type = pmr::polymorphic_allocator<X>;
 
     template<typename K, typename V>
     explicit X(K&& key, V&& value, allocator_type a = {})
         : key(std::forward<K>(key), a), value(std::forward<V>(value), a) {}
 
     pmr::string key, value;
 };

让我们将测试驱动程序修改为 emplace 来自字符串文字的元素:

std::vector data1 { std::pair{"1", "eins"}, {"2", "zwei"},  {"3", "drei"}, };
std::vector data2 { std::pair{"4", "vier"},   {"5", "fuenf"}, {"6", "sechs"}, };
std::vector data3 { std::pair{"7", "sieben"}, {"8", "acht"},  {"9", "neun"}, };
                
auto i = 0;
for (auto const& data : {data1, data2, data3}) {
    for (auto [k,v] : data) {
        (i%2)
            ? collection.emplace_back(k, v)
            : collection.emplace_front(k, v);
    }

为了更好的衡量,我们也改变其中一个嵌套字符串值:

    collection.at(1).value.append(50, '*'); // thwart SSO
    report();

    collection.at(1).value = "sept";
    report();

再次演示Live On Compiler Explorer

#include <boost/container/pmr/deque.hpp>
#include <boost/container/pmr/string.hpp>
#include <boost/container/pmr/unsynchronized_pool_resource.hpp>

// debug output
#include <range/v3/all.hpp>
#include <fmt/ranges.h>
#include <fmt/ostream.h>
#include <iomanip>

namespace pmr = boost::container::pmr;

struct tracing_resource : pmr::memory_resource {
    uint64_t n = 0, total_bytes = 0;

    virtual void* do_allocate(std::size_t bytes, std::size_t alignment) override {
        n += 1;
        total_bytes += bytes;
        return pmr::new_delete_resource()->allocate(bytes, alignment);
    }

    virtual void do_deallocate(void* p, std::size_t bytes, std::size_t alignment) override {
        if (p) {
            n -= 1;
            total_bytes -= bytes;
        }
        return pmr::new_delete_resource()->deallocate(p, bytes, alignment);
    }

    virtual bool do_is_equal(const memory_resource& other) const noexcept override {
        return pmr::new_delete_resource()->is_equal(other);
    }
};

struct X {
    using allocator_type = pmr::polymorphic_allocator<X>;

    template<typename K, typename V>
    explicit X(K&& key, V&& value, allocator_type a = {})
        : key(std::forward<K>(key), a), value(std::forward<V>(value), a) {}

    pmr::string key, value;

    friend std::ostream& operator<<(std::ostream& os, X const& x) {
        return os << "(" << std::quoted(x.key.c_str()) << ", " << std::quoted(x.value.c_str()) << ")";
    }
};

auto cache_buckets(pmr::unsynchronized_pool_resource& res) {
    using namespace ::ranges;
    return views::iota(0ull)
        | views::take_exactly(res.pool_count())
        | views::transform([&](auto idx) {
            return res.pool_cached_blocks(idx);
        });
}

int main() {
    tracing_resource tracer;
    {
        pmr::unsynchronized_pool_resource res(&tracer);

        auto allocations = [&] {
            fmt::print("alloc: #{}, {} bytes, cached = {}\n", tracer.n, tracer.total_bytes, cache_buckets(res));
        };
        allocations();

        {
            pmr::deque<X> collection(&res);
            auto report = [&] {
                fmt::print("collection = {}\nalloc: #{}, {} bytes, cached = {}\n", collection, tracer.n, tracer.total_bytes, cache_buckets(res));
            };

            std::vector data1 { std::pair{"1", "eins"}, {"2", "zwei"},  {"3", "drei"}, };
            std::vector data2 { std::pair{"4", "vier"},   {"5", "fuenf"}, {"6", "sechs"}, };
            std::vector data3 { std::pair{"7", "sieben"}, {"8", "acht"},  {"9", "neun"}, };
                            
            auto i = 0;
            for (auto const& data : {data1, data2, data3}) {
                for (auto [k,v] : data) {
                    (i%2)
                        ? collection.emplace_back(k, v)
                        : collection.emplace_front(k, v);
                }

                report();

                collection.at(1).value.append(50, '*'); // thwart SSO
                report();

                collection.at(1).value = "sept";
                report();

                collection.clear();
                report();
            }
        }

        allocations();
    }

    fmt::print("alloc: #{}, {} bytes\n", tracer.n, tracer.total_bytes);
}

打印:

alloc: #0, 0 bytes, cached = {0, 0, 0, 0, 0, 0, 0, 0, 0}
collection = {("3", "drei"), ("2", "zwei"), ("1", "eins")}
alloc: #4, 1864 bytes, cached = {0, 0, 0, 0, 0, 1, 0, 0, 0}
collection = {("3", "drei"), ("2", "zwei**************************************************"), ("1", "eins")}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 0, 0, 1, 0, 0, 0}
collection = {("3", "drei"), ("2", "sept"), ("1", "eins")}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 0, 0, 1, 0, 0, 0}
collection = {}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 1, 0, 2, 0, 0, 0}
collection = {("6", "sechs"), ("5", "fuenf"), ("4", "vier")}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 1, 0, 2, 0, 0, 0}
collection = {("6", "sechs"), ("5", "fuenf**************************************************"), ("4", "vier")}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 0, 0, 2, 0, 0, 0}
collection = {("6", "sechs"), ("5", "sept"), ("4", "vier")}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 0, 0, 2, 0, 0, 0}
collection = {}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 1, 0, 2, 0, 0, 0}
collection = {("9", "neun"), ("8", "acht"), ("7", "sieben")}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 1, 0, 1, 0, 0, 0}
collection = {("9", "neun"), ("8", "acht**************************************************"), ("7", "sieben")}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 0, 0, 1, 0, 0, 0}
collection = {("9", "neun"), ("8", "sept"), ("7", "sieben")}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 0, 0, 1, 0, 0, 0}
collection = {}
alloc: #5, 2008 bytes, cached = {0, 0, 0, 1, 0, 2, 0, 0, 0}
alloc: #5, 2008 bytes, cached = {0, 0, 1, 1, 0, 3, 0, 0, 0}
alloc: #0, 0 bytes

结论

虽然我选择多态分配器是因为它们默认支持 scoped_allocator_adaptor<> 式传播,但以上所有分配器都可以使用静态类型分配器创建。

它确实表明,如果您使用池分配器,双端队列行为将变为池化。

Aside: Pool allocators exist that are able to forgo cleanup, which can be valid in some scenarios, e.g. where the entire memory pool lives on the stack anyways. This is a common allocation optimization technique, allowing large numbers of deallocations/destructions to be skipped.

正在处理您列表中的更多要点:

  1. Q. 如何为 boost::container::deque 指定块分配器 (方块,不是物品!)?

    A. 我认为分配器本质上总是被块调用, 因为双端队列的工作方式。

  2. Q.如果有办法,那么这样的规范是否支持分配器 与状态?

    A.标准库和Boost Container应该都支持 这些天有状态分配器。如有疑问,Boost Container 有您的 返回。

  3. Q. 如果是,那么 above-mentioned 分配器会用吗?

    A.我没仔细看,不过你可以放在同一个里test-bed 找出

  4. Q.毕竟如果不是这样,怎么弄一个不会的双端队列 释放至少一个它的释放块,并在以后重新使用它 需要新区块吗?

    A. 见上。我不确定我是否理解“在 至少一个”子句,但我确实注意到 Boost 的双端队列实现确实可以 “私有地图”分配——大概是为了一些块记帐开销 ——它一直存在直到 deque 对象被销毁。这个配置 不会在(默认)构造时发生,但会在之后发生。

不能单独离开 - 我想证明它可以用有状态的 statically-typed 分配器来完成。

这建立在

  • Boost 容器的 basic_stringdeque

  • 提升容器的 scoped_allocator_adaptor 以在 uses_allocator-construction

    上获得分配器传播
  • Boost Pool 的 object_pool 用于使用 segregated storage, optional block size hinting and ordered allocations with any UserAllocator

    的范围池
  • 最后,我使用了我自己的(惊喜!我 )状态分配器来补充 object_pool¹。它是 non_boost 命名空间中的分配器。

    Note that if you don't mind singleton pools, you can just use Boist's own pool allocators. This is obviously recommended because mine is not maintained by Boost.


Note: I sacrificed the pooling statistics for the sake of cleanliness (it's way messier to add these to the allocator than it is to decorate a polymorphic memory_resource), but I guess the profiler knows best in the end

Live On Compiler Explorer

#include <boost/container/deque.hpp>
#include <boost/container/string.hpp>
#include <boost/container/scoped_allocator.hpp>
#include <boost/pool/pool_alloc.hpp>

// debug output
#include <range/v3/all.hpp>
#include <fmt/ranges.h>
#include <fmt/ostream.h>
#include <iomanip>

namespace bc = boost::container;

namespace non_boost {
    template <typename T, typename UserAllocator = boost::default_user_allocator_new_delete>
    class fast_pool_allocator
    {
      public:
        typedef T value_type;
        typedef UserAllocator user_allocator;

        typedef value_type * pointer;
        typedef const value_type * const_pointer;
        typedef value_type & reference;
        typedef const value_type & const_reference;
        typedef boost::pool<UserAllocator> pool_type;
        typedef typename pool_type::size_type       size_type;
        typedef typename pool_type::difference_type difference_type;

        template <typename U>
        struct rebind {
            typedef fast_pool_allocator<U, UserAllocator> other;
        };

        pool_type* _ref;
      public:
        fast_pool_allocator(pool_type& ref) : _ref(&ref) { }

        fast_pool_allocator(fast_pool_allocator const&) = default;
        fast_pool_allocator& operator=(fast_pool_allocator const&) = default;

        // Not explicit, mimicking std::allocator [20.4.1]
        template <typename U>
        fast_pool_allocator(const fast_pool_allocator<U, UserAllocator> & other) : _ref(other._ref)
        { }

        // Default destructor used.
        static pointer address(reference r)                     { return &r;                                      } 
        static const_pointer address(const_reference s)         { return &s;                                      } 
        static size_type max_size()                             { return (std::numeric_limits<size_type>::max)(); } 
        void construct(const pointer ptr, const value_type & t) { new (ptr) T(t);                                 } 
        void destroy(const pointer ptr)                         { ptr->~T();                                      } 

        bool operator==(fast_pool_allocator const& rhs) const { return _ref == rhs._ref; }
        bool operator!=(fast_pool_allocator const& rhs) const { return _ref != rhs._ref; }

        pointer allocate(const size_type n)
        {
            const pointer ret = (n == 1) 
                ? static_cast<pointer>( (_ref->malloc)() ) 
                : static_cast<pointer>( _ref->ordered_malloc(n) );
            if (ret == 0)
                boost::throw_exception(std::bad_alloc());
            return ret;
        }

        pointer allocate(const size_type n, const void * const) { return allocate(n); }
        pointer allocate()
        {
            const pointer ret = static_cast<pointer>( (_ref->malloc)() );
            if (ret == 0)
                boost::throw_exception(std::bad_alloc());
            return ret;
        }
        void deallocate(const pointer ptr, const size_type n)
        {
#ifdef BOOST_NO_PROPER_STL_DEALLOCATE
            if (ptr == 0 || n == 0)
                return;
#endif
            if (n == 1)
                (_ref->free)(ptr);
            else
                (_ref->free)(ptr, n);
        }
        void deallocate(const pointer ptr) { (_ref->free)(ptr); }
    };

    //Specialization of fast_pool_allocator<void> required to make the allocator standard-conforming.
    template<typename UserAllocator>
    class fast_pool_allocator<void, UserAllocator> {
    public:
        typedef void*       pointer;
        typedef const void* const_pointer;
        typedef void        value_type;

        template <class U> struct rebind {
            typedef fast_pool_allocator<U, UserAllocator> other;
        };
    };
}

template <typename T> using Alloc
    = bc::scoped_allocator_adaptor< non_boost::fast_pool_allocator<T> >;

struct X {
    using allocator_type = Alloc<X>;

    template<typename K, typename V>
    explicit X(K&& key, V&& value, allocator_type a)
        : key(std::forward<K>(key), a), value(std::forward<V>(value), a) {}

    bc::basic_string<char, std::char_traits<char>, Alloc<char> > key, value;

    friend std::ostream& operator<<(std::ostream& os, X const& x) {
        return os << "(" << std::quoted(x.key.c_str()) << ", " << std::quoted(x.value.c_str()) << ")";
    }
};

int main() {
    boost::pool<boost::default_user_allocator_new_delete> _pool { sizeof(X) };
    Alloc<X> alloc { _pool };

    bc::deque<X, Alloc<X> > collection(alloc);
    auto dump = [&] { fmt::print("collection = {}\n", collection); };

    std::vector data1 { std::pair{"1", "eins"}, {"2", "zwei"},  {"3", "drei"}, };
    std::vector data2 { std::pair{"4", "vier"},   {"5", "fuenf"}, {"6", "sechs"}, };
    std::vector data3 { std::pair{"7", "sieben"}, {"8", "acht"},  {"9", "neun"}, };

    auto i = 0;
    for (auto const& data : {data1, data2, data3}) {
        for (auto [k,v] : data) {
            (i%2)
                ? collection.emplace_back(k, v)
                : collection.emplace_front(k, v);
        }

        dump();

        collection.at(1).value.append(50, '*'); // thwart SSO
        dump();

        collection.at(1).value = "sept";
        dump();

        collection.clear();
        dump();
    }
}

¹(参见 and Is there a BOOST pool fixed-sized allocator?