应用记忆会使 golom 序列变慢

Question

我正在尝试使用 C++ 进行记忆化，我正在尝试使用“golom 序列”做一个示例

int main(int argc, char* argv[])
{
    std::unordered_map<int, int> hashTable;

    
    int value = 7;
    
    auto start = std::chrono::high_resolution_clock::now(); 
    std::cout << golomS(4, hashTable) << std::endl;
    auto stop = std::chrono::high_resolution_clock::now();
    
    auto start1 = std::chrono::high_resolution_clock::now();
    std::cout << golom(4) << std::endl;;
    auto stop1 = std::chrono::high_resolution_clock::now();

    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    auto duration1 = std::chrono::duration_cast<std::chrono::microseconds>(stop1 - start1);

    std::cout << "Time taken by function 1: "
           << duration.count() << " microseconds" << std::endl;

    
    std::cout << "Time taken by function 2: "
          << duration1.count() << " microseconds" << std::endl;
    
    return 0;
}

int golom(int n)
{
    if (n == 1) {return 1;}
    return 1 + golom(n - golom(golom(n - 1)));
}

int golomS(int n, std::unordered_map<int, int> golomb)
{
    if(n == 1)
    {
        return 1; 
    }
    if(!golomb[n])
    {
        golomb[n] = 1 + golomS(n - golomS(golomS(n - 1, golomb), golomb), golomb); 
    }
    return golomb[n];
}

作为输出我得到这个：

4

函数 1 所用时间：687 微秒 //This is Golom S

函数 2 耗时：242 微秒 //This is Golom

GolomS 不应该通过记忆更快吗？我试着绕过调试器，但我不确定有效的“慢”是从哪里来的。

我的问题是：如何改变我制作 golomS 的方法，使其比 golom 更快。谢谢你。 -亚当

Answer 1

int golomS(int n, std::unordered_map<int, int> golomb)
{
    if(n == 1)
    {
        return 1; 
    }
    if(!golomb[n])
    {
        golomb[n] = 1 + golomS(n - golomS(golomS(n - 1, golomb), golomb), golomb); 
    }
    return golomb[n];
}

在您的 goloms 函数中，您将 golomb 作为值而不是引用传递。这意味着每次调用 goloms 时，编译器都会制作 golomb 的副本并在超出范围时销毁该副本，而不更改实际的 golomb 值。

您应该改为通过引用传递。

int golomS(int n, std::unordered_map<int, int>& golomb)
{
    if(n == 1)
    {
        return 1; 
    }
    if(!golomb[n])
    {
        golomb[n] = 1 + golomS(n - golomS(golomS(n - 1, golomb), golomb), golomb); 
    }
    return golomb[n];
}

Answer 2

您的代码中有两个问题。第一个主要问题是您按值传递无序映射。您必须通过引用传递它以使其更快，以避免在每次函数调用时复制整个地图。此外，如果您按值传递映射，则 return 不会保留新计算的值，但如果您按引用传递它，则新值将为调用函数带来好处。

第二个重要问题是您使用非常小的值 4 作为函数的输入来测量时间，4 需要很少的时间来计算，您必须使用更大的值，例如50，因此需要更多的时间来计算，从而给出更准确的时间测量。

与第二个函数相比，您的代码的固定版本已经大大改进了第一个函数的计时（对于输入 50）：

13
13
Time taken by function 1: 110 microseconds
Time taken by function 2: 118345 microseconds

固定码：

Try it online!

#include <unordered_map>
#include <iostream>
#include <chrono>

int golom(int n)
{
    if (n == 1) {return 1;}
    return 1 + golom(n - golom(golom(n - 1)));
}

int golomS(int n, std::unordered_map<int, int> & golomb)
{
    if(n == 1)
        return 1; 
    if(golomb.find(n) == golomb.end())
        golomb[n] = 1 + golomS(n - golomS(golomS(n - 1, golomb), golomb), golomb); 
    return golomb[n];
}

int main(int argc, char* argv[])
{
    std::unordered_map<int, int> hashTable;
    
    auto start = std::chrono::high_resolution_clock::now(); 
    std::cout << golomS(50, hashTable) << std::endl;
    auto stop = std::chrono::high_resolution_clock::now();
    
    auto start1 = std::chrono::high_resolution_clock::now();
    std::cout << golom(50) << std::endl;;
    auto stop1 = std::chrono::high_resolution_clock::now();

    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    auto duration1 = std::chrono::duration_cast<std::chrono::microseconds>(stop1 - start1);

    std::cout << "Time taken by function 1: "
           << duration.count() << " microseconds" << std::endl;
    
    std::cout << "Time taken by function 2: "
          << duration1.count() << " microseconds" << std::endl;
    
    return 0;
}

Answer 3

除了其他答案之外，我想补充一点，这真的可以从适当的基准测试中受益。

为了获得可靠的结果，您希望运行多次测试，并采取措施确保内存缓存和其他系统级恶作剧不会干扰结果。

值得庆幸的是，有一些库可以为我们处理大部分这些复杂问题。例如，下面是使用 google 的 Benchmark 库的更好的代码基准测试：

N.B。已整合其他答案提出的修复。

#include <chrono>

int golom(int n)
{
    if (n == 1) {return 1;}
    return 1 + golom(n - golom(golom(n - 1)));
}

int golomS(int n, std::unordered_map<int, int>& golomb)
{
    if(n == 1)
    {
        return 1; 
    }
    if(!golomb[n])
    {
        golomb[n] = 1 + golomS(n - golomS(golomS(n - 1, golomb), golomb), golomb); 
    }
    return golomb[n];
}


static void TestGolomb(benchmark::State& state) {
  for (auto _ : state) {
    benchmark::DoNotOptimize(golom(state.range(0)));
  }
}
BENCHMARK(TestGolomb)->DenseRange(1, 17, 2);

static void TestGolombS(benchmark::State& state) {
  for (auto _ : state) {
    // Make sure we always start from a fresh map
    std::unordered_map<int, int> golomb;

    // Ignore construction and destruction of the map
    auto start = std::chrono::high_resolution_clock::now();
    benchmark::DoNotOptimize(golomS(state.range(0), golomb));
    auto end = std::chrono::high_resolution_clock::now();

    auto elapsed_seconds =
      std::chrono::duration_cast<std::chrono::duration<double>>(
        end - start);

    state.SetIterationTime(elapsed_seconds.count());
  }
}
BENCHMARK(TestGolombS)->DenseRange(1, 17, 2);

这为我们提供了以下配置文件，表明盈亏平衡点在 14-15 左右：

见 quick-bench

Answer 4

很好。请尝试并继续了解记忆。

我再给一个方法。您还可以在 DP table 中“记忆”值。 DP喜欢动态规划。

如果您追求速度并且能够承受大内存消耗，那么这是最快和最有效的解决方案。对该序列使用递归基本上是一个坏主意，会导致代码非常缓慢和潜在的堆栈溢出。

在 DP table 中将递归转换为带有记忆的迭代解决方案，会给你一个非常快的结果。即使是大数字。启用编译器优化后，您可以计算完整的序列，例如高达 100000 ultrafast。

见下文：

#include <iostream>
#include <array>
#include <cstdint>

// We will calculate a Golomb sequence up to the below specified element
constexpr size_t MaxValues = 100000;

using MyDataType = uint_fast16_t;
using GolombSequence = std::array<MyDataType, MaxValues>;

// This is the Golomb Sequence
GolombSequence golombSequence{};

// Iterative Function to calculate a Golomb sequence using a DP table
void createGolombSequence() {

    golombSequence[1] = 1;
    for (int i = 2; i < MaxValues; ++i) {
        golombSequence[i] = 1 + golombSequence[i - golombSequence[golombSequence[i - 1]]];
    }
}

// Test
int main() {
   createGolombSequence();

   std::cout << golombSequence[777] << '\n';
   std::cout << golombSequence[7777] << '\n';
   std::cout << golombSequence[77777] << '\n';
}

我们甚至可以更进一步，使用“编译时记忆”，也就是说，我们会在编译时计算出完整的序列。所以，让编译器来做吧。

这当然会导致最快的解决方案，但缺点是内存消耗高，并且受编译器能力的限制。

请看下面。所有 Golomb 值计算都在编译时完成。在运行时，我们只使用超快索引操作：

#include <iostream>
#include <array>
#include <cstdint>

// We will calculate a Golomb sequence up to the below specified element
constexpr size_t MaxValues = 100000u;
using MyDataType = uint_fast16_t;
using GolombSequence = std::array<MyDataType, MaxValues>;

// Iterative Function to calculate a Golomb sequence using a DP table
constexpr GolombSequence createGolombSequence() {

    GolombSequence golombSequence{};
    golombSequence[1] = 1;
    for (int i = 2; i < MaxValues; ++i) {
        golombSequence[i] = 1 + golombSequence[i - golombSequence[golombSequence[i - 1]]];
    }
    return golombSequence;
}
// This is the Golomb Sequence
constexpr GolombSequence golombSequence = createGolombSequence();

// Test
int main() {

   std::cout << golombSequence[777] << '\n';
   std::cout << golombSequence[7777] << '\n';
   std::cout << golombSequence[77777] << '\n';
}

我将我的代码添加到 Franks Benchmark。结果大大优于其他一切。 0ms处的绿线用于上述算法。

使用基准代码

#include <chrono>
#include <iostream>
#include <array>
#include <cstdint>

// We will calculate a Golomb sequence up to the below specified element
constexpr size_t MaxValues = 10000u;
using MyDataType = uint_fast16_t;
using GolombSequence = std::array<MyDataType, MaxValues>;

// Iterative Function to calculate a Golomb sequence using a DP table
constexpr GolombSequence createGolombSequence() {

    GolombSequence golombSequence{};
    golombSequence[1] = 1;
    for (int i = 2; i < MaxValues; ++i) {
        golombSequence[i] = 1 + golombSequence[i - golombSequence[golombSequence[i - 1]]];
    }
    return golombSequence;
}
// This is the Golomb Sequence
constexpr GolombSequence golombSequence = createGolombSequence();

inline int golombFast(size_t i) { return golombSequence[i]; };


int golom(int n)
{
    if (n == 1) {return 1;}
    return 1 + golom(n - golom(golom(n - 1)));
}

int golomS(int n, std::unordered_map<int, int>& golomb)
{
    if(n == 1)
    {
        return 1; 
    }
    if(!golomb[n])
    {
        golomb[n] = 1 + golomS(n - golomS(golomS(n - 1, golomb), golomb), golomb); 
    }
    return golomb[n];
}

static void TestGolombFast(benchmark::State& state) {
  for (auto _ : state) {
    benchmark::DoNotOptimize(golombFast(state.range(0)));
  }
}
BENCHMARK(TestGolombFast)->DenseRange(1, 17, 2);



static void TestGolomb(benchmark::State& state) {
  for (auto _ : state) {
    benchmark::DoNotOptimize(golom(state.range(0)));
  }
}
BENCHMARK(TestGolomb)->DenseRange(1, 17, 2);

static void TestGolombS(benchmark::State& state) {
for (auto _ : state) {
    std::unordered_map<int, int> golomb;

    // Be extra-generous, ignore construction and destruction of the set
    auto start = std::chrono::high_resolution_clock::now();
    benchmark::DoNotOptimize(golomS(state.range(0), golomb));
    auto end = std::chrono::high_resolution_clock::now();

    auto elapsed_seconds =
      std::chrono::duration_cast<std::chrono::duration<double>>(
        end - start);

    state.SetIterationTime(elapsed_seconds.count());
  }
}
BENCHMARK(TestGolombS)->DenseRange(1, 17, 2);

应用记忆会使 golom 序列变慢

Applying memoization makes golom sequence slower

c++

recursion

memoization