C++：成对访问值是否比访问数组元素更有效？

Question

假设我们有一个双精度数组 x 和一个索引数组 y，我们希望根据 x 中的相应值对这些索引进行排序（因此，排序 [i in y] 以 x[i] 为键）。

然后我们可以创建一个对数组，一个组件是键值，一个组件是索引，例如做这样的事情：

boost::sort::spreadsort::float_sort(sortdata, sortdata + n,
    [](const std::pair<double, int> &a, const unsigned offset) -> boost::int64_t {
        return boost::sort::spreadsort::float_mem_cast<double, boost::int64_t>(a.first) >> offset;
    },
    [](const std::pair<double, int> &a, const std::pair<double, int> &b) -> bool {
        return a.first < b.first;
    });

这会占用相当多的内存，所以我们可以省略创建这个对数组并直接使用这样的数据：

boost::sort::spreadsort::float_sort(y, y + n,
    [x](const int a, const unsigned offset) -> boost::int64_t {
        return boost::sort::spreadsort::float_mem_cast<double, boost::int64_t>(x[a]) >> offset;
    },
    [x](const int a, const int b) -> bool {
        return x[a] < x[b];
    });

现在，当在非常大的数据（假设 50000000 个条目）上使用它时，第二种方法花费的时间是第一种方法的两倍多。据我所知，std::pair 就是 struct。那么，数组访问真的比结构访问效率低得多吗？或者，我在这里做错了吗？

这里是比较的完整示例：

#include <cstdlib>
#include <ctime>
#include <boost/sort/spreadsort/spreadsort.hpp>
#include <iostream>

int main() {
    int n = 50000000;
    double *x = new double[n];
    int *y = new int[n];
    std::pair<double, int> *sortdata = new std::pair<double, int> [n];
    for (int i=0; i < n; i++) {
        x[i] = ((double) std::rand()) / ((double) std::rand());
        sortdata[i].first = x[i];
        y[i] = i;
        sortdata[i].second = i;
    }

    std::time_t t = std::time(0);
    boost::sort::spreadsort::float_sort(sortdata, sortdata + n,
        [](const std::pair<double, int> &a, const unsigned offset) -> boost::int64_t {
            return boost::sort::spreadsort::float_mem_cast<double, boost::int64_t>(a.first) >> offset;
        },
        [](const std::pair<double, int> &a, const std::pair<double, int> &b) -> bool {
            return a.first < b.first;
        });

    std::cout << std::time(0)-t << "\n";

    t = std::time(0);
    boost::sort::spreadsort::float_sort(y, y + n,
        [x](const int a, const unsigned offset) -> boost::int64_t {
            return boost::sort::spreadsort::float_mem_cast<double, boost::int64_t>(x[a]) >> offset;
        },
        [x](const int a, const int b) -> bool {
            return x[a] < x[b];
        });

    std::cout << std::time(0)-t << "\n";
}

运行这与 g++ -O2 在我的系统上第一次给出 3 秒，最后一次给出 9 秒。

Answer 1

不是专家，但我认为是这样的：

无论使用何种编程语言，在内存中跳来跳去都是一项代价高昂的操作。这与您的 CPU.

的缓存架构有关

数据成对交错存储在内存中。没有 class 边界或类似的东西。它看起来像这样：

value0 index0 value1 index1 value2 index2 ...

现在，当您将数据存储在两个数组中时，您的数据将如下所示：

value0 value1 value2 .........  index0 index1 index2 .....

好的，现在让我们看看当排序算法试图弄清楚索引 46 处的数据应该在索引 47 处的数据之前还是之后时会发生什么：

成对的情况下：

1. ask ram for value46 (expensive!)
   because of how cachelines work ~ 64bytes of data will be pulled. 
   that is: value46 index46 value47 index47 are pulled, maybe a bit more.
2. ask ram for value47 (cheap, it's already cached)

在两个数组的情况下：

1. ask ram for index46 (expensive)
   this will pull index46,index47,...
2. ask ram for index47 (cheap)
3. ask ram for x[index46] (expensive)
   this will pull x[index46],x[index46+1...]
4. ask ram for x[index47] (should be next to x[index46]? not sure... could be cheap, could be expensive)

好吧，我不确定最后一次内存获取，但关键是你在内存中跳来跳去大约是原来的两到三倍，我很确定这就是你测量的原因大约两倍的运行时间。

这里是最后一个例子来说明这一点：

int N = 100000; 
float data[N] = {... random data...};
int idx_1[N] = {0,1,....};
int idx_2[N] = {... random permutation of 0,...N-1};

// this will be very fast. 
// the cache predictor always gets it right. 
float sum_1 = 0; 
for(int i = 0; i < N; i++) sum_1 += data[idx_1[i]];

// this will be very slow. 
// the cache predictor always gets it wrong. 
float sum_2 = 0; 
for(int i = 0; i < N; i++) sum_2 += data[idx_2[i]];

C++：成对访问值是否比访问数组元素更有效？

C++: Is accessing values in pairs so much more efficient than accessing array elements?

c++

arrays

performance

struct

std-pair