How to predict the behaviour of a direct-mapped cache with alternating loads from two 512-byte arrays

Question

Given this code:

int x[2][128];
int i;
int sum=0;

for(i=0; i<128; i++){
   sum += x[0][i] * x[1][i];
}
Assuming we execute this under the following conditions:

sizeof(int) = 4.

Array x begins at memory address 0x0 and is stored in row-major order.

In each case below, the cache is initially empty.

The only memory accesses are to the entries of the array x. All other variables are stored in registers.

Given these assumptions, estimate the miss rates for the following cases: Assume the cache is 512 bytes, direct-mapped, with 16-byte cache blocks.

Given this information, I know that this cache has 32 sets (from 512/16). So the first set is loaded with the block containing x[0][i], which holds 4 ints.

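But for the second part, x[1][i], how do I know whether the values loaded here will overwrite the first load of x[0][i], x[0][i+1], x[0][i+2], x[0][i+3]? Or will x[1][i], x[1][i+1], x[1][i+2], x[1][i+3] be stored in a different set from the first load of x[0][i]? I am confused about how this code loads data into the cache.
What would the miss rate be?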

Any help would be appreciated :)

Answer 1

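In general, it is impossible to predict what will happen in the cache system just by looking at the C code. To do that, you need to look at the generated machine code at the very least.

Remember that the compiler is allowed to use all sorts of optimization tricks as long as the final result and the side effects stay the same.

So in principle, a clever compiler could turn the code into: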

for(i=0; i<128; i += 4){
   regA = x[0][i];
   regB = x[0][i+1];
   regC = x[0][i+2];
   regD = x[0][i+3];

   sum += regA * x[1][i];
   sum += regB * x[1][i+1];
   sum += regC * x[1][i+2];
   sum += regD * x[1][i+3];
}

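That would completely change how the cache is used. If the loads really are issued in that order, each 16-byte block is fetched once and then hit three times, so the miss rate would fall to 25%. Besides that, there may be optimization tricks at the hardware level that you cannot even see from the machine code.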

Anyway, if we assume a "straightforward, non-optimized" compilation, then you get 2 cache misses every time sum += x[0][i] * x[1][i]; executes.

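The reason is that the distance between x[0][i] and x[1][i] is 128 * 4 = 512 bytes, which is exactly the cache size. So the data for x[0][i] and x[1][i] maps to the same cache line, which means the block read in after the first cache miss is overwritten by the block read in after the second cache miss.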
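To make the mapping concrete, here is a minimal sketch of the address arithmetic, assuming the conditions from the question (x starts at 0x0, row-major layout, 4-byte int, 512-byte direct-mapped cache with 16-byte blocks). The helper names element_address, line_index and tag_of are made up for this illustration.

```c
#include <stdint.h>

/* Cache geometry from the question: 512 bytes, 16-byte blocks, direct-mapped. */
enum { CACHE_BYTES = 512, BLOCK_BYTES = 16, NUM_LINES = CACHE_BYTES / BLOCK_BYTES };

/* Byte address of x[row][col], assuming x starts at 0x0, is stored
 * row-major as int x[2][128], and sizeof(int) == 4. */
static uint32_t element_address(uint32_t row, uint32_t col) {
    return (row * 128u + col) * 4u;
}

/* Cache line a byte address maps to: block number modulo number of lines. */
static uint32_t line_index(uint32_t addr) {
    return (addr / BLOCK_BYTES) % NUM_LINES;
}

/* Tag: the remaining upper address bits, which tell apart the different
 * memory blocks that compete for the same line. */
static uint32_t tag_of(uint32_t addr) {
    return addr / CACHE_BYTES;
}
```

For any i, element_address(1, i) is exactly 512 bytes above element_address(0, i), so line_index gives the same line for both while tag_of differs (0 for row 0, 1 for row 1). Same line with a different tag is what forces the eviction.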

So there will be no cache hits at all. You get 2 * 128 = 256 misses and a 100% miss rate.
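The miss counts can be checked with a tiny direct-mapped cache model. This is only a sketch under the question's assumptions (loads hit memory in exactly the order written, nothing else touches the cache, no prefetching); the Cache struct and the function names are invented for the example. It compares the interleaved order of the original loop with the grouped order of the hypothetical compiler-reordered loop.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { NUM_LINES = 32, BLOCK_BYTES = 16, N = 128 };

/* Minimal direct-mapped cache model: one valid bit and one tag per line. */
typedef struct {
    bool     valid[NUM_LINES];
    uint32_t tag[NUM_LINES];
    unsigned misses;
} Cache;

/* Simulate one load; on a miss, the new block replaces whatever the line held. */
static void cache_access(Cache *c, uint32_t addr) {
    uint32_t block = addr / BLOCK_BYTES;
    uint32_t line  = block % NUM_LINES;
    uint32_t tag   = block / NUM_LINES;
    if (!c->valid[line] || c->tag[line] != tag) {
        c->misses++;
        c->valid[line] = true;
        c->tag[line]   = tag;
    }
}

/* Address of x[row][col]: x at 0x0, row-major, 4-byte int (assumed). */
static uint32_t addr_of(uint32_t row, uint32_t col) {
    return (row * N + col) * 4u;
}

/* Original loop order: x[0][i] then x[1][i] in every iteration. */
static unsigned misses_interleaved(void) {
    Cache c;
    memset(&c, 0, sizeof c);            /* cache starts empty */
    for (uint32_t i = 0; i < N; i++) {
        cache_access(&c, addr_of(0, i));
        cache_access(&c, addr_of(1, i));
    }
    return c.misses;
}

/* Reordered loop: four loads from x[0], then four loads from x[1]. */
static unsigned misses_grouped(void) {
    Cache c;
    memset(&c, 0, sizeof c);
    for (uint32_t i = 0; i < N; i += 4) {
        for (uint32_t k = 0; k < 4; k++) cache_access(&c, addr_of(0, i + k));
        for (uint32_t k = 0; k < 4; k++) cache_access(&c, addr_of(1, i + k));
    }
    return c.misses;
}
```

Both functions perform 2 * 128 = 256 loads. Under this model misses_interleaved() returns 256 (100% miss rate) and misses_grouped() returns 64 (25%), which is why the order of the generated loads matters so much.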


c

caching

cpu-architecture

cpu-cache