How to improve cache performance on this MIPS code

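I'm using a simulator called MARS 4.5 to improve the cache performance of this code. It is a subsection of an assembly program that computes prime numbers using the Sieve of Eratosthenes.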

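For some reason, the sw (store word) instruction has a cache hit rate of 25%, while the rest of the program in its current state averages around 50%. I've tried rearranging a few things, but I can't figure out what is causing this bottleneck. What needs to be done to improve this cache hit rate?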

inner:  add $t2, $s2, 0 # save the bottom of stack address to $t2
        mul $t3, $t1, 4 # calculate the number of bytes to jump over
        sub $t2, $t2, $t3   # subtract them from bottom of stack address
        add $t2, $t2, 8 # add 2 words - we started counting at 2!

        sw  $s0, ($t2)  # store 1's -> it's not a prime number!

        add $t1, $t1, $t0   # do this for every multiple of $t0
        bgt $t1, $t9, outer # every multiple done? go back to outer loop

        j   inner       # some multiples left? go back to inner loop
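To see which addresses the sw actually touches, the address arithmetic of the loop can be mirrored in a few lines of Python. This is only a sketch: `base` stands in for $s2, `start` for the initial value of $t1, `step` for $t0 and `limit` for $t9, and the concrete numbers used at the bottom are made up for illustration, since the post does not show them.

```python
WORD_SIZE = 4  # bytes per MIPS word


def flag_address(base, n):
    """Address the loop stores to for number n: base - 4*n + 8."""
    return base - n * WORD_SIZE + 2 * WORD_SIZE


def inner_loop_addresses(base, start, step, limit):
    """Addresses written by one run of the inner loop.

    Mirrors the assembly: store first, then advance the counter by
    `step`, and leave the loop once the counter exceeds `limit`.
    """
    addresses = []
    n = start
    while True:
        addresses.append(flag_address(base, n))
        n += step
        if n > limit:
            break
    return addresses


BASE = 0x7FFFEFFC  # hypothetical value of $s2, not taken from the post
for addr in inner_loop_addresses(BASE, start=6, step=3, limit=15):
    print(hex(addr))
```

Consecutive stores end up 4 * $t0 bytes apart and move downward in memory, so as $t0 grows fewer of them can share a cache block.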

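I was able to solve this by modifying the program to store bytes instead of words. Each flag now takes one byte rather than four, so every cache block holds four times as many flags, which raises the hit rate.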

inner:  add $t2, $s2, 0 # save the bottom of stack address to $t2
        addi $t3, $t1, 1    # byte offset: one byte per number, plus 1
        sub $t2, $t2, $t3   # subtract it from bottom of stack address
        add $t2, $t2, 2 # add 2 bytes - we started counting at 2!

        sb  $s0, ($t2)  # store 1's -> it's not a prime number!

        add $t1, $t1, $t0   # do this for every multiple of $t0
        bgt $t1, $t9, outer # every multiple done? go back to outer loop

        j   inner       # some multiples left? go back to inner loop
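The effect of switching from sw to sb can be reproduced outside MARS with a small cache model. The sketch below rests on several assumptions that do not come from the original program: a direct-mapped cache with 8 blocks of 16 bytes, an arbitrary base address, a sieve limit of 100, a layout where the flag for n sits (n - 2) elements below the base, and a simplified outer loop that marks the multiples of every p with p*p <= limit.

```python
def sieve_store_addresses(base, limit, elem_size):
    """Addresses written while marking composites up to `limit`.

    Assumed layout: the flag for n lives (n - 2) elements below `base`.
    Assumed outer loop: every p with p*p <= limit, prime or not.
    """
    addresses = []
    p = 2
    while p * p <= limit:
        for n in range(2 * p, limit + 1, p):
            addresses.append(base - n * elem_size + 2 * elem_size)
        p += 1
    return addresses


def hit_rate(addresses, block_size=16, num_blocks=8):
    """Fraction of accesses that hit in a direct-mapped cache."""
    tags = [None] * num_blocks      # block currently held by each cache line
    hits = 0
    total = 0
    for addr in addresses:
        block = addr // block_size  # memory block containing this address
        index = block % num_blocks  # cache line that block maps to
        if tags[index] == block:
            hits += 1
        else:
            tags[index] = block     # miss: load the block, evicting the old one
        total += 1
    return hits / total if total else 0.0


BASE = 0x7FFFEFFC  # arbitrary, hypothetical base address
LIMIT = 100        # hypothetical sieve size

word_rate = hit_rate(sieve_store_addresses(BASE, LIMIT, 4))  # sw: 4 bytes per flag
byte_rate = hit_rate(sieve_store_addresses(BASE, LIMIT, 1))  # sb: 1 byte per flag
print(f"word flags: {word_rate:.0%} hits, byte flags: {byte_rate:.0%} hits")
```

With these assumed parameters the byte flags (about 100 bytes) fit inside the 128-byte cache, while the word flags (about 400 bytes) do not, so the word stores keep evicting blocks that a later pass needs again. The exact percentages depend on the cache settings chosen in MARS.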