Stack memory implementation not working properly in Chisel for Rocket Chip
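I have been modifying the RoCC interface of the Rocket core. So far I have turned the RoCC interface into a scratchpad that we can load from and store to with the custom0 instruction. I ran into a problem when I tried to push and pop data on a stack memory that I wrote in Chisel and instantiated inside my scratchpad. I use the same custom0 instruction with different funct field values to push to and pop from the stack.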
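To make the role of the funct field concrete, here is a minimal C sketch of how a RoCC instruction word is packed and unpacked. It assumes the usual RoCC field layout (funct in bits 31:25, rs2 in 24:20, rs1 in 19:15, the xd/xs1/xs2 flags in bits 14:12, rd in 11:7, opcode in 6:0) and the custom0 major opcode 0001011. The names `rocc_inst_t`, `rocc_encode`, and `rocc_decode` are hypothetical helpers for illustration only, and which xd/xs flags the assembler sets for a plain `custom0` mnemonic is not shown in this post.

```c
#include <stdint.h>

/* Major opcode for custom0 (binary 0001011). */
#define CUSTOM0_OPCODE 0x0Bu

/* Hypothetical container for the fields of one RoCC instruction word. */
typedef struct {
    uint32_t funct;   /* bits 31:25, selects write/read/push/pop here */
    uint32_t rs2;     /* bits 24:20, register number reused as scratchpad address */
    uint32_t rs1;     /* bits 19:15, register holding the data operand */
    uint32_t xd;      /* bit 14, set when the core expects a response in rd */
    uint32_t xs1;     /* bit 13, set when rs1 is read by the core */
    uint32_t xs2;     /* bit 12, set when rs2 is read by the core */
    uint32_t rd;      /* bits 11:7, destination register number */
    uint32_t opcode;  /* bits 6:0 */
} rocc_inst_t;

/* Pack the fields into a 32-bit instruction word. */
static uint32_t rocc_encode(rocc_inst_t i) {
    return ((i.funct & 0x7Fu) << 25) | ((i.rs2 & 0x1Fu) << 20) |
           ((i.rs1 & 0x1Fu) << 15) | ((i.xd & 1u) << 14) |
           ((i.xs1 & 1u) << 13) | ((i.xs2 & 1u) << 12) |
           ((i.rd & 0x1Fu) << 7) | (i.opcode & 0x7Fu);
}

/* Unpack a 32-bit instruction word into its fields. */
static rocc_inst_t rocc_decode(uint32_t w) {
    rocc_inst_t i;
    i.funct  = (w >> 25) & 0x7Fu;
    i.rs2    = (w >> 20) & 0x1Fu;
    i.rs1    = (w >> 15) & 0x1Fu;
    i.xd     = (w >> 14) & 1u;
    i.xs1    = (w >> 13) & 1u;
    i.xs2    = (w >> 12) & 1u;
    i.rd     = (w >> 7) & 0x1Fu;
    i.opcode = w & 0x7Fu;
    return i;
}
```

With this layout, a push (funct = 2) and a pop (funct = 3) differ only in the top seven bits plus the register and flag fields, so the accelerator can tell them apart by comparing `funct` alone, which is what `doPush` and `doPop` do in the Chisel code.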
The Chisel code for the scratchpad and my stack is shown below.
class Comm_Scratchpad(n: Int = 8)(implicit p: Parameters) extends RoCC()(p) {
  val Stack_Lo = Module(new Stack_Snd(64))
  val Scratchpad_Loc = Mem(UInt(width = xLen), n:Int)
  val busy = Reg(init=Vec(Bool(false), n))
  val cmd = Queue(io.cmd) // wired to the decoupled command coming into the RoCC interface from the Rocket core
  val funct = cmd.bits.inst.funct // function field that decides which operation to perform
  val addr = cmd.bits.inst.rs2(log2Up(n)-1,0) // converts the address given by the user (0,1,2,3) into a scratchpad address
  val doWrite = funct === UInt(0) // true when the user wants a write; derives its value from funct
  val doRead = funct === UInt(1) // ignored for now, as every operation performs a read by default
  val doPush = funct === UInt(2) // funct value used to push to the LIFO with the same custom instruction
  val doPop = funct === UInt(3) // funct value used to pop back from the LIFO
  Stack_Lo.io.en := Mux((doPush || doPop), Bool(true), Bool(false))
  Stack_Lo.io.push := Mux(doPush, Bool(true), Bool(false))
  Stack_Lo.io.pop := Mux(doPop, Bool(true), Bool(false))
  // datapath
  val data_in = cmd.bits.rs1
  val wdata = data_in
  val rdata = Mux(doPop, Stack_Lo.io.dataOut, Scratchpad_Loc(addr))
  Stack_Lo.io.dataIn := Mux(doPush, data_in, UInt(0))
  when (cmd.fire() && (doWrite)) {
    Scratchpad_Loc(addr) := wdata
  }
  val doResp = cmd.bits.inst.xd
  val stallReg = busy(addr)
  //val stallLoad = doLoad && !io.mem.req.ready
  val stallResp = doResp && !io.resp.ready && (doPush || doPop)
  cmd.ready := !stallReg && !stallResp // removed stallLoad as we are not loading from memory
  // command resolved if no stalls AND not issuing a load that will need a request
  // PROC RESPONSE INTERFACE
  io.resp.valid := cmd.valid && doResp && !stallReg //&& !stallLoad
  // valid response if valid command, need a response, and no stalls
  io.resp.bits.rd := cmd.bits.inst.rd
  // Must respond with the appropriate tag or undefined behavior
  io.resp.bits.data := rdata
  // Semantics is to always send out prior accumulator register value
  io.busy := cmd.valid || busy.reduce(_||_)
  // Be busy when have pending memory requests or committed possibility of pending requests
  io.interrupt := Bool(false)
  // Set this true to trigger an interrupt on the processor (please refer to supervisor documentation)
}
class Stack_IO (implicit p: Parameters) extends CoreBundle
{
  val dataIn = UInt(INPUT, 64)
  val dataOut = UInt(OUTPUT, 64)
  val push = Bool(INPUT)
  val pop = Bool(INPUT)
  val en = Bool(INPUT)
}
class Stack_Snd(depth: Int) (implicit p: Parameters) extends CoreModule {
  val io = new Stack_IO;
  // declare the memory for the stack
  val stack_mem = Mem(UInt(width = 64), depth)
  val sp = Reg(init = UInt(0, width = log2Up(depth)))
  val dataOut = Reg(init = UInt(0, width = 64))
  // Push condition - make sure stack isn't full
  when(io.en && io.push && (sp != UInt(depth-1))) {
    stack_mem(sp) := io.dataIn
    sp := sp + UInt(1)
  }
  // Pop condition - make sure the stack isn't empty
  .elsewhen(io.en && io.pop && (sp > UInt(0))) {
    dataOut := stack_mem(sp - UInt(1))
    sp := sp - UInt(1)
  }
  io.dataOut := dataOut
}
The C code that I run through the front-end server is as follows.
// The following is a RISC-V program to test the functionality of the
// dummy RoCC accelerator.
// Compile with riscv64-unknown-elf-gcc dummy_rocc_test.c
// Run with spike --extension=dummy_rocc pk a.out
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdint.h>
int main() {
  uint64_t x = 1, y = 456, z = 0, a=2, b=3, c=4, d=5, e=6;
  // Push six values: funct = 2
  asm volatile ("custom0 x0, %0, 0, 2" : : "r"(x));
  asm volatile ("custom0 x0, %0, 1, 2" : : "r"(a));
  asm volatile ("custom0 x0, %0, 2, 2" : : "r"(b));
  asm volatile ("custom0 x0, %0, 3, 2" : : "r"(c));
  asm volatile ("custom0 x0, %0, 4, 2" : : "r"(d));
  asm volatile ("custom0 x0, %0, 5, 2" : : "r"(e));
  // Pop six values: funct = 3. z is uint64_t, so print it with PRIu64 rather than %d.
  asm volatile ("custom0 %0, x0, 0, 3" : "=r"(z));
  printf("The popped value of z 0 is:- %" PRIu64 " \n",z);
  asm volatile ("custom0 %0, x0, 1, 3" : "=r"(z));
  printf("The popped value of z 1 is:- %" PRIu64 " \n",z);
  asm volatile ("custom0 %0, x0, 2, 3" : "=r"(z));
  printf("The popped value of z 2 is:- %" PRIu64 " \n",z);
  asm volatile ("custom0 %0, x0, 3, 3" : "=r"(z));
  printf("The popped value of z 3 is:- %" PRIu64 " \n",z);
  asm volatile ("custom0 %0, x0, 4, 3" : "=r"(z));
  printf("The popped value of z 4 is:- %" PRIu64 " \n",z);
  asm volatile ("custom0 %0, x0, 5, 3" : "=r"(z));
  printf("The popped value of z 5 is:- %" PRIu64 " \n",z);
  printf("success!\n");
}
As you can see, I am trying to push the numbers 1 through 6, but when I run this code on the ZedBoard, this is the output I get.
root@zynq:~# ./fesvr-zynq pk /sdcard/Custom\ elfs/rocc_fifo
The popped value of z 0 is:- 1
The popped value of z 1 is:- 0
The popped value of z 2 is:- 2
The popped value of z 3 is:- 2
The popped value of z 4 is:- 2
The popped value of z 5 is:- 2
Ideally it should pop 6, 5, 4, 3, 2, 1.
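I solved the problem. While the stack's push signal is high, the pop signal must stay low, and vice versa. When we push while the pop signal is also high, the value is popped almost at the same moment it is pushed, which produces the output in the log above. It took me a long time to debug this, hence the late follow-up.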
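The rule above can be illustrated with a small cycle-level software model of the stack. This is only a sketch in C, not the actual Chisel change: it assumes the fix amounts to acting on a command for exactly one clock cycle (a `fire` pulse) and treating push and pop as mutually exclusive. The names `stack_model_t`, `stack_clock`, `run_sequence`, and the `hold` parameter are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define STACK_DEPTH 64

/* Software stand-in for Stack_Snd: memory, stack pointer, registered output. */
typedef struct {
    uint64_t mem[STACK_DEPTH];
    unsigned sp;        /* index of the next free slot */
    uint64_t data_out;  /* updated on a pop, visible after the clock edge */
} stack_model_t;

/* One clock edge. push and pop only take effect when the command fires,
   and never in the same cycle as each other. */
static void stack_clock(stack_model_t *s, bool fire, bool do_push,
                        bool do_pop, uint64_t data_in) {
    bool push = fire && do_push && !do_pop;
    bool pop  = fire && do_pop && !do_push;
    if (push && s->sp != STACK_DEPTH - 1) {   /* same full check as the Chisel code */
        s->mem[s->sp] = data_in;
        s->sp++;
    } else if (pop && s->sp > 0) {            /* same empty check as the Chisel code */
        s->data_out = s->mem[s->sp - 1];
        s->sp--;
    }
}

/* Push 1..6, then pop six values into out[]. Each command is presented for
   `hold` cycles but fires only in the first one, mimicking a command that
   lingers on the interface for several cycles. */
static void run_sequence(unsigned hold, uint64_t out[6]) {
    stack_model_t s = {0};
    for (uint64_t v = 1; v <= 6; v++)
        for (unsigned c = 0; c < hold; c++)
            stack_clock(&s, c == 0, true, false, v);
    for (int i = 0; i < 6; i++) {
        for (unsigned c = 0; c < hold; c++)
            stack_clock(&s, c == 0, false, true, 0);
        out[i] = s.data_out;
    }
}
```

Calling `run_sequence(3, out)` fills `out` with 6, 5, 4, 3, 2, 1, the order expected from the test program. In hardware terms, the corresponding idea would be to qualify the stack's push and pop inputs with a one-cycle command-accepted signal instead of driving them from `funct` alone, though the exact change made to the Chisel source is not shown in this post.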