Running valgrind after restoring from a gem5 checkpoint

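We are working on a project with gem5 and suspect that a new memory object we implemented has a memory leak. Normally this would be easy to check: launch the compiled gem5.debug binary under valgrind --leak-check=full. Unfortunately, this memory object does nothing significant until the memory mode is switched from Atomic to Timing (i.e., after fast-forwarding and restoring from a checkpoint with a different CPU model).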

When we run the command: valgrind --leak-check=full --log-file=valgrind-out.txt --track-origins=yes build/<ISA>/gem5.debug -d /path/to/outdir /path/to/python/config.py --checkpoint-restore=1 --other-options...

we get the following output (it occurs after many of the gem5 objects have been created):

build/ARM/base/statistics.hh:277: warn: One of the stats is a legacy stat. Legacy stat is a stat that does not belong to any statistics::Group. Legacy stat is deprecated.
gem5 Simulator System.  http://gem5.org
gem5 is copyrighted software; use the --copyright option for details.

gem5 version 21.1.0.0
gem5 compiled Mar 15 2022 15:27:16
gem5 started Mar 15 2022 17:10:53
gem5 executing on <host name>, pid <pid>
command line: build/ARM/gem5.debug -d /path/to/outdir /path/to/python/config.py --other-options... --checkpoint-restore=1 --checkpoint-dir /path/to/checkpoint

warn: iobus.slave is deprecated. `slave` is now called `cpu_side_ports`
warn: bridge.master is deprecated. `master` is now called `mem_side_port`
warn: membus.master is deprecated. `master` is now called `mem_side_ports`
warn: bridge.slave is deprecated. `slave` is now called `cpu_side_port`
warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
warn: iobus.master is deprecated. `master` is now called `mem_side_ports`
warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
warn: membus.slave is deprecated. `slave` is now called `cpu_side_ports`
Global frequency set at 1000000000000 ticks per second
warn: No dot file generated. Please install pydot to generate the dot file and pdf.
debug.sh: line 13: 169858 Bus error               valgrind --leak-check=full --log-file=valgrind-out.txt --track-origins=yes build/ARM/gem5.debug -d /path/to/outdir /path/to/python/config.py --other-options... --checkpoint-restore=1 --checkpoint-dir /path/to/checkpoint

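The program still starts fine without valgrind, so we have good reason to believe that valgrind is the source of the problem. We also know that valgrind works with gem5 when simulating from scratch (no checkpoint).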

So our question is: is there a way to use valgrind when restoring a gem5 run from a checkpoint, or are the two incompatible?

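According to "How does valgrind work?", valgrind pre-processes and modifies the application before running it. This means valgrind needs certain pre-allocated pointers at certain locations. Also, when looking at the valgrind output, you may see the following: Warning: set address range perms: large range [start, end) (undefined). Most likely, the large gem5 objects overwrote valgrind's metadata and corrupted existing pointers, which would lead to the bus error.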
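To check whether a run actually hit these warnings, the log written via --log-file can be scanned for them. The following is only a minimal sketch: the helper name find_large_ranges is hypothetical, and the regular expression assumes the warning is printed with hexadecimal bounds in the form [0x..., 0x...) followed by a state in parentheses, which may differ between valgrind versions.

```python
import re

# Assumed line format (may vary by valgrind version):
# ==12345== Warning: set address range perms: large range [0x395c9040, 0x795c9040) (undefined)
LARGE_RANGE = re.compile(
    r"set address range perms: large range "
    r"\[(0x[0-9a-fA-F]+), (0x[0-9a-fA-F]+)\) \((\w+)\)"
)


def find_large_ranges(log_text):
    """Return (start, end, size_in_bytes, state) for each large-range warning.

    Hypothetical helper for illustration; not part of valgrind or gem5.
    """
    hits = []
    for line in log_text.splitlines():
        match = LARGE_RANGE.search(line)
        if match:
            start = int(match.group(1), 16)
            end = int(match.group(2), 16)
            hits.append((start, end, end - start, match.group(3)))
    return hits
```

Calling find_large_ranges(open("valgrind-out.txt").read()) on the log file from the command above would list each reported range and its size. This only shows how large the flagged allocations are; it does not by itself confirm that they are what causes the bus error.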

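For future reference, we successfully used LibLeak, which has very clear documentation, is easy to use with gem5, and has very low runtime overhead. It also helped us find the memory leak :-)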