How to know about all the files which are compiled together to make an executable?

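We are looking for a procedure through which we can easily list all the files that are compiled together to make an executable.

Use case: suppose we have a very large repository, and we want to know which files in that repository are compiled into the executable (i.e. a.out).

For example: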

dwarfdump a.out | grep "NS uri"
0x0000064a  [   9, 0] NS uri: "/home/main.c"
0x000006dd  [   2, 0] NS uri: "/home/zzzz.c"
0x000006f1  [   2, 0] NS uri: "/home/yyyy.c"
0x00000705  [   2, 0] NS uri: "/home/xxxx.c"
0x00000719  [   2, 0] NS uri: "/home/wwww.c"
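The grep above can be reproduced in Python, which also makes it easy to keep only the files under the repository root, as in the use case. This is a minimal sketch assuming the NS uri: "..." line format shown above; compiled_paths and files_in_repo are hypothetical helper names, and relative paths (which DWARF resolves against the compilation directory) are simply skipped.

```python
import os
import re

# ASSUMPTION: lines look like  0x0000064a  [   9, 0] NS uri: "/home/main.c"
NS_URI = re.compile(r'NS uri: "([^"]+)"')


def compiled_paths(dwarfdump_text):
    """Unique paths from the 'NS uri' lines, in first-seen order."""
    return list(dict.fromkeys(NS_URI.findall(dwarfdump_text)))


def files_in_repo(paths, repo_root):
    """Keep only absolute paths lying under repo_root (no symlink resolution)."""
    root = os.path.abspath(repo_root)
    kept = []
    for path in paths:
        if not os.path.isabs(path):
            continue  # relative to the compile directory; skipped in this sketch
        norm = os.path.normpath(path)
        if os.path.commonpath([root, norm]) == root:
            kept.append(norm)
    return kept
```

The text can come from subprocess.run(["dwarfdump", "a.out"], capture_output=True, text=True).stdout. Like the grep, this only reports compilation units, so header files are still missing.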

But it doesn't list all the header files. Any suggestions?

How to extract source code from an executable with debug symbols available?

You cannot do that. I guess you are on Linux/x86-64 (your question is operating system and ABI specific, and debugging format specific). Of course, you should pass -g (or even -g3) to all the gcc compilation commands for your executable. Without that -g or -g3 option used to compile every translation unit (perhaps including those of shared libraries!), you might not have enough information.

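Even with debug information in DWARF format, the ELF executable does not contain the source code; it only references it (e.g. source file paths, and locations as line and column numbers). So the debug information contains something like "file src/foo.c, line 34, column 5" (but gives no information about the content of src/foo.c near that location). Of course, once gdb knows the file path src/foo.c, it is able to read that source file (if it is available and up to date with respect to the executable), so it can list it.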
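To make the distinction concrete, here is a minimal Python sketch of what a debugger has to do with such a reference. The location record (a hypothetical SourceLocation class holding path, line and column) carries no source text, so the text must be read back from the file on disk; when the file is missing, nothing can be shown. For instance, SourceLocation("src/foo.c", 34, 5) yields line 34 only if src/foo.c is present.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class SourceLocation:
    """Hypothetical model of one debug-info location: a reference only."""
    path: str
    line: int
    column: int  # kept only to mirror what DWARF records; unused below


def source_line_at(loc: SourceLocation) -> Optional[str]:
    """Read the referenced line from disk, as a debugger would.

    Returns None when the file is unreadable or too short: the executable
    itself cannot supply the text. Staleness is not checked here.
    """
    try:
        lines = Path(loc.path).read_text().splitlines()
    except (OSError, UnicodeDecodeError):
        return None
    if 1 <= loc.line <= len(lines):
        return lines[loc.line - 1]
    return None
```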

Extracting the debug metadata is a different issue. Once you understand DWARF, you could use tools like objdump, readelf, addr2line, dwarfdump or libdwarf; you could also script gdb (recent versions of GDB can be extended in Python or Guile) and use it on your ELF executable.

Perhaps you should consider Ian Taylor's libbacktrace. It uses DWARF information to provide nice backtraces at runtime.

BTW, cgdb (like ddd) is just a front-end to gdb, which does all the real work of processing that DWARF information. It is free software, so you could study its source code.

I have only a.out, and I want to list the file names.

You could try dwarfdump -i | grep DW_AT_decl_file, or use some GNU awk command instead of grep. You need to dive into the details of the DWARF specification, and to understand more about the elf(5) format.
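Instead of awk, a short Python filter can count the distinct files named by DW_AT_decl_file. This is a sketch under an assumption: it expects each attribute line to end with a hex file index followed by the resolved path (e.g. DW_AT_decl_file 0x00000002 /usr/include/stdio.h), so check that against the output of your dwarfdump version. The function name decl_files is hypothetical.

```python
import re
from collections import Counter

# ASSUMPTION about the dwarfdump -i line format:
#   DW_AT_decl_file             0x00000002 /usr/include/stdio.h
# Lines carrying only the index (no path) are ignored.
DECL_FILE = re.compile(r"DW_AT_decl_file\s+0x[0-9a-fA-F]+\s+(.+?)\s*$")


def decl_files(dwarfdump_text):
    """Count how many debug entries declare something in each file."""
    counts = Counter()
    for line in dwarfdump_text.splitlines():
        match = DECL_FILE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Run it on the text printed by dwarfdump -i a.out. Only headers that contributed a declaration to the debug information can show up this way.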

It doesn't list all the header files.

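This is expected. Most header files don't contain any code, only declarations. For example, printf is not implemented in <stdio.h> but in some C source file of the C standard library (e.g. in src/stdio/printf.c if you use musl-libc); it is just declared in /usr/include/stdio.h. DWARF (and other debug information formats) describe the binary code. And some header files are included only to access some preprocessor macros, which are expanded or skipped at preprocessing time.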

Perhaps you are dreaming of homoiconic programming languages; then try Common Lisp (e.g. with SBCL).

If your question is about how to use gdb, read the Debugging with GDB manual.

If your question is about decompilers, be aware that decompilation is an impossible task in general (e.g. because of Rice's theorem). BTW, the programs in most Linux distributions are generally free software, so it is easy to get their source code (you could even avoid using proprietary software on Linux).

BTW, you could also collect such information at compile time by passing more flags to gcc. You might pass -H or -M (etc.) to gcc (in addition to -g). You could even consider writing your own GCC plugin to collect the information you want in some database (but that is probably not worth the effort). You could also consider improving your build automation (e.g. adding more to your Makefile) to collect such information. Also, many large C programs use metaprogramming techniques: some of their .c files, perhaps containing #line directives, are generated by tools (e.g. bison) or by scripts. So what kind of file paths would you want to keep?

We are looking for a procedure through which we can easily list all the files which are compiled together to make an executable.

If you are writing that executable and compiling it from its source code, I recommend collecting that information at build time. It might be as trivial as passing some -M and/or -H flags to gcc, and perhaps writing the result into some generated timestamp.c file (see this for inspiration; your timestamp.c might contain information provided by gcc -M etc.). Your timestamp file might also contain git version control metadata (like the one generated in this Makefile). Read also about reproducible builds and about package managers.
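A minimal sketch of that idea, assuming gcc -MD (or -MMD, which omits system headers) has written one .d dependency file per object: the Python below collects the prerequisites from such files and renders a C source file embedding the list. The array name build_source_files and all helper names are hypothetical, and paths containing escaped spaces are not handled.

```python
from pathlib import Path


def make_deps(d_text):
    """Prerequisites listed in gcc -M/-MD output, deduplicated, in order."""
    # Join backslash-newline continuations into single logical lines.
    joined = d_text.replace("\\\n", " ")
    deps = {}
    for line in joined.splitlines():
        _target, sep, prereqs = line.partition(":")
        if not sep:
            continue
        for token in prereqs.split():
            deps.setdefault(token, None)
    return list(deps)


def deps_from_files(d_files):
    """Merge the prerequisites of several .d files."""
    merged = {}
    for d_file in d_files:
        for dep in make_deps(Path(d_file).read_text()):
            merged.setdefault(dep, None)
    return list(merged)


def c_escape(text):
    """Escape backslashes and double quotes for a C string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def timestamp_c(paths):
    """Render a hypothetical timestamp.c that embeds the dependency list."""
    entries = "\n".join(f'  "{c_escape(p)}",' for p in paths)
    return (
        "/* Generated at build time; do not edit. */\n"
        "const char *const build_source_files[] = {\n"
        f"{entries}\n"
        "  0 /* end marker */\n"
        "};\n"
    )
```

The generated file would then be compiled and linked into the executable like any other translation unit.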