Map the address of a string literal to the string literal by parsing an ELF C++ program

Question

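The address of a string literal is determined at compile time. Both the address and the string literal itself can be found in the built executable (ELF format). For example, the following code prints String Literal: 0x400674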

printf("String Literal: %p\n", "Hello World");

and objdump -s -j .rodata test1 shows

Contents of section .rodata:

400670 01000200 48656c6c 6f20576f 726c6400 ....Hello World.

.....

So it looks like I can obtain the virtual address of "Hello World" by reading the executable itself.

Question: how can I build a table/map/dictionary from the addresses of string literals to the strings themselves by reading the ELF file?

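I am trying to write a standalone Python script or C++ program that reads the ELF executable and generates the table. It is fine if the table contains extra mappings (entries that are not string literals), as long as it contains the mapping for every string literal.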

Answer 1

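I'm not sure your question always makes sense. The details are implementation specific (they depend on the operating system, the compiler, and the compilation flags).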

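First, a compiler that sees the literal strings "abcd" and "cd" in the same translation unit is permitted (but not required) to share their storage and use "abcd"+2 for the second one. See this answer.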
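Because of this sharing, a table keyed only by the start address of each string would miss a literal such as "cd" that lives inside "abcd". A minimal sketch of a lookup that tolerates this, assuming C++17 and a table type of my own choosing (StringTable and lookup_literal are hypothetical names, not part of any library):

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical table type: start address of a string -> its bytes (without the NUL).
using StringTable = std::map<std::uint64_t, std::string>;

// Resolve an address that points at the start of a stored string or into its middle.
// Returns the text that begins at `addr` and runs to the end of that string.
std::optional<std::string> lookup_literal(const StringTable& table, std::uint64_t addr) {
    auto it = table.upper_bound(addr);       // first entry that starts after addr
    if (it == table.begin()) return std::nullopt;
    --it;                                    // last entry that starts at or before addr
    const std::uint64_t offset = addr - it->first;
    if (offset >= it->second.size()) return std::nullopt;  // at or past the terminator
    return it->second.substr(offset);
}
```

With a table holding only "abcd" at 0x1000, looking up 0x1002 yields "cd", which is what a compiler that merged the two literals would pass to printf.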

Then, in ELF files, strings are simply initialized read-only data (often in the .rodata or .text section of the text segment), and they could happen to be the same as some non-string constants. ELF files do not keep any typing information (except as DWARF debug information when compiled with -g). In other words, the following

const uint8_t constable[] = { 0x68, 0x65, 0x6c, 0x6c, 0x6f, 0 };

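has exactly the same machine representation as the "hello" literal string, but is not a string in the source. Even worse, some parts of the machine code may happen to look like strings.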
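The claim can be checked directly. The sketch below assumes an ASCII execution character set; same_representation is a hypothetical helper name introduced only for illustration:

```cpp
#include <cstdint>
#include <cstring>

// True when the byte array and the literal occupy identical bytes,
// terminator included. Assumes an ASCII execution character set.
bool same_representation() {
    static const std::uint8_t bytes[] = { 0x68, 0x65, 0x6c, 0x6c, 0x6f, 0 };
    const char* literal = "hello";
    return sizeof(bytes) == std::strlen(literal) + 1 &&
           std::memcmp(bytes, literal, sizeof(bytes)) == 0;
}
```

A tool that only sees the bytes of the ELF file therefore cannot tell the two apart, which is why the resulting table can only ever be a superset of the real string literals.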

By the way, you could use the strings(1) command, or study its source code and adapt it to your needs.
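The core of a strings(1)-like scan is small. The following is a rough sketch rather than the actual strings implementation: it assumes the section contents have already been read into memory, that the section's virtual address is known, and that only NUL-terminated runs of printable ASCII are of interest. scan_strings and its parameters are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Scan `data` (the bytes of one section, loaded at virtual address `base`) and
// collect every run of at least `min_len` printable ASCII bytes that is
// immediately followed by a NUL. Keys are the virtual addresses of the runs.
std::map<std::uint64_t, std::string> scan_strings(const std::vector<std::uint8_t>& data,
                                                  std::uint64_t base,
                                                  std::size_t min_len = 4) {
    std::map<std::uint64_t, std::string> table;
    std::size_t start = 0;  // index where the current candidate run began
    for (std::size_t i = 0; i < data.size(); ++i) {
        const std::uint8_t b = data[i];
        const bool printable = (b >= 0x20 && b <= 0x7e) || b == '\t' || b == '\n';
        if (printable) continue;  // still inside a candidate run
        if (b == 0 && i - start >= min_len) {
            const char* first = reinterpret_cast<const char*>(data.data()) + start;
            table.emplace(base + start, std::string(first, i - start));
        }
        start = i + 1;  // the next candidate starts after this byte
    }
    return table;
}
```

Fed the 16 bytes of the .rodata dump from the question with base 0x400670, this produces a single entry mapping 0x400674 to "Hello World". The minimum length is a heuristic: lowering it catches short literals at the cost of more false positives.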

See also dladdr(3).

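Remember that two different processes have (by definition!) different address spaces in virtual memory. Read also about ASLR. String literals may also appear in shared objects (for example, shared libraries such as libc.so), which are usually mmap-ed at different addresses (so the same literal string may have different addresses in different processes!).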
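To relate an address stored in the file to the address a running process sees, you need the load bias of the object that contains it. A sketch for Linux with glibc, using dl_iterate_phdr from <link.h>; LoadedObjects, loaded_objects and runtime_address are hypothetical names introduced here:

```cpp
#include <link.h>  // dl_iterate_phdr, struct dl_phdr_info (glibc/Linux)

#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// One entry per loaded object: (path, load bias). The main program usually
// reports an empty path.
using LoadedObjects = std::vector<std::pair<std::string, std::uint64_t>>;

static int collect_object(struct dl_phdr_info* info, std::size_t, void* out) {
    auto* objects = static_cast<LoadedObjects*>(out);
    objects->emplace_back(info->dlpi_name ? info->dlpi_name : "",
                          static_cast<std::uint64_t>(info->dlpi_addr));
    return 0;  // a non-zero return value would stop the iteration
}

// List the objects mapped into the calling process together with their load bias.
LoadedObjects loaded_objects() {
    LoadedObjects objects;
    dl_iterate_phdr(collect_object, &objects);
    return objects;
}

// Address seen by the running process = load bias + address stored in the file.
std::uint64_t runtime_address(std::uint64_t load_bias, std::uint64_t file_vaddr) {
    return load_bias + file_vaddr;
}
```

For a non-PIE executable like the one in the question the load bias is 0, so 0x400674 in the file is also the run-time address. For a PIE executable or a shared library the bias typically changes from run to run under ASLR, so a table built from the file alone holds file addresses, not process addresses.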

You might be interested in libelf, readelf(1), or bfd for reading ELF files.
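If you would rather not depend on a library, the section headers can be read by hand with the structures from <elf.h>. The sketch below is an illustration under several assumptions: a 64-bit ELF file already loaded into memory, the same byte order as the host, fewer than 0xff00 sections, and C++17. SectionInfo and find_section are hypothetical names.

```cpp
#include <elf.h>  // Elf64_Ehdr, Elf64_Shdr, ELFMAG (Linux)

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <vector>

struct SectionInfo {
    std::uint64_t addr;    // virtual address of the first byte (sh_addr)
    std::uint64_t offset;  // offset of the contents in the file (sh_offset)
    std::uint64_t size;    // size of the contents in bytes (sh_size)
};

// Look up a section by name in a 64-bit ELF image held in memory.
std::optional<SectionInfo> find_section(const std::vector<std::uint8_t>& image,
                                        const std::string& name) {
    Elf64_Ehdr eh;
    if (image.size() < sizeof eh) return std::nullopt;
    std::memcpy(&eh, image.data(), sizeof eh);
    if (std::memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 || eh.e_ident[EI_CLASS] != ELFCLASS64)
        return std::nullopt;
    if (eh.e_shentsize != sizeof(Elf64_Shdr) || eh.e_shstrndx >= eh.e_shnum)
        return std::nullopt;
    // The whole section header table must lie inside the image.
    if (eh.e_shoff > image.size() ||
        std::uint64_t{eh.e_shnum} * sizeof(Elf64_Shdr) > image.size() - eh.e_shoff)
        return std::nullopt;

    auto header_at = [&](std::size_t i) {
        Elf64_Shdr sh;
        std::memcpy(&sh, image.data() + eh.e_shoff + i * sizeof sh, sizeof sh);
        return sh;
    };
    // Section names are stored in the string table named by e_shstrndx.
    const Elf64_Shdr names = header_at(eh.e_shstrndx);
    if (names.sh_offset > image.size() || names.sh_size > image.size() - names.sh_offset)
        return std::nullopt;

    for (std::size_t i = 0; i < eh.e_shnum; ++i) {
        const Elf64_Shdr sh = header_at(i);
        if (sh.sh_name >= names.sh_size) continue;
        const char* p =
            reinterpret_cast<const char*>(image.data()) + names.sh_offset + sh.sh_name;
        // Skip names that are not NUL-terminated inside the string table.
        if (std::memchr(p, 0, names.sh_size - sh.sh_name) == nullptr) continue;
        if (name == p) return SectionInfo{sh.sh_addr, sh.sh_offset, sh.sh_size};
    }
    return std::nullopt;
}
```

For the executable in the question, find_section(image, ".rodata") would be expected to report an address at or just below 0x400670; the offset and size then tell you which bytes of the file to scan for candidate strings.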

c++

elf