How can I access the interpreter path address at runtime in C?

Question

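Using the objdump command, I found that address 0x02a8 in memory contains the start of the path /lib64/ld-linux-x86-64.so.2, and that the path ends with a 0x00 byte, as C strings are required to.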

So I tried to write a simple C program to print this string (I used an example from Denis Yurichev's book "RE for beginners", page 24):

#include <stdio.h>

int main(){
    printf(0x02a8);
    return 0;
}

To my disappointment, I got a segmentation fault instead of the expected /lib64/ld-linux-x86-64.so.2 output.

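It seemed odd to me to use such a "quick" call to printf without a format specifier, or at least without a pointer cast, so I tried to make the code more natural: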

#include <stdio.h>

int main(){
    char *p = (char*)0x02a8;
    printf(p);
    printf("\n");
    return 0;
}

When I ran it, I still got a segmentation fault.

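I don't think this happens because the memory area is restricted, because in the book everything works on the first try. I'm not sure, though; maybe there is something else the book doesn't mention.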

So I'd like a clear explanation of why I get a segmentation fault every time I run this program.

I'm using the latest, fully upgraded Kali Linux release.

Answer 1

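It doesn't work like that anymore. The 64-bit Linux executables you are probably using are position-independent, and they are loaded into memory at an arbitrary address. In that case the ELF file does not contain any fixed base address.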

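While you can build a position-dependent executable as instructed by Marco Bonelli, that is not how arbitrary executables on modern 64-bit Linuxen work, so it is more worthwhile to learn to deal with position-independent ones, even though it is a bit trickier.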

This works for me to print the ELF header magic and the interpreter string. It is dirty, because it probably only works for small executables.

#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>

int main(){
    // convert main to uintptr_t
    uintptr_t main_addr = (uintptr_t)main;

    // clear bottom 12 bits so that it points to the beginning of page
    main_addr &= ~0xFFFLLU;

    // subtract one page so that we're in the elf headers...
    main_addr -= 0x1000;

    // elf magic
    puts((char *)main_addr);

    // interpreter string, offset from hexdump!
    puts((char *)main_addr + 0x318);
}

There is also another trick for finding the start of the ELF executable in memory: the so-called auxiliary vector and getauxval:

The getauxval() function retrieves values from the auxiliary vector, a mechanism that the kernel's ELF binary loader uses to pass certain information to user space when a program is executed.

The location of the ELF program headers in memory is then

#include <sys/auxv.h>
char *program_headers = (char*)getauxval(AT_PHDR);

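The ELF header itself is 64 bytes long and the program headers normally start right after it, at byte 64, so subtracting 64 from that pointer gives you a pointer to the magic string again. Our code can therefore be simplified to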

#include <stdio.h>
#include <inttypes.h>
#include <sys/auxv.h>


int main(){
    char *elf_header = (char *)getauxval(AT_PHDR) - 0x40;
    puts(elf_header + 0x318); // or whatever the offset was in your executable
}

Finally, an executable that computes the interpreter's location purely from the ELF headers, provided you have a 64-bit ELF; the magic numbers come from Wikipedia...

#include <stdio.h>
#include <inttypes.h>
#include <sys/auxv.h>


int main() {
    // get pointer to the first program header
    char *ph = (char *)getauxval(AT_PHDR);

    // elf header at this position
    char *elfh = ph - 0x40;

    // segment type 0x3 is the interpreter;
    // program header item length 0x38 in 64-bit executables
    while (*(uint32_t *)ph != 3) ph += 0x38;

    // p_offset is the 64-bit field at byte 0x8 of the program header;
    // it is the segment's offset from the beginning of the executable
    uint64_t offset = *(uint64_t *)(ph + 0x8);

    // print the interpreter path...
    puts(elfh + offset);
}

Answer 2

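It's disappointing to see your book "RE for beginners" spit out this kind of nonsense without going into the basics first. Nevertheless, what you are doing is clearly wrong; let me explain why.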

Normally on Linux, GCC produces position-independent ELF executables. This is done for security purposes. When the program is run, the operating system is able to place it anywhere in memory (at any address), and the program will work just fine. This technique is called Address Space Layout Randomization, and it is an operating system feature that is enabled by default nowadays.

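Normally, an ELF program has a "base address" and is loaded exactly at that address in order to run. However, for a position-independent ELF, the "base address" is set to 0x0, and the operating system and the interpreter decide at runtime where to put the program.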

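When you use objdump on a position-independent executable, every address you see is not a real address but an offset from the program's base address, which is only known at runtime. Therefore it is impossible to know the location of such a string (or of any other variable) before the program runs.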

If you want the code above to work, you will have to compile an ELF that is not position-independent. You can do so like this:

gcc -no-pie -fno-pie prog.c -o prog

Answer 3

My guess is that it segfaults because of the way you use printf: you are not using the format argument the way it was designed to be used.

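When you use printf to display data, its first argument is the format string, which controls how the output looks: int printf(const char *format, ...); the ... stands for the data you want to display according to that format string.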

So if you want to print a string (formatted as text):

  printf("%s\n", pointer_to_beginning_of_string);

If this doesn't work either, it is probably because you are trying to read memory you are not allowed to access.

Try adding the extra flags "-Werror -Wextra -Wall -pedantic" to your compiler invocation and show us the errors.

