C: How to change my own program in my program at runtime?

Question

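At runtime, either the assembly or the machine code (which is it?) should be somewhere in RAM. Can I somehow get access to it, and read or even write to it?

This is just for educational purposes.

So far I have only managed to get this code to compile. Am I really reading my own program here?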

#include <stdio.h>
#include <sys/mman.h>

int main() {
    void *p = (void *)main;
    mprotect(p, 4098, PROT_READ | PROT_WRITE | PROT_EXEC);
    printf("Main: %p\n Content: %i", p, *(int *)(p+2));
    unsigned int size = 16;
    for (unsigned int i = 0; i < size; ++i) {
        printf("%i ", *((int *)(p+i)) );
    }
}

However, if I add

*(int*)p =4;

then it segfaults.

Based on the answers, I was able to put together the following code, which modifies itself at runtime:

#include <stdio.h>
#include <sys/mman.h>
#include <errno.h>
#include <string.h>
#include <stdint.h>

void * alignptr(void * ptr, uintptr_t alignment) {
    return (void *)((uintptr_t)ptr & ~(alignment - 1));
}

// pattern is a 0-terminated string
char* find(char *string, unsigned int stringLen, char *pattern) {
    unsigned int iString = 0;
    unsigned int iPattern;
    for (unsigned int iString = 0; iString < stringLen; ++iString) {
        for (iPattern = 0;
            pattern[iPattern] != 0
            && string[iString+iPattern] == pattern[iPattern];
            ++iPattern);
        if (pattern[iPattern] == 0) { return string+iString; }
    }
    return NULL;
}

int main() {
    void *p = alignptr(main, 4096);
    int result = mprotect(p, 4096, PROT_READ | PROT_WRITE | PROT_EXEC);
    if (result == -1) {
        printf("Error: %s\n", strerror(errno));
    }

    // Correct a part of THIS program directly in RAM
    char programSubcode[12] = {'H','e','l','l','o',
                                ' ','W','o','r','l','t',0};
    char *programCode = (char *)main;
    char *helloWorlt = find(programCode, 1024, programSubcode);
    if (helloWorlt != NULL) {
        helloWorlt[10] = 'd';
    }   
    printf("Hello Worlt\n");
    return 0;
}

Awesome! Thanks everyone!

Answer 1

On most operating systems (Linux, Windows, Android, MacOSX, etc.), a program does not execute (directly) in RAM but has its own virtual address space and runs in it (stricto sensu, the code is not always or necessarily in RAM; you can have code which is not in RAM and which gets executed after some page fault transparently brings it into RAM). The RAM is (directly) managed by the OS, but your process only sees its virtual address space (initialized at execve(2) time and modified with mmap(2), munmap, mprotect, mlock(2)...). Use proc(5) and try cat /proc/$$/maps in a Linux shell to understand more about the virtual address space of your shell process. On Linux, you can query your process's virtual address space by reading the /proc/self/maps file (sequentially, since it is a textual pseudo-file).

Read Operating Systems: Three Easy Pieces to learn more about OSes.

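In practice, if you want to add code to your program (running on some common OS), you had better use plugins and the dynamic loading facilities. On Linux and POSIX systems you'll use dlopen(3) (which uses mmap etc...), then with dlsym(3) you'll get the (virtual) address of some new function, and you can call it (by storing it in some function pointer of your C code).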

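You did not really define what a program is. I claim that a program is not only an executable but is also made of other resources (such as specific libraries, perhaps fonts or configuration files, etc.), which is why, when you install some program, much more than the executable is usually moved or copied (look at what make install does for most free software programs, even ones as simple as GNU coreutils). Therefore, a program (on Linux) which generates some C code (e.g. in some temporary file /tmp/genecode.c), compiles that C code into a plugin /tmp/geneplug.so (by running gcc -Wall -O -shared -fPIC /tmp/genecode.c -o /tmp/geneplug.so), and then dlopens that /tmp/geneplug.so plugin is genuinely modifying itself. If you code only in C, that is a sane way of writing self-modifying programs.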

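Generally, your machine code sits in the code segment, and that code segment is read-only (and sometimes even execute-only; read about the NX bit). If you really want to overwrite code (and not to extend it), you'll need to use facilities (perhaps mprotect(2) on Linux) to change those permissions and enable rewriting inside the code segment.

Once some part of your code segment is writable, you can overwrite it.

Consider also some JIT-compiling libraries, such as libgccjit or asmjit (and others), to generate machine code in memory.

When you execve a new executable, most of its code is not (yet) in RAM. But (from the point of view of user code in the application) you can run it, and the kernel will transparently but lazily bring code pages into RAM through demand paging. That is what I am trying to explain by saying that your program runs in its virtual address space (not directly in RAM). A whole book is needed to explain that further.

For example, suppose you have a huge executable of one gigabyte (assume, for simplicity, that it is statically linked). When you start that executable (with execve), the entire gigabyte is not brought into RAM. If your program exits quickly, most of the gigabyte has never been brought into RAM and stays on disk. Even if your program runs for a long time but never calls a huge routine of a hundred megabytes of code, that part of the code (the 100 MB of the never-used routine) won't be in RAM.

By the way, stricto sensu, self-modifying code is rarely used these days (and current processors do not even handle it efficiently, e.g. because of caches and branch predictors). So in practice, you don't modify exactly your machine code (even if that would be possible).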

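And malware does not have to modify the currently executing code. It could (and often does) inject new code in memory and jump to it somehow (more precisely, call it through some function pointer). So in general you don't overwrite existing "actively used" code; you create new code elsewhere and then call it or jump to it.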

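If you want to create new code elsewhere in C, plugin facilities (e.g. dlopen and dlsym on Linux) or JIT libraries are more than enough.

Note that the mention of "changing your program" or "writing code" in your question is very vague.

You might want just to extend the code of your program (and then using plugin techniques or JIT-compilation libraries is relevant). Notice that some programs (e.g. SBCL) are able to generate machine code at every user interaction.

You might want to change the existing code of your program, but then you should explain what that means exactly (what does "code" mean for you exactly? Is it only the currently executed machine instruction, or is it the whole code segment of your program?). Are you thinking of self-modifying code, generating new code, or dynamic software updating?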

Can I somehow get access to it, and read or even write to it?

Of course. You need to change the protection of your code in your virtual address space (e.g. with mprotect) and then write many bytes onto some "old code" part. Why you would want to do that is a different story (and you did not explain why). I don't see any educational purpose in doing so: you are likely to crash your program quite quickly (unless you take a lot of precautions to write good enough machine code in memory).

I am a great fan of metaprogramming, but I generally generate some new code and jump into it. On our current machines, I see no value in overwriting existing code. And (on Linux), my manydl.c program demonstrates that you could generate C code, compile, and dynamically link more than a million plugins (and dlopen all of them) in a single program. In practice, on current laptop or desktop computers, you can generate a lot of new code (before being concerned by limits). And C is fast enough (both in compilation time and in run time) that you could generate thousands of C lines at every user interaction (so several times per second), compile and dynamically load them (I did that ten years ago in my defunct GCC MELT project).

If you want to overwrite executable files on disk (I see no value in doing that, it is much simpler to create fresh executables), you need to understand deeply their structure. For Linux, dive into the specifications of ELF.

In the edited question, you forgot to test against failure of mprotect. It may well be failing (4098 is neither a power of 2 nor a multiple of the page size). So please at least code:

/* needs <stdio.h> for perror and <stdlib.h> for exit */
int c = mprotect(p, 4096, PROT_READ | PROT_WRITE | PROT_EXEC);
if (c) { perror("mprotect"); exit(EXIT_FAILURE); }

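Even with 4096 (instead of 4098), that mprotect would probably fail with EINVAL, because main is probably not aligned to a 4K page. (Don't forget that your executable also contains crt0 code.)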

By the way, for educational purposes, you should add the following code near the start of main:

 /* needs <stdio.h>, <stdlib.h> and <unistd.h> (for getpid) */
 char cmdbuf[80];
 snprintf (cmdbuf, sizeof(cmdbuf), "/bin/cat /proc/%d/maps", (int)getpid());
 fflush(NULL);
 if (system(cmdbuf))
   { fprintf(stderr, "failed to run %s\n", cmdbuf); exit(EXIT_FAILURE); }

and you could add a similar chunk of code near the end. You might replace the snprintf format string for cmdbuf with "pmap %d".

Answer 2

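The machine code is loaded into memory. In theory, you can read and write it just like any other part of memory your program has access to.

In practice there may be some roadblocks to doing this. Modern OSes try to limit the data sections of memory to read/write operations but no execution, and the machine-code sections to read/execute but no writing. This is to limit potential security vulnerabilities that would let a program execute whatever it happens to have put into memory (such as random stuff it might pull down from the Internet).

Linux provides the mprotect system call to allow some amount of customization of memory protection. Windows provides the SetProcessDEPPolicy function.

Edit for the updated question

It looks like you are attempting this on Linux and using mprotect. The code you posted does not check the return value from mprotect, so you don't know whether the call succeeded or failed. Here is an updated version that checks the return value: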

#include <stdio.h>
#include <sys/mman.h>
#include <errno.h>
#include <string.h>
#include <stdint.h>

void * alignptr(void * ptr, uintptr_t alignment)
{
    return (void *)((uintptr_t)ptr & ~(alignment - 1));
}

int main() {
    void *p = alignptr(main, 4096);
    int result = mprotect(p, 4096, PROT_READ | PROT_WRITE | PROT_EXEC);

    if (result == -1) {
        printf("Error: %s\n", strerror(errno));
    }
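    /* Byte-wise view of main's machine code; arithmetic directly on a
       function pointer is a GNU extension, so go through a char pointer. */
    unsigned char *code = (unsigned char *)main;
    printf("Main: %p\n Content: %i\n", (void *)main, *(int *)(code + 2));
    unsigned int size = 16;
    for (unsigned int i = 0; i < size; ++i) {
        printf("%i ", *((int *)(code + i)));
    }
    printf("\n");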
}

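Note the change to the length argument passed to mprotect and the function that aligns the pointer to a system page boundary. You will need to investigate your specific system. My system has an alignment of 4096 bytes (determined by running getconf PAGE_SIZE), and after aligning the pointer and changing the length argument of mprotect to the page size, this works and lets you write over what main points to.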

As others have said, this is a bad way to dynamically load code. Dynamic libraries, or plugins, are the preferred method.

Answer 3

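In principle it is possible; in practice your operating system will protect itself from your dangerous code!

Self-modifying code may have been regarded as a "neat trick" in the days when computers had very little memory (the 1950s). Later, when it was no longer necessary, it came to be considered bad practice, resulting in code that was hard to maintain and debug.

In more modern systems (at the end of the 20th century) it became behaviour indicative of viruses and malware. As a result, all modern desktop operating systems disallow modification of a program's code space and also prevent execution of code injected into data space. Modern systems with an MMU can mark memory regions as read-only and as non-executable, for example.

The simpler question of how to obtain the address of the code space is easy. A function pointer value, for example, is generally the address of the function: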

#include <stdio.h>

int main(void)
{
    printf( "Address of main() = %p\n", (void*)main ) ;
    return 0 ;
}

Note also that on a modern system this address will be a virtual address rather than a physical address.

Answer 4

The most straightforward, practical way to accomplish this is with function pointers. You can declare a pointer such as:

void (*contextual_proc)(void) = default_proc;

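and then call it with the syntax contextual_proc();. You can also assign a different function with the same signature to contextual_proc, say contextual_proc = proc_that_logs;, and any code that calls contextual_proc() will then (modulo thread safety) call the new code.

This is a lot like self-modifying code in effect, but it is easier to understand, portable, and will actually work on modern CPUs where executable memory is not writable and instructions are cached.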

In C++, you would use subclasses with virtual functions for this; dynamic dispatch implements it the same way under the hood.
