Why is address sanitizer not indicating a memory leak after malloc() memory was not freed?

Question

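(I didn't write this code, my professor did...) I'm looking at some code my professor wrote, and it all makes sense to me except for one thing. Because we were running out of time, he didn't bother to free any memory, yet he compiled with address sanitizer turned on. When he ran the code, no address sanitizer error or warning was shown. Why?

We're running gcc 9.3 on an Ubuntu machine. When I comment out the call to add_line, it reports a leak, but only for crnt. I guess lines does not throw a memory leak because it was declared in the global space? But why doesn't crnt throw a memory leak when the add_line function is called?

(Also, here are the compilation flags used: -g -std=c99 -Wall -Wvla -fsanitize=address,undefined)

Here is the code: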

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>

#define DEBUG 1

#define BUFSIZE 8
#define LISTLEN 16

char **lines;
int line_count, line_array_size;

void add_line(char *p)
{
    if (DEBUG) printf("Adding |%s|\n", p);
    if (line_count == line_array_size) {
        line_array_size *= 2;
        lines = realloc(lines, line_array_size * sizeof(char *));
        // TODO: check whether lines is NULL
    }

    lines[line_count] = p;
    line_count++;
}

int main(int argc, char **argv)
{
    int fd, bytes;
    char buf[BUFSIZE];
    char *crnt;
    int len;
    int pos, start;

    // TODO: move array list management to separate functions
    lines = malloc(sizeof(char *) * LISTLEN);
    if (!lines) {
        printf("malloc failed\n");
        return EXIT_FAILURE;
    }

    line_array_size = LISTLEN;
    line_count = 0;

    if (argc > 1) {
        fd = open(argv[1], O_RDONLY);
        if (fd == -1) {
            perror(argv[1]);
            return EXIT_FAILURE;
        }
    } else {
        fd = 0;
    }

    crnt = NULL;
    len = 0;
    while ((bytes = read(fd, buf, BUFSIZE)) > 0) {
        // read buffer and break file into lines

        start = 0;
        for (pos = 0; pos < bytes; pos++) {
            if (buf[pos] == '\n') {
                if (crnt == NULL) {
                    len = pos - start;
                    crnt = malloc(len + 1);
                    memcpy(crnt, &buf[start], len);
                } else {
                    len += pos;
                    crnt = realloc(crnt, len + 1);
                    memcpy(&crnt[len - pos], buf, pos);
                }
                crnt[len] = '\0';
                // add_line(crnt); <------------- When I uncomment this line, no address-sanitizer leak is detected. With this line commented, asan does throw a leak only for the crnt variable. Why is that?
                crnt = NULL;
                start = pos + 1;
            }
        }

        if (start < pos) {
            if (crnt == NULL) {
                len = pos - start;
                crnt = malloc(len + 1);
                memcpy(crnt, &buf[start], len);
            } else {
                int newlen = len + (pos - start);
                crnt = realloc(crnt, newlen + 1);
                memcpy(&crnt[len], &buf[start], pos - start);
                len = newlen;
            }
            crnt[len] = '\0';  // technically unnecessary
        }
    }
    if (bytes == -1) {
        perror("read");
        return EXIT_FAILURE;
    }

    // if we reach here, we have read the entire file
    // sort and print the list
    

    return 0;
}

Answer 1

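The issue here is the definition of "memory leak". I was going to quote a section of the LeakSanitizer documentation that gives a clear and precise definition of the concept, one that seems to underlie its operation, but I couldn't find one, so you'll have to put up with a bit of projection on my part.

A region of dynamically allocated memory (that is, memory obtained with malloc or friends) has been leaked at the point where it is no longer possible to free it. In other words, if your program allocates memory and discards the address before the allocation is freed, then the memory has been leaked.

That's slightly different from the definition you might have in mind. You might think that memory has leaked if your program terminates without freeing every block of memory it allocated. That's certainly a possible definition, and I won't criticise it (too much), but it's not actually very precise.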

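When does the program terminate? It doesn't really terminate when main() returns, because you might still have clean-up functions registered with atexit(), and those don't execute until after main() returns. (Or when exit() is called, which is effectively the same thing.) It's actually quite common (although, in my opinion, rather pointless) to use atexit() functions precisely in order to free() objects which might not have been freed before exit().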

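OK, so you can't decide whether an allocation has been released by checking whether it has been freed at the moment main returns. If you want to do that, you need to defer the test to the last possible moment. But at the really last possible moment, the process no longer exists, and the operating system reclaims all the memory the process was using, including everything acquired by the memory allocation library. So at the very last moment there are no memory leaks, because there is no memory.

(There are embedded systems with no concept of independent process memory and so on, so what I wrote there might not apply to all possible computing systems. But it applies to everything on which AddressSanitizer is implemented.)

A key point is that an atexit() handler needs to be able to find the objects it is cleaning up, and since it executes after main() has terminated, it cannot use any automatic (i.e. stack-allocated) object. It can only use objects with static lifetime. So for the handler to do its job, the addresses of the objects to be cleaned up at termination must be stored in global memory. If the address of a region of memory is not persistently stored somewhere, the memory has already leaked (by my definition above), and we don't actually have to wait to see whether atexit manages to free it.

That brings us back to what I claim is a workable definition of a memory leak: dynamically allocated memory whose address is no longer stored anywhere in the running program. That region of memory can no longer be used, so it is garbage, but it cannot be freed because the program does not know its address.

Your lines array is a global variable. Indeed, you note that in your question: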

I guess lines does not throw a memory leak because it was declared in the global space?

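That's right. lines is a global variable, so its contents are still reachable even after main() returns. And not only are its contents reachable, so is any memory pointed to by an element of the array it points to. If you wanted to, you could free the saved lines in an atexit handler: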

void cleanup(void) {
  for (int i = 0; i < line_count; ++i) { free(lines[i]); }
  free(lines);
}

(To use that, you would just call atexit(cleanup) after initialising lines and line_count.)

So that brings us to:

But why doesn't crnt throw a memory leak when the add_line function is called?

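crnt holds the address of the dynamically allocated buffer containing the current line. If you call add_line(crnt), that pointer is stored in lines, so it is available to a clean-up function, as above. You are then free to set crnt to NULL whenever convenient, because it is no longer the only pointer to that buffer.

But if you don't call add_line, then crnt is the only pointer to that buffer, and once you set crnt to NULL there is no pointer to the buffer any more. The buffer has been leaked, and AddressSanitizer tells you so. (AddressSanitizer would also notice the problem even if you didn't set crnt to NULL, because crnt ceases to exist when main() returns or exit() is called, at which point the address has been lost. The same applies if you overwrite crnt with the address of a different allocation.)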

For a simpler example, try these two very similar programs:

Memory leak

#include <stdlib.h>
int main(void) {
  void* megabyte = malloc(1<<20);
  (void)megabyte; /* Suppress unused variable warning */
}

No memory leak

#include <stdlib.h>
void* megabyte;
int main(void) {
  megabyte = malloc(1<<20);
}

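Note that the Valgrind memcheck tool can report memory, like megabyte in the second example, which is never freed even though it is still reachable at what Valgrind considers the end of execution. It doesn't do so by default, though. If you run Valgrind on the second program with the flags --show-leak-kinds=all --leak-check=full, it will report the 1 megabyte of memory as "still reachable". (To try valgrind, I believe you have to compile the program without AddressSanitizer. The two tools are not entirely compatible.)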
