C getline memory leak different behaviours

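I have a question about the getline() function: according to valgrind, it seems to behave differently in two memory-usage scenarios. I'm posting the code for both cases along with an explanation of the behaviour. I hope someone can point me in the right direction.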

First case

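getline() is called inside a while loop that reads every line of a text file into the buffer. The buffer is then freed only once, after the loop ends: in this case valgrind reports no errors (no leak occurs).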

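#define _POSIX_C_SOURCE 200809L  /* for getline() */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main(int argc, char* argv[])
{
    char* buffer = NULL;
    size_t bufsize = 0;
    ssize_t nbytes;
    int counter = 0;

    if (argc < 2)
        return 1;

    FILE* input_fd = fopen(argv[1], "r");
    if (input_fd == NULL)
        return 1;

    // buffer and bufsize persist across calls, so getline() reuses and grows one buffer
    while ((nbytes = getline(&buffer, &bufsize, input_fd)) != -1)
    {
        counter += 1;
    }

    free(buffer);
    fclose(input_fd);

    return 0;
}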

Second case

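The same loop calls a function that in turn calls getline(), passing the same buffer. Again, the buffer is freed only once, after the loop ends, but in this case valgrind reports a memory leak. In fact, running the program and watching its RSS, I can see it grow as the loop progresses. Note that if I add a free() inside the loop (freeing the buffer on every iteration), the problem goes away. Here is the code.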

#define _POSIX_C_SOURCE 200809L  /* for getline() */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>              /* for sleep() */

int my_getline(FILE* lf_fd, char** lf_buffer)
{
    ssize_t lf_nbytes = 0;
    size_t lf_bufsiz = 0;        // local variable: reset to 0 on every call
    lf_nbytes = getline(lf_buffer, &lf_bufsiz, lf_fd);
    if (lf_nbytes == -1)
        return 1;
    return 0;
}

int main(int argc, char* argv[])
{
    char* lf_buffer = NULL;
    size_t bufsize = 0;          // declared but never passed to my_getline()

    if (argc < 2)
        return 1;

    FILE* lf_fd = fopen(argv[1], "r");
    if (lf_fd == NULL)
        return 1;

    while ((my_getline(lf_fd, &lf_buffer)) == 0)
    {
        // Added to allow measuring the RSS
        sleep(2);

        // If I uncomment this, no memory leak occurs
        //free(lf_buffer);
    }

    free(lf_buffer);
    fclose(lf_fd);

    return 0;
}

Valgrind output

==9604== Memcheck, a memory error detector
==9604== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==9604== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==9604== Command: ./my_getline_x86 /media/sf_Scambio/processes.log
==9604== HEAP SUMMARY:
==9604==     in use at exit: 1,194 bytes in 2 blocks
==9604==   total heap usage: 8 allocs, 6 frees, 11,242 bytes allocated
==9604== 
==9604== 1,194 bytes in 2 blocks are definitely lost in loss record 1 of 1
==9604==    at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==9604==    by 0x48E371D: getdelim (iogetdelim.c:102)
==9604==    by 0x1092B3: my_getline (my_getline.c:14)
==9604==    by 0x10956A: main (my_getline.c:38)
==9604== 
==9604== LEAK SUMMARY:
==9604==    definitely lost: 1,194 bytes in 2 blocks
==9604==    indirectly lost: 0 bytes in 0 blocks
==9604==      possibly lost: 0 bytes in 0 blocks
==9604==    still reachable: 0 bytes in 0 blocks
==9604==         suppressed: 0 bytes in 0 blocks
==9604== 
==9604== For lists of detected and suppressed errors, rerun with: -s
==9604== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

The first program is fine.

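The problem with the second one comes from getline()'s buffer-length argument. Your my_getline() always sets it to 0, which means getline() allocates a new buffer every time (at least in the glibc implementation you're using; see below). Change it to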

int my_getline(FILE* lf_fd, char** lf_buffer, size_t* lf_bufsiz)
{
    ssize_t lf_nbytes = 0;
    lf_nbytes = getline(lf_buffer, lf_bufsiz, lf_fd);
    if (lf_nbytes == -1)
        return 1;
    return 0;
}

and, when you call it, pass a pointer to a size_t variable initialized to 0. The existing bufsize variable in main() looks suitable:

//...
while ((my_getline(lf_fd, &lf_buffer, &bufsize)) == 0)
// ...
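Putting the two changes together, the corrected loop might look like the sketch below. It reuses the fixed my_getline() from above; count_lines() is a hypothetical helper (not from the question) that wraps the loop from main() so the single buffer and its size live side by side and are freed once at the end. The sleep() call from the question is dropped.

```c
#define _POSIX_C_SOURCE 200809L  /* getline(), fmemopen() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Fixed wrapper from the answer: the buffer size is owned by the caller. */
int my_getline(FILE* lf_fd, char** lf_buffer, size_t* lf_bufsiz)
{
    ssize_t lf_nbytes = getline(lf_buffer, lf_bufsiz, lf_fd);
    if (lf_nbytes == -1)
        return 1;
    return 0;
}

/* Hypothetical helper: counts the lines in a stream while reusing one buffer.
 * Returns the line count and stores the final buffer capacity in *capacity. */
int count_lines(FILE* lf_fd, size_t* capacity)
{
    char* lf_buffer = NULL;
    size_t bufsize = 0;      /* persists across calls, just like lf_buffer */
    int counter = 0;

    assert(lf_fd != NULL && capacity != NULL);

    while (my_getline(lf_fd, &lf_buffer, &bufsize) == 0)
        counter += 1;

    *capacity = bufsize;
    free(lf_buffer);         /* one free() for the single, reused buffer */
    return counter;
}
```

Called on a stream containing three lines, count_lines() returns 3; the reported capacity is whatever getline() chose, which is at least large enough for the longest line plus its terminating NUL.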

Although it's easy to work around, the memory leak you ran into looks like a bug in the glibc implementation of getline().

From the POSIX documentation:

If *lineptr is a null pointer or if the object pointed to by *lineptr is of insufficient size, an object shall be allocated as if by malloc() or the object shall be reallocated as if by realloc(), respectively, such that the object is large enough to hold the characters to be written to it...

glibc manpage:

Alternatively, before calling getline(), *lineptr can contain a pointer to a malloc(3)-allocated buffer *n bytes in size. If the buffer is not large enough to hold the line, getline() resizes it with realloc(3), updating *lineptr and *n as necessary.

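These indicate that, in the situation you're running into, where you pass a valid non-NULL pointer to memory and say its length is 0, the function should resize it with realloc(). However, the glibc implementation checks *lineptr == NULL || *n == 0 and, if true, overwrites *lineptr with a newly allocated buffer, causing the leak you saw. Compare the NetBSD implementation, which uses realloc() for all allocations (realloc(NULL, x) is equivalent to malloc(x)) and therefore does not leak with the original code. That isn't ideal, because it calls realloc() on every use rather than only when the buffer is too small to hold the current line (unlike the fixed version above), but it works.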
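To make the difference concrete, here is a deliberately simplified sketch of a realloc()-only getline() in the spirit of the NetBSD behaviour described above. realloc_getline() is a hypothetical name; this is not the NetBSD source, and it skips size-overflow checks and stream-error reporting. It only illustrates why calling realloc() on the caller's pointer avoids the leak even when *n is 0.

```c
#define _POSIX_C_SOURCE 200809L  /* fmemopen() for testing */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical, simplified getline() that always grows the buffer with
 * realloc(). Because realloc(NULL, x) behaves like malloc(x), a NULL
 * *lineptr needs no special case, and a non-NULL *lineptr with *n == 0
 * is resized rather than overwritten (so it cannot be leaked). */
ssize_t realloc_getline(char** lineptr, size_t* n, FILE* stream)
{
    size_t len = 0;
    int c;

    assert(lineptr != NULL && n != NULL && stream != NULL);

    while ((c = fgetc(stream)) != EOF) {
        /* Need room for this character plus the terminating NUL. */
        if (len + 2 > *n) {
            size_t newsize = (*n < 64) ? 64 : *n * 2;
            char* p = realloc(*lineptr, newsize);
            if (p == NULL)
                return -1;   /* *lineptr is still valid and owned by the caller */
            *lineptr = p;
            *n = newsize;
        }
        (*lineptr)[len++] = (char)c;
        if (c == '\n')
            break;
    }

    if (len == 0)
        return -1;           /* EOF (or read error) before any character */

    (*lineptr)[len] = '\0';
    return (ssize_t)len;
}
```

Resetting the size to 0 between calls while keeping the pointer, exactly as the original my_getline() does, still works here: the second call hands the old buffer to realloc() instead of discarding it.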