Dynamically allocate user inputted string

Question

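I'm trying to write a function that does the following:

Start an input loop, printing '> ' on each iteration.
Take whatever the user enters (of unknown length) and read it into a character array, dynamically allocating the size of the array as needed. The line the user enters ends with a newline character.
Append a null byte, '\0', to the end of the character array.
The loop terminates when the user enters a blank line: '\n'

This is what I've written so far: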

void input_loop(){
    char *str = NULL;

    printf("> ");

    while(printf("> ") && scanf("%a[^\n]%*c",&input) == 1){

        /*Add null byte to the end of str*/

        /*Do stuff to input, including traversing until the null byte is reached*/

        free(str);
        str = NULL;
    }
    free(str);
    str = NULL;
}

Now, I'm not sure how to add the null byte to the end of the string. I was thinking of something like this:

last_index = strlen(str);
str[last_index] = '\0';

But I'm not sure whether that will work. I can't test it, because I get this error when I try to compile my code:

warning: ISO C does not support the 'a' scanf flag [-Wformat=]

So what can I do to get my code working?

Edit: changing scanf("%a[^\n]%*c",&input) == 1 to scanf("%as[^\n]%*c",&input) == 1 gives me the same error.

Answer 1

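First of all, scanf format strings do not use regular expressions, so I don't think anything close to what you want is going to work. As for the error you are getting, according to my trusty manual, the %a conversion flag is for floating-point numbers, but it only works in C99 (and your compiler is probably configured for C90).

But you have a bigger problem. scanf expects you to pass it a previously allocated buffer, which it fills with the input it reads. It does not malloc the string for you, so initializing str to NULL and freeing it afterwards will not work with scanf.

The simplest thing you can do is give up on arbitrary-length strings. Create a large buffer and forbid input that is longer than that.

You can then use the fgets function to fill your buffer. To check whether it managed to read the whole line, check whether your string ends with a '\n'.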
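To illustrate the point about scanf needing storage that already exists, here is a minimal sketch that reads one line into a caller-supplied buffer using a bounded scanset. The helper name read_line_bounded and the 255-character limit are assumptions made for this example, not something taken from the question.

```c
#include <stdio.h>

/* Hypothetical helper: reads up to 255 characters of one line from f into
 * buf, which the caller must already have allocated with room for 256 bytes.
 * Returns 1 if a line (possibly empty) was read, 0 on end of file or error. */
int read_line_bounded(FILE *f, char buf[256])
{
    buf[0] = '\0';
    /* %255[^\n] stores at most 255 non-newline characters plus the '\0'.
     * On an empty line it matches nothing and stores nothing, which is why
     * buf is pre-set to the empty string above. */
    int rc = fscanf(f, "%255[^\n]", buf);
    if (rc == EOF)
        return 0;

    /* Discard whatever is left of an over-long line, then the newline. */
    int c;
    while ((c = fgetc(f)) != EOF && c != '\n')
        ;
    return 1;
}
```

The width in the format string is what keeps scanf from writing past the end of the buffer; without it, a long line would overflow buf.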

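// needs <stdio.h>, <stdlib.h>, <string.h> and <stdbool.h>
char str[256+1];
while(true){
    printf("> ");
    if(!fgets(str, sizeof str, stdin)){
        //error or end of file
        break;
    }

    size_t len = strlen(str);
    if(len + 1 == sizeof str && str[len - 1] != '\n'){
        //buffer is full and no newline was stored: user typed something too long
        exit(1);
    }

    printf("user typed %s", str);
}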

Another option is to use a non-standard library function. For example, Linux has a getline function that reads a whole line of input, using malloc behind the scenes.

Answer 2

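No error checking; don't forget to free the pointer when you're done with it. If you use this code to read enormous lines, you deserve all the pain it will bring you.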

#include <stdio.h>
#include <stdlib.h>

char *readInfiniteString() {
    int l = 256;
    char *buf = malloc(l);
    int p = 0;
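    int ch;  /* int, not char, so that EOF can be told apart from real characters */

    ch = getchar();
    while(ch != '\n' && ch != EOF) {  /* also stop at end of input */
        buf[p++] = ch;
        if (p == l) {
            l += 256;
            buf = realloc(buf, l);
        }
        ch = getchar();
    }
    buf[p] = '\0';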

    return buf;
}

int main(int argc, char *argv[]) {
    printf("> ");
    char *buf = readInfiniteString();
    printf("%s\n", buf);
    free(buf);
}
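For comparison, here is a sketch of the same character-by-character idea with the allocation checks filled in and the buffer doubled instead of grown by a fixed 256 bytes. The function name read_line_doubling and the starting capacity of 16 are assumptions for illustration, not part of the answer above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variant of readInfiniteString: reads one line from f, doubling
 * the buffer whenever it fills up. Returns a malloc'd string without the
 * newline, or NULL on allocation failure or if end of file is reached before
 * any character is read. The caller must free the result. */
char *read_line_doubling(FILE *f)
{
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int ch;                               /* int, so EOF stays distinguishable */
    while ((ch = fgetc(f)) != EOF && ch != '\n') {
        if (len + 1 == cap) {             /* keep one byte free for '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);                /* realloc failed: old block is still ours */
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)ch;
    }
    if (ch == EOF && len == 0) {          /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Assigning the result of realloc to a temporary first means the original block can still be freed if the reallocation fails.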

Answer 3

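If you are on a POSIX system such as Linux, you should have access to getline. It can be made to behave like fgets, but if you start with a null pointer and a zero length, it will take care of the memory allocation for you.

You can use it in a loop like this: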

#include <stdlib.h>
#include <stdio.h>
#include <string.h>    // for strcmp

int main(void)
{
    char *line = NULL;
    size_t nline = 0;

    for (;;) {
        ssize_t n;    // getline returns ssize_t

        printf("> ");

        // read line, allocating as necessary
        n = getline(&line, &nline, stdin);
        if (n < 0) break;

        // remove trailing newline
        if (n && line[n - 1] == '\n') line[n - 1] = '\0';

        // do stuff
        printf("'%s'\n", line);
        if (strcmp("quit", line) == 0) break;
    }

    free(line);
    printf("\nBye\n");

    return 0;
}

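The pointer and the length value you pass in must be consistent with each other, so that getline can reallocate the memory as needed. (That means you shouldn't change nline or the pointer line inside the loop.) If a line fits, the same buffer is reused on every pass through the loop, so you have to free the line string only once, when you have finished reading.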
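Applied to the loop described in the question, which stops on an empty line, the buffer reuse might look like the sketch below. The function name input_loop is borrowed from the question; its signature, the FILE parameters and the return value are assumptions added here so the loop can be exercised on any stream.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* ask for getline on strict compilers */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>            /* ssize_t */

/* Hypothetical helper: reads lines from in until an empty line or end of
 * file, echoing each one to out. Returns the number of non-empty lines. */
size_t input_loop(FILE *in, FILE *out)
{
    char *line = NULL;    /* getline allocates and grows this buffer */
    size_t cap = 0;       /* capacity of line, maintained by getline */
    size_t count = 0;

    for (;;) {
        fputs("> ", out);
        ssize_t n = getline(&line, &cap, in);
        if (n < 0)
            break;                          /* end of file or error */
        if (n > 0 && line[n - 1] == '\n')
            line[--n] = '\0';               /* strip the newline */
        if (n == 0)
            break;                          /* empty line ends the loop */
        fprintf(out, "'%s'\n", line);       /* "do stuff" placeholder */
        count++;
    }
    free(line);           /* one free, however many lines were read */
    return count;
}
```

Calling input_loop(stdin, stdout) gives the interactive behaviour; line and cap are only ever modified by getline, which is what keeps them consistent.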

Answer 4

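It has already been mentioned that scanf is probably unsuitable for this purpose. I wouldn't suggest using fgets, either. Though it is slightly more suitable, there are problems that seem difficult to avoid, at least at first. Few C programmers manage to use fgets correctly the first time without reading the fgets manual in full. The parts most people manage to neglect entirely are:

what happens when the line is too large, and
what happens when EOF or an error is encountered.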

The fgets() function shall read bytes from stream into the array pointed to by s, until n-1 bytes are read, or a <newline> is read and transferred to s, or an end-of-file condition is encountered. The string is then terminated with a null byte.

Upon successful completion, fgets() shall return s. If the stream is at end-of-file, the end-of-file indicator for the stream shall be set and fgets() shall return a null pointer. If a read error occurs, the error indicator for the stream shall be set, fgets() shall return a null pointer...

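I don't feel I need to stress the importance of checking the return value too much, so I won't belabor it. Suffice it to say, if your program doesn't check the return value, it won't know when EOF or an error occurs; your program will probably get caught in an infinite loop.

When no '\n' is present, the remaining bytes of the line have not yet been read. Thus, fgets will always parse the line at least once, internally. When you introduce extra logic to check for a '\n', you are parsing the data a second time.

This allows you to realloc the storage and call fgets again if you want to resize the storage dynamically, or to discard the remainder of the line (warning the user about the truncation is a good idea), perhaps using something like fscanf(file, "%*[^\n]");.

hugomg mentioned using multiplication in the dynamic resize code to avoid quadratic runtime problems. Along those lines, it would be a good idea to avoid parsing the same data over and over on each iteration (which introduces further quadratic runtime problems). This can be achieved by storing the number of bytes you have read (and parsed) somewhere. For example: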
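A rough sketch of the "discard the remainder" option follows, assuming a fixed-size buffer supplied by the caller. The helper name read_line_truncating and its return convention are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf (capacity size), dropping any
 * characters that do not fit. Returns 1 if the line was truncated, 0 if it
 * fit, and -1 on end of file or error. */
int read_line_truncating(FILE *f, char *buf, size_t size)
{
    if (fgets(buf, (int)size, f) == NULL)
        return -1;

    size_t len = strcspn(buf, "\n");
    if (buf[len] == '\n') {
        buf[len] = '\0';                  /* complete line: strip the newline */
        return 0;
    }

    /* No newline was stored, so peek at what follows in the stream. */
    int truncated = 0;
    int c = fgetc(f);
    if (c != EOF && c != '\n') {
        truncated = 1;
        /* %*[^\n] skips the rest of the line without storing anything. */
        if (fscanf(f, "%*[^\n]") != EOF)
            (void)fgetc(f);               /* consume the newline itself */
    }
    return truncated;
}
```

A caller can warn the user whenever the function returns 1. Note that %*[^\n] never consumes the newline, hence the extra fgetc.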

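#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *get_dynamic_line(FILE *f) {
    size_t bytes_read = 0;
    char *bytes = NULL, *temp;
    do {
         /* + 2 rather than + 1: room for at least one character plus the '\0',
            otherwise the first fgets call could never read anything */
         size_t alloc_size = bytes_read * 2 + 2;
         temp = realloc(bytes, alloc_size);
         if (temp == NULL) {
             free(bytes);
             return NULL;
         }
         bytes = temp;
         temp = fgets(bytes + bytes_read, alloc_size - bytes_read, f); /* Parsing data the first time  */
         if (temp != NULL) /* the buffer contents are only meaningful if fgets succeeded */
             bytes_read += strcspn(bytes + bytes_read, "\n");          /* Parsing data the second time */
    } while (temp && bytes[bytes_read] != '\n');
    bytes[bytes_read] = '\0';
    return bytes;
}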

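Those who manage to read the manual and come up with something correct (like this) may soon realise that the complexity of an fgets solution is at least twice that of the same solution using fgetc. We can avoid parsing the data a second time by using fgetc, so using fgetc seems the most appropriate. Alas, most C programmers also manage to use fgetc incorrectly when they neglect the fgetc manual.

The most important detail is to recognise that fgetc returns an int, not a char. It may typically return one of 256 distinct values, between 0 and UCHAR_MAX (inclusive). It may otherwise return EOF, meaning there are typically 257 distinct values that fgetc (or consequently, getchar) may return. Trying to store those values in a char or an unsigned char results in loss of information, specifically the error modes. (Of course, this typical value of 257 will change if CHAR_BIT is greater than 8, and consequently UCHAR_MAX is greater than 255.)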
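To make the int-versus-char point concrete, here is a small sketch with two byte counters that differ only in the type used to hold the result of fgetc. Both function names are invented for this example, and the misbehaviour of the second one assumes a typical platform where EOF is -1 and converting 255 to signed char yields -1.

```c
#include <stdio.h>

/* Counts bytes up to end of file, keeping fgetc's result in an int so that
 * EOF never collides with a real byte value. */
size_t count_bytes_int(FILE *f)
{
    size_t n = 0;
    int c;
    while ((c = fgetc(f)) != EOF)
        n++;
    return n;
}

/* Deliberately wrong: narrowing to signed char loses information, so on
 * typical platforms the byte 0xFF compares equal to EOF and the loop stops
 * early. */
size_t count_bytes_char(FILE *f)
{
    size_t n = 0;
    signed char c;
    while ((c = (signed char)fgetc(f)) != EOF)
        n++;
    return n;
}
```

With an unsigned char instead, the comparison against EOF would never be true and the loop would not terminate, which is the other half of the same mistake.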

char *get_dynamic_line(FILE *f) {
    size_t bytes_read = 0;
    char *bytes = NULL;
    do {
         if ((bytes_read & (bytes_read + 1)) == 0) {
             void *temp = realloc(bytes, bytes_read * 2 + 1);
             if (temp == NULL) {
                 free(bytes);
                 return NULL;
             }
             bytes = temp;
         }

         int c = fgetc(f);
         bytes[bytes_read] = c >= 0 && c != '\n'
                             ? c
                             : '\0';
    } while (bytes[bytes_read++]);
    return bytes;
}
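The test (bytes_read & (bytes_read + 1)) == 0 in the function above is true exactly when bytes_read is 0 or one less than a power of two, so the buffer is regrown at 0, 1, 3, 7, 15, ... bytes and its capacity follows the same sequence. The sketch below isolates that arithmetic; the three helper names are assumptions made for illustration only.

```c
#include <stddef.h>

/* Nonzero when n is 0 or one less than a power of two (1, 3, 7, 15, ...),
 * i.e. when the function above calls realloc. */
int is_growth_point(size_t n)
{
    return (n & (n + 1)) == 0;
}

/* Capacity requested at a growth point n, mirroring bytes_read * 2 + 1. */
size_t next_capacity(size_t n)
{
    return n * 2 + 1;
}

/* Number of realloc calls made while reading a line of line_len characters:
 * the loop body runs for bytes_read = 0 .. line_len. */
size_t count_growth_points(size_t line_len)
{
    size_t count = 0;
    for (size_t n = 0; n <= line_len; n++)
        if (is_growth_point(n))
            count++;
    return count;
}
```

Because the capacity roughly doubles each time, the number of reallocations grows with the logarithm of the line length rather than with the length itself.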

Tags: c, arrays, user-input, scanf, dynamic-memory-allocation