Convert string to doubles in C

Question

I'm about to implement a dynamic matrix structure (storing double values), but I ran into some problems reading from a file.

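The idea is that the program doesn't know the number of rows and columns in advance. It has to scan the first line to find the number of columns.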

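The problem with simply using fscanf() to scan the doubles is that (as far as I know) it can't differentiate between newline and space characters, so it would read the whole file as one line.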

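To work around this, I first use a function that reads a line one character at a time with fscanf(). It stores the characters in a string that represents exactly one line.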

Then I use sscanf() to scan the string for double values and store them in a double array. After the conversion I free the string. This is done in the chararray_to_doublearray function.

After some testing, I suspect that the chararray_to_doublearray function is not working as intended.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Converts a character array to a double array and returns a pointer to it. Frees the space of the character array, as it's no longer needed. */
double *chararray_to_doublearray(char **chararray)
{
    int i;
    int elements = 0;
    double *numbers=NULL;
    double newnumber;
    while (sscanf(*chararray, "%lf ", &newnumber) == 1) {
        double* newarray = (double*) malloc(sizeof(double) * (elements+1));
        for (i = 0; i < elements; ++i)
            newarray[i] = numbers[i];
        free(numbers);
        numbers = newarray;
        numbers[elements] = newnumber;
        ++elements;
    }
    free(*chararray);
    return numbers;
}

And the main() function just calls the chararray_to_doublearray function:

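int main (void)
{
    double *numbers;
    /* chararray_to_doublearray() frees its argument, so the string must
       come from malloc, and we pass the address of a char * (not of an array) */
    char *string = malloc(50);
    if (string == NULL)
        return 1;
    strcpy(string, "12.3 1.2 3.4 4 0.3");
    numbers = chararray_to_doublearray(&string);
    free(numbers);
    return 0;
}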

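So to sum up: I couldn't find any good implementation for reading double values from the user (or from a file) up to the end of a line. This is my implementation. Do you have any idea what might be wrong with it?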

Regards,

naroslife
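A likely cause of the trouble in chararray_to_doublearray: sscanf always starts scanning at the beginning of *chararray, so the while loop reads 12.3 again and again and never terminates. (Separately, free(*chararray) is only valid if the string came from malloc.) One common fix is the %n conversion, which reports how many characters have been consumed so the read position can be advanced. The sketch below is a hypothetical rewrite, not the asker's code: the name string_to_doubles and its count out-parameter are my own, it grows the array geometrically with realloc instead of copying element by element, and it leaves freeing the input string to the caller.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variant of chararray_to_doublearray: parses every double in
 * str, stores how many were found in *count, and returns a malloc'd array
 * (NULL if none were found or allocation failed). Does not free str. */
double *string_to_doubles(const char *str, size_t *count)
{
    double *numbers = NULL;
    size_t n = 0, cap = 0;
    double value;
    int consumed;   /* characters consumed by this sscanf call, via %n */

    /* "%lf" skips leading whitespace itself; %n does not count as a conversion */
    while (sscanf(str, "%lf%n", &value, &consumed) == 1) {
        if (n == cap) {                              /* full: double the capacity */
            size_t newcap = cap ? cap * 2 : 4;
            double *tmp = realloc(numbers, newcap * sizeof *tmp);
            if (!tmp) {
                free(numbers);
                *count = 0;
                return NULL;
            }
            numbers = tmp;
            cap = newcap;
        }
        numbers[n++] = value;
        str += consumed;                             /* advance past this number */
    }
    *count = n;
    return numbers;
}
```

Called as string_to_doubles("12.3 1.2 3.4 4 0.3", &n), it should yield n == 5 with the values in order; the caller frees the returned array.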

Answer 1

This is an XY problem. Do you really need to "fscanf() a line one character at a time"? Has that led you to ask too many questions in the wrong direction?

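Consider this: %lf means converting characters into the double you chose... it stops as soon as there are no more suitable characters to convert... and a newline is not a suitable character for conversion... is a light bulb going on in your head?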

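In your case, the space after %lf in the format string causes useful information (whether or not the whitespace is a newline) to be discarded. STOP! You went too far, and the result is that you now need an intermediate character-array conversion function, which is unnecessary bloat.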
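To see the effect of that trailing space concretely, here is a small hypothetical helper (consumed_after_double is not from the answer) that uses %n to report how many characters sscanf consumed. With "%lf" scanning stops right after the number; with "%lf " the whitespace directive also swallows everything that follows, newline included.

```c
#include <stdio.h>

/* Hypothetical helper: how many characters of s does sscanf consume when
 * reading one double, with "%lf%n" versus "%lf %n"?
 * Returns -1 if no double could be read. */
int consumed_after_double(const char *s, int trailing_space)
{
    double d;
    int n = -1;
    int got = trailing_space
        ? sscanf(s, "%lf %n", &d, &n)   /* ' ' skips ALL whitespace, '\n' too */
        : sscanf(s, "%lf%n", &d, &n);   /* stops right after the number */
    return got == 1 ? n : -1;
}
```

For the input "1.5\n2.5" it returns 3 without the space and 4 with it; the extra character is the newline that the question needed to see.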

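With this new insight, that removing the whitespace from the format string leaves the trailing newline on the stream, consider using fgetc to handle the distinction between ordinary whitespace and newlines.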

For example:

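/* needs <stdio.h> and <ctype.h> */
double f;
int x = scanf("%lf", &f);    /* x == 1 if a double was read */
int c;
do {                         /* consume whitespace after the number... */
    c = getchar();
} while (isspace(c) && c != '\n');   /* ...but stop at a newline */
if (c != '\n') {
    ungetc(c, stdin);        /* not a newline: put it back for the next scanf */
}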

See above how I distinguished newline from non-newline whitespace?
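Put together, the technique can be wrapped in a row reader. This is a sketch under my own assumptions (the read_row name, the FILE * parameter, the fixed-size output buffer and the -1 convention are all hypothetical): it reads doubles with fscanf and, after each one, uses fgetc to eat blanks up to either the next value or a newline, which ends the row. Because "%lf" itself skips leading whitespace, completely blank lines are skipped rather than reported as empty rows.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper built on the fscanf()/fgetc() technique above.
 * Reads doubles from fp up to the end of the current line, storing the
 * first max of them in out. Returns how many values the line held,
 * or -1 if EOF (or a non-numeric token) is hit before any value. */
int read_row(FILE *fp, double *out, int max)
{
    int n = 0;

    for (;;) {
        double d;
        if (fscanf(fp, "%lf", &d) != 1)
            return n > 0 ? n : -1;      /* EOF or junk ends the row */
        if (n < max)
            out[n] = d;                 /* extra values are counted, not stored */
        n++;

        int c;
        do {                            /* eat blanks after the number... */
            c = fgetc(fp);
        } while (isspace(c) && c != '\n');

        if (c == '\n' || c == EOF)      /* ...a newline (or EOF) ends the row */
            return n;
        ungetc(c, fp);                  /* start of the next value: put it back */
    }
}
```

The value returned for the first row is the column count the question was after; later rows can be compared against it.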

Answer 2

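Reading an unknown number of double values from a file or stdin and storing them in a simulated 2D array (pointer-to-pointer-to-type) is not difficult. Since you must assume the number of columns may also differ from row to row, you need a similar way to allocate column storage, keep track of the number of values allocated/read, and reallocate the column storage if/when the maximum number of columns is reached. This lets you handle a jagged array just as easily as an array with a fixed number of columns.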

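There is a neat trick that helps a great deal in managing jagged arrays. Since you don't know beforehand how many column values may be present, once they are read you need a way to store the number of column elements present (for each row in the array). A simple and reliable way is to store each row's column count as its first column value. Then, after collecting the data, that information is part of the array itself and provides a key for iterating over all rows and columns.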
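A minimal sketch of the count-in-column-0 idea, assuming each row is built from an already-known list of values (make_row and sum_jagged are hypothetical names). One detail differs from the program below: there, array[row][0] holds col, which is one past the last used index (the number of values plus one), so its print loop uses col < array[row][0]; this sketch stores the plain count n and iterates with c <= ncols.

```c
#include <stdlib.h>

/* Build one jagged row holding n values: row[0] = n, values in
 * row[1]..row[n]. Returns NULL on allocation failure. */
double *make_row(const double *vals, size_t n)
{
    double *row = malloc((n + 1) * sizeof *row);
    if (!row)
        return NULL;
    row[0] = (double)n;                 /* column count lives in slot 0 */
    for (size_t i = 0; i < n; i++)
        row[i + 1] = vals[i];
    return row;
}

/* Sum every value in nrows rows built by make_row, using each row's
 * slot 0 as the key for how many columns that row has. */
double sum_jagged(double **rows, size_t nrows)
{
    double sum = 0.0;
    for (size_t r = 0; r < nrows; r++) {
        size_t ncols = (size_t)rows[r][0];
        for (size_t c = 1; c <= ncols; c++)
            sum += rows[r][c];
    }
    return sum;
}
```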

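As part of this approach, I created the special functions xstrtod, xcalloc, xrealloc_sp (reallocation for single-pointer arrays) and xrealloc_dp (reallocation for double pointers). These are nothing more than the standard functions with the appropriate error checking moved into the function, so the myriad validation checks don't clutter the main body of the code.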

A quick implementation that reads values from stdin could be coded as follows:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
#include <errno.h>
#include <math.h>   /* for HUGE_VAL */

#define ROWS 32
#define COLS 32
#define MAXC 256

double xstrtod (char *str, char **ep);
void *xcalloc (size_t n, size_t s);
void *xrealloc_sp (void *p, size_t sz, size_t *n);
void *xrealloc_dp (void **p, size_t *n);

int main (void) {

    char line[MAXC] = {0};              /* line buffer for fgets    */
    char *p, *ep;                       /* pointers for strtod      */
    double **array = NULL;              /* array of values          */
    size_t row = 0, col = 0, nrows = 0; /* indexes, number of rows  */
    size_t rmax = ROWS, cmax = COLS;    /* row/col allocation size  */

    /* allocate ROWS number of pointers to array of double */
    array = xcalloc (ROWS, sizeof *array);

    /* read each line in file */
    while (fgets(line, MAXC, stdin))
    {
        p = ep = line;  /* initialize pointer/end pointer   */
        col = 1;        /* start col at 1, store ncols in 0 */
        cmax = COLS;    /* reset cmax for each row          */

        /* allocate COLS number of double for each row */
        array[row] = xcalloc (COLS, sizeof **array);

        /* convert each string of digits to number */
        errno = 0;      /* successful library calls may leave errno nonzero */
        while (errno == 0)
        {
            array[row][col++] = xstrtod (p, &ep);

            if (col == cmax) /* if cmax reached, realloc array[row] */
                array[row] = xrealloc_sp (array[row], sizeof *array[row], &cmax);

            /* skip delimiters/move pointer to next digit */
            while (*ep && *ep != '-' && (*ep < '0' || *ep > '9')) ep++;
            if (*ep)
                p = ep;
            else  /* break if end of string */
                break;
        }
        array[row++][0] = col; /* store ncols in array[row][0] */

        /* realloc rows if needed */
        if (row == rmax) array = xrealloc_dp ((void **)array, &rmax);
    }
    nrows = row;  /* set nrows to final number of rows */

    printf ("\n the simulated 2D array elements are:\n\n");
    for (row = 0; row < nrows; row++) {
        for (col = 1; col < (size_t)array[row][0]; col++)
            printf ("  %8.2lf", array[row][col]);
        putchar ('\n');
    }
    putchar ('\n');

    /* free all allocated memory */
    for (row = 0; row < nrows; row++)
        free (array[row]);
    free (array);

    return 0;
}

/** string to double with error checking.
 *  #include <math.h> for HUGE_VAL
 */
double xstrtod (char *str, char **ep)
{
    errno = 0;

    double val = strtod (str, ep);

    /* Check for various possible errors */
    if ((errno == ERANGE && (val == HUGE_VAL || val == -HUGE_VAL)) ||
        (errno != 0 && val == 0)) {
        perror ("strtod");
        exit (EXIT_FAILURE);
    }

    if (*ep == str) {
        fprintf (stderr, "No digits were found\n");
        exit (EXIT_FAILURE);
    }

    return val;
}

/** xcalloc allocates memory using calloc and validates the return.
 *  xcalloc allocates memory and reports an error if the result is
 *  NULL, returning a memory address only on success, which
 *  frees the caller from validating within the body of code.
 */
void *xcalloc (size_t n, size_t s)
{
    register void *memptr = calloc (n, s);
    if (memptr == 0)
    {
        fprintf (stderr, "%s() error: virtual memory exhausted.\n", __func__);
        exit (EXIT_FAILURE);
    }

    return memptr;
}

/** reallocate array of type size 'sz', to 2 * 'n'.
 *  accepts any pointer p, with current allocation 'n',
 *  with the type size 'sz' and reallocates memory to
 *  2 * 'n', updating the value of 'n' and returning a
 *  pointer to the newly allocated block of memory on
 *  success, exits otherwise. all new memory is
 *  initialized to '0' with memset.
 */
void *xrealloc_sp (void *p, size_t sz, size_t *n)
{
    void *tmp = realloc (p, 2 * *n * sz);
#ifdef DEBUG
    printf ("\n  reallocating '%zu' to '%zu', size '%zu'\n", *n, *n * 2, sz);
#endif
    if (!tmp) {
        fprintf (stderr, "%s() error: virtual memory exhausted.\n", __func__);
        exit (EXIT_FAILURE);
    }
    p = tmp;
    memset ((char *)p + *n * sz, 0, *n * sz); /* zero new memory (no arithmetic on void *) */
    *n *= 2;

    return p;
}

/** reallocate memory for array of pointers to 2 * 'n'.
 *  accepts any pointer 'p', with current allocation of,
 *  'n' pointers and reallocates to 2 * 'n' pointers
 *  initializing the new pointers to NULL and returning
 *  a pointer to the newly allocated block of memory on
 *  success, exits otherwise.
 */
void *xrealloc_dp (void **p, size_t *n)
{
    void *tmp = realloc (p, 2 * *n * sizeof tmp);
#ifdef DEBUG
    printf ("\n  reallocating %zu to %zu\n", *n, *n * 2);
#endif
    if (!tmp) {
        fprintf (stderr, "%s() error: virtual memory exhausted.\n", __func__);
        exit (EXIT_FAILURE);
    }
    p = tmp;
    memset (p + *n, 0, *n * sizeof tmp); /* set new pointers NULL */
    *n *= 2;

    return p;
}

Compile

gcc -Wall -Wextra -Ofast -o bin/fgets_strtod_dyn fgets_strtod_dyn.c

Input

$ cat dat/float_4col.txt
 2078.62        5.69982       -0.17815       -0.04732
 5234.95        8.40361        0.04028        0.10852
 2143.66        5.35245        0.10747       -0.11584
 7216.99        2.93732       -0.18327       -0.20545
 1687.24        3.37211        0.14195       -0.14865
 2065.23        34.0188         0.1828        0.21199
 2664.57        2.91035        0.19513        0.35112
 7815.15        9.48227       -0.11522        0.19523
 5166.16        5.12382       -0.29997       -0.40592
 6777.11        5.53529       -0.37287       -0.43299
 4596.48        1.51918       -0.33986        0.09597
 6720.56        15.4161       -0.00158        -0.0433
 2652.65        5.51849        0.41896       -0.61039

Output

$ ./bin/fgets_strtod_dyn <dat/float_4col.txt

 the simulated 2D array elements are:

   2078.62      5.70     -0.18     -0.05
   5234.95      8.40      0.04      0.11
   2143.66      5.35      0.11     -0.12
   7216.99      2.94     -0.18     -0.21
   1687.24      3.37      0.14     -0.15
   2065.23     34.02      0.18      0.21
   2664.57      2.91      0.20      0.35
   7815.15      9.48     -0.12      0.20
   5166.16      5.12     -0.30     -0.41
   6777.11      5.54     -0.37     -0.43
   4596.48      1.52     -0.34      0.10
   6720.56     15.42     -0.00     -0.04
   2652.65      5.52      0.42     -0.61

Memory Check

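In any code you write that dynamically allocates memory, you must use a memory error checking program to ensure you haven't written beyond/outside your allocated blocks of memory and to confirm that you have freed all the memory you allocated. For Linux, valgrind is the normal choice. There are so many subtle ways to misuse a block of memory that can cause real problems that there is no excuse not to do it. Every platform has a similar memory checker, and they are all simple to use: just run your program through it.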

$ valgrind ./bin/fgets_strtod_dyn <dat/float_4col.txt
==28022== Memcheck, a memory error detector
==28022== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==28022== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==28022== Command: ./bin/fgets_strtod_dyn
==28022==

 the simulated 2D array elements are:

   2078.62      5.70     -0.18     -0.05
   5234.95      8.40      0.04      0.11
   2143.66      5.35      0.11     -0.12
   7216.99      2.94     -0.18     -0.21
   1687.24      3.37      0.14     -0.15
   2065.23     34.02      0.18      0.21
   2664.57      2.91      0.20      0.35
   7815.15      9.48     -0.12      0.20
   5166.16      5.12     -0.30     -0.41
   6777.11      5.54     -0.37     -0.43
   4596.48      1.52     -0.34      0.10
   6720.56     15.42     -0.00     -0.04
   2652.65      5.52      0.42     -0.61

==28022==
==28022== HEAP SUMMARY:
==28022==     in use at exit: 0 bytes in 0 blocks
==28022==   total heap usage: 14 allocs, 14 frees, 3,584 bytes allocated
==28022==
==28022== All heap blocks were freed -- no leaks are possible
==28022==
==28022== For counts of detected and suppressed errors, rerun with: -v
==28022== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)

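There is nothing difficult about reading an unknown number of rows and an unknown number of columns from a file in C, but you must pay particular attention to how you do it. While you can limit the array to a square (NxN) array, there is no reason each row can't have a different number of columns (a jagged array).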

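Your basic approach is to allocate memory for an array of pointers to type double for some reasonably anticipated number of rows (#define ROWS 32). You then read each line. For each line you read, you allocate a block of memory for an array of double for some reasonably anticipated number of values (#define COLS 32).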

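You then convert each string of digits encountered to a double and store the number at array[row][col]. (We actually start storing values at col = 1 and reserve col = 0 to hold the final column count for that row.) You keep track of how many numbers you have added to the array, and if the number of columns reaches the number you allocated, you realloc the array to hold additional doubles.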

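You continue reading lines until you have read them all. If you reach your original limit on the number of rows, you simply realloc the array just as you did for the columns.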
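The grow-when-full step described above can be isolated in a helper. This is a hypothetical variant (reserve_one is not part of the answer's code): it doubles the capacity as xrealloc_sp does, starting from 32 to mirror COLS, but assigns realloc's result to a temporary and returns -1 on failure instead of calling exit, so the caller still owns the original block.

```c
#include <stdlib.h>

/* Make room for at least one more double in *arr, whose capacity is *cap
 * and current length is len. Doubles the capacity when full; on failure
 * returns -1 and leaves *arr and *cap untouched. */
int reserve_one(double **arr, size_t *cap, size_t len)
{
    if (len < *cap)
        return 0;                               /* still room */
    size_t newcap = *cap ? *cap * 2 : 32;       /* 32 mirrors COLS */
    double *tmp = realloc(*arr, newcap * sizeof *tmp);
    if (!tmp)
        return -1;                              /* old block is still valid */
    *arr = tmp;
    *cap = newcap;
    return 0;
}
```

Starting from a NULL pointer with capacity 0, appending 100 values ends with a capacity of 128 (32, then 64, then 128).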

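You now have all of your data stored and can do with it whatever you like. When you are done, don't forget to free all the memory you allocated. Let me know if you have any questions.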

Quick Brown Fox Separated Files

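There is one more bit of robustness you can build into the code that will let you read basically any line of data, no matter how much junk the file contains. It doesn't matter whether the values on a line are comma-separated, semicolon-separated, space-separated, or separated by the quick brown fox. With a little parsing help, you can prevent read failures by manually advancing to the beginning of the next number. A quick addition, in context, would be: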

    while (errno == 0)
    {
        /* skip any non-digit characters */
        while (*p && ((*p != '-' && (*p < '0' || *p > '9')) ||
            (*p == '-' && (*(p+1) < '0' || *(p+1) > '9')))) p++;
        if (!*p) break;

        array[row][col++] = xstrtod (p, &ep);
        ...

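Skipping non-digits lets you read almost any sane file with any kind of delimiter without problems. For example, take the same numbers as before, but now formatted in the data file as follows: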

$ cat dat/float_4colmess.txt
The, 2078.62 quick  5.69982 brown -0.17815 fox;  -0.04732 jumps
 5234.95 over   8.40361 the    0.04028 lazy   0.10852 dog
and the  2143.66  dish ran      5.35245 away   0.10747  with -0.11584
the spoon, 7216.99        2.93732       -0.18327       -0.20545
 1687.24        3.37211        0.14195       -0.14865
 2065.23        34.0188         0.1828        0.21199
 2664.57        2.91035        0.19513        0.35112
 7815.15        9.48227       -0.11522        0.19523
 5166.16        5.12382       -0.29997       -0.40592
 6777.11        5.53529       -0.37287       -0.43299
 4596.48        1.51918       -0.33986        0.09597
 6720.56        15.4161       -0.00158        -0.0433
 2652.65        5.51849        0.41896       -0.61039

Even with this crazy format, the code correctly reads all of the numeric values into the array.
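For reuse outside the program, the skipping rule can be pulled into a standalone function. This sketch (parse_messy_line is a hypothetical name) applies the same rule as the fragment above, treating a digit, or a '-' immediately followed by a digit, as the start of a number, and lets strtod's end pointer advance past each value. The rule does not recognize a leading '.' (".5" is read as 5, and "-.5" loses its sign), so it suits files like the one above rather than arbitrary input.

```c
#include <stdlib.h>

/* Store up to max doubles found anywhere in line into out, skipping any
 * characters that cannot start a number. Returns how many were stored. */
size_t parse_messy_line(const char *line, double *out, size_t max)
{
    size_t n = 0;
    const char *p = line;

    while (n < max) {
        /* skip junk: anything that is not a digit or a '-' before a digit */
        while (*p && !((*p >= '0' && *p <= '9') ||
                       (*p == '-' && p[1] >= '0' && p[1] <= '9')))
            p++;
        if (!*p)
            break;                      /* end of line: done */

        char *ep;
        out[n++] = strtod(p, &ep);      /* p starts a number, so ep > p */
        p = ep;                         /* continue after this number */
    }
    return n;
}
```

On the first line of dat/float_4colmess.txt it returns 4, with the values 2078.62, 5.69982, -0.17815 and -0.04732.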
