Error when using strtok to split a long string into shorter strings

I have a function in which I am trying to split a string, but for some reason it stops when it reads the spaces.

input.csv: 18820218,Northern Ireland,England,0,13,Friendly,Belfast,Ireland,FALSE

output.txt: 18820218,Northern,(null),(null),(null),(null),(null),(null),(null)

typedef struct
{
    long int date;
    char *h_team;
    char *a_team;
    int home_score;
    int away_score;
    char *reason;
    char *city;
    char *country;
    char *neutral_field;

}Data;


void open_output(char *string, FILE **output)
{       
    if((*output=fopen(string, "w")) == NULL)
    {
        printf("%s not found\n", string);
            exit(1);
    }
}

void alloc_Data(Data *d, int size)
{
    d->line1 = (char*)malloc(50*sizeof(char)); 
    d->h_team = (char*)malloc(30*sizeof(char)); 
    d->a_team = (char*)malloc(30*sizeof(char)); 
    d->reason = (char*)malloc(30*sizeof(char)); 
    d->city = (char*)malloc(30*sizeof(char)); 
    d->country = (char*)malloc(30*sizeof(char)); 
    d->neutral_field = (char*)malloc(9*sizeof(char)); 
}

void store(Data *d, FILE *output)
{
    char *string = "18820218,Northern Ireland,England,0,13,Friendly,"
                    "Belfast,Ireland,FALSE";
    char *char_date = malloc(10*sizeof(char));
    char *char_hscore = malloc(20*sizeof(char));
    char *char_ascore = malloc(3*sizeof(char));

    char *token;

    token = strtok(string, ",");    
    char_date = token;

    token = strtok(NULL, ",");
    d->h_team = token;  

    token = strtok(NULL, ",");
    d->a_team = token;  

    token = strtok(NULL, ",");
    char_hscore = token;

    token = strtok(NULL, ",");
    char_ascore = token;    

    token = strtok(NULL, ",");
    d->reason = token;  

    token = strtok(NULL, ",");
    d->city = token;    

    token = strtok(NULL, ",");
    d->country = token; 

    token = strtok(NULL, ",");
    d->neutral_field = token;   

    d->date = atoi(char_date);
    d->home_score = atoi(char_hscore);
    d->away_score = atoi(char_ascore);

    fprintf(output, "%li,%s,%s,%d,%d,%s,%s,%s,%s\n", d->date, d->h_team, 
            d->a_team, d->home_score, d->away_score, d->reason, d->city, 
            d->country, d->neutral_field );

    free(string);
    free(char_date);
    free(char_hscore);
    free(char_ascore);
}

int main(int argc, char *argv[])
{
    FILE *output;
    char *string = "saida.txt";

    open_output(string, &output);   

    Data *d;
    d = (Data*)malloc(sizeof(Data)); 
    alloc_Data(d);

    store(d, output);

    free(d);

    return 0;
}

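The code as shown will not compile, for the following reasons:

  • The member d->line1 does not exist in the struct.
  • The function void alloc_Data(Data *d, int size) takes two parameters, but the call alloc_Data(d); passes only 1.

In addition, since no definition was provided for the function open_output(string, &output);, the code cannot be run by anyone trying to help. (Assuming it got past the compile errors.)

Beyond that...

This: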

    token = strtok(NULL, ",");
    d->h_team = token;  

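actually changes the address stored in the previously allocated pointer, which causes a memory leak. (Any subsequent call to free(d->h_team); would then be made on an address that was never allocated.)

This: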

    token = strtok(NULL, ",");
    strcpy(d->h_team,token);

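copies the content residing at the address in token into the memory located at d->h_team, which means you can still call free(d->h_team); when you are done using it. (This avoids the memory leak.)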
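
One caveat worth adding: strcpy does not know how large the destination is, so a token longer than the 30 bytes allocated for d->h_team would write past the end of the block. A minimal sketch of a bounded copy follows. The name copy_field and its return convention are hypothetical, made up for illustration, and are not part of the original code.

```c
#include <string.h>

/* Hypothetical helper: copy src into dst, where dst has room for
 * dstsize bytes including the nul-terminator. Returns 0 on success,
 * -1 if either pointer is NULL or src does not fit. dst is left
 * unchanged on failure.
 */
int copy_field (char *dst, size_t dstsize, const char *src)
{
    size_t len;

    if (dst == NULL || src == NULL)
        return -1;

    len = strlen (src);
    if (len >= dstsize)     /* no room for the text plus '\0' */
        return -1;

    memcpy (dst, src, len + 1);     /* copy including the terminator */

    return 0;
}
```

With this in place, copy_field (d->h_team, 30, token) could replace the strcpy call, and a nonzero return tells the caller the field was too long (or that token was NULL).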

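To overcome the failure you are seeing, the following might help. strtok() writes nul-characters into the string it tokenizes, and a string literal must not be modified, so tokenize a writable copy instead: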

    char *string = "18820218,Northern Ireland,England,0,13,Friendly,Belfast,Ireland,FALSE";
    char *workingbuf = NULL;

    workingbuf = strdup(string);        /* writable copy of the literal */
    token = strtok(workingbuf, ",");    /* tokenize the copy, not the literal */
    ...    

One last thought: it is a good idea to check the return of strtok() before assuming token contains anything:

    token = strtok(NULL, ",");
    if(token)
    {
        d->h_team = token;
        ...  
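
The two suggestions above (tokenize writable memory, and check every token for NULL) can be combined in a small loop. This is only a sketch: split_fields is a hypothetical helper name, and it assumes the caller passes a writable buffer such as a char array, never a string literal.

```c
#include <string.h>

/* Hypothetical helper: split the writable buffer buf on commas and
 * store up to max token pointers in fields. The tokens point into
 * buf (strtok overwrites each delimiter with '\0'), so buf must
 * outlive them. Returns the number of tokens stored.
 */
int split_fields (char *buf, char **fields, int max)
{
    int n = 0;

    /* the loop ends as soon as strtok returns NULL, so a NULL token
     * is never stored or dereferenced
     */
    for (char *tok = strtok (buf, ","); tok != NULL && n < max;
         tok = strtok (NULL, ","))
        fields[n++] = tok;

    return n;
}
```

Called with char line[] = "18820218,Northern Ireland,..."; and a char *fields[9], the return value can be compared against 9 before any field is used. Note that strtok treats consecutive delimiters as one, so an empty CSV field would be skipped rather than returned as an empty string.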

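Edit
After implementing the changes I suggested above, including your addition of open_output, your code ran.

Ana, I have watched your question change over the past few iterations, and it is clear you know which pieces you need to put together, but you are making it harder on yourself than it needs to be in the way you are trying to fit them together.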

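The purpose of dynamically allocating your structure or data is to (1) handle larger amounts of data than will fit in your program stack (not an issue here), (2) allow you to grow or shrink the amount of storage you are using as your data needs fluctuate over the course of your program (also not an issue here), or (3) allow you to tailor your storage needs based on the data being used in your program. This last part seems to be what you are attempting, but by allocating a fixed size for your character arrays, you completely lose the benefit of tailoring your allocation to the size of your data.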

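In order to allocate storage for each of the strings contained in your data, you need to get the length of each string, and then allocate length + 1 characters for storage (the +1 is for the nul-terminating character). While you can use malloc and then strcpy to allocate and copy to the new block of memory, if you have strdup, it does both for you in one function call.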
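
If strdup is not available, the allocate-then-copy approach described above can be wrapped in a small helper. The sketch below is one way to do it. The name dup_string is hypothetical, and it uses memcpy rather than strcpy simply because the length is already known.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for strdup: allocate exactly enough storage
 * for s and copy it. Returns NULL if the allocation fails. The
 * caller is responsible for freeing the result.
 */
char *dup_string (const char *s)
{
    size_t len = strlen (s) + 1;    /* +1 for the nul-terminating character */
    char *copy = malloc (len);

    if (copy != NULL)               /* validate the allocation */
        memcpy (copy, s, len);      /* copies the terminator as well */

    return copy;
}
```

A call such as d->h_team = dup_string (token); then stores a copy sized to the token, and a NULL return signals that the allocation failed.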

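The dilemma you face is "Where do I store the data before I get the lengths and allocate a copy?" You can handle this in a number of ways. You can declare a jumble of different variables and parse the data into the individual variables to begin with (somewhat messy), you can allocate one struct with fixed sizes to initially store the values (a good option, but calling malloc for 30 or 50 chars doesn't make a lot of sense when a fixed array will do), or you can declare a separate temporary struct with fixed array sizes to use (that way you collect the jumble of separate variables into a struct that can then easily be passed to your allocate function). Consider each, and use the one that works best for you.

Your function return types do not make much sense as they are. You need to choose a meaningful return type that allows the function to indicate whether it succeeded or failed, and then return a value (or a pointer to a value) that provides useful information to the rest of your program. Gauging the success or failure of a function is particularly important for functions that allocate memory or handle input or output.

In addition to the return type you choose, you need to think about the parameters you pass to each function. You need to think about which variables need to be available in your function. Take your FILE* parameter. You never use the file outside of your store() function, so why declare it in main(), which causes you to have to worry about returning the open stream through a pointer that you do not otherwise use.

With that in mind, we can fit the pieces of your program together a bit differently.

First, do not use magic numbers throughout your code (e.g. 9, 10, 20, 30, 50, etc.). Instead,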

#define MAXN  9     /* if you need constants, define one (or more) */
#define MAXC 30
#define MAXL 50

(or you can use an enum to accomplish the same thing)
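
For comparison, a minimal sketch of the enum form. The constant names and values are the same as in the #define lines; the array team is a hypothetical variable added only to show the constant in use.

```c
/* same constants as the #define lines, expressed as an enum */
enum { MAXN = 9, MAXC = 30, MAXL = 50 };

/* enumeration constants are integer constant expressions, so they
 * can be used as array sizes just like the macros
 */
char team[MAXC];
```

One practical difference is that enum constants are visible to a debugger by name, while macros are replaced by the preprocessor before compilation.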

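For purposes of the example, you can use a struct with dynamically allocated members to store the data efficiently, and a temporary struct to help parse the values from the line of data. For example: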

typedef struct {    /* struct to hold dynamically allocated data */
    long date;      /* sized to exact number of chars required. */
    int home_score,
        away_score;
    char *h_team,
        *a_team,
        *reason,
        *city,
        *country,
        *neutral_field;
} data_t;

typedef struct {    /* temp struct to parse data from line */
    long date;      /* sized to hold largest anticipated data */
    int home_score,
        away_score;
    char h_team[MAXC],
        a_team[MAXC],
        reason[MAXC],
        city[MAXC],
        country[MAXC],
        neutral_field[MAXN];
} data_tmp_t;

Next, the entire purpose of your open_output function is to open a file for writing. It should return the open file stream on success, or NULL otherwise, e.g.

/* pass filename to open, returns open file stream pointer on
 * success, NULL otherwise.
 */
FILE *open_output (const char *string)
{       
    FILE *output = NULL;

    if ((output = fopen (string, "w")) == NULL)
        fprintf (stderr, "file open failed. '%s'.\n", string);

    return output;
}

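Your alloc_data function allocates the data struct and fills in its values. It should return a pointer to the fully allocated and filled struct on success, or NULL on failure, e.g.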

/* pass temporary struct containing data, dynamic struct allocated,
 * each member allocated to hold exact number of chars (+ terminating
 * character). pointer to allocated struct returned on success,
 * NULL otherwise.
 */
data_t *alloc_data (data_tmp_t *tmp)
{
    data_t *d = malloc (sizeof *d); /* allocate structure */

    if (d == NULL)
        return NULL;

    d->date = tmp->date;

    /* allocate each string member with strdup. if not available,
     * simply use malloc (strlen(str) + 1), and then strcpy.
     */
    if ((d->h_team = strdup (tmp->h_team)) == NULL)
        return NULL;
    if ((d->a_team = strdup (tmp->a_team)) == NULL)
        return NULL;

    d->home_score = tmp->home_score;
    d->away_score = tmp->away_score;

    if ((d->reason = strdup (tmp->reason)) == NULL)
        return NULL;
    if ((d->city = strdup (tmp->city)) == NULL)
        return NULL;
    if ((d->country = strdup (tmp->country)) == NULL)
        return NULL;
    if ((d->neutral_field = strdup (tmp->neutral_field)) == NULL)
        return NULL;

    return d;   /* return pointer to allocated struct */
}

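Any time you allocate multiple values nested within a struct (or nested structs), get in the habit of writing a free_data function to free the memory you allocated in alloc_data. It is far better to write one free function that properly handles a complex structure you have allocated than to scatter individual free calls around your code. There is no return to check when freeing memory, so you can use a void function here: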

/* frees each allocated member of d, and then d itself */
void free_data (data_t *d)
{
    free (d->h_team);
    free (d->a_team);
    free (d->reason);
    free (d->city);
    free (d->country);
    free (d->neutral_field);

    free (d);
}
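
One thing to be aware of in alloc_data as written: if one of the later strdup calls fails, the function returns NULL without releasing the struct or the strings already copied, so that memory is lost. Having a free_data function makes this easy to tidy up. The following is only a sketch of one way to do it. alloc_data_checked and dup_str are hypothetical names, dup_str stands in for strdup so the example needs nothing beyond the standard library, and free_data gains a NULL check that the original does not have.

```c
#include <stdlib.h>
#include <string.h>

#define MAXN  9
#define MAXC 30

typedef struct {    /* same layout as data_t above */
    long date;
    int home_score,
        away_score;
    char *h_team,
        *a_team,
        *reason,
        *city,
        *country,
        *neutral_field;
} data_t;

typedef struct {    /* same layout as data_tmp_t above */
    long date;
    int home_score,
        away_score;
    char h_team[MAXC],
        a_team[MAXC],
        reason[MAXC],
        city[MAXC],
        country[MAXC],
        neutral_field[MAXN];
} data_tmp_t;

/* frees each allocated member of d, and then d itself */
void free_data (data_t *d)
{
    if (d == NULL)
        return;

    free (d->h_team);       /* free (NULL) is a harmless no-op */
    free (d->a_team);
    free (d->reason);
    free (d->city);
    free (d->country);
    free (d->neutral_field);

    free (d);
}

/* hypothetical stand-in for strdup, built from malloc + memcpy */
static char *dup_str (const char *s)
{
    size_t len = strlen (s) + 1;
    char *copy = malloc (len);

    if (copy != NULL)
        memcpy (copy, s, len);

    return copy;
}

/* hypothetical variant of alloc_data: on any failure, everything
 * allocated so far is released before NULL is returned.
 */
data_t *alloc_data_checked (const data_tmp_t *tmp)
{
    data_t *d = malloc (sizeof *d);

    if (d == NULL)
        return NULL;

    /* members not named here (the pointers) start out as NULL */
    *d = (data_t){ .date = tmp->date,
                   .home_score = tmp->home_score,
                   .away_score = tmp->away_score };

    d->h_team        = dup_str (tmp->h_team);
    d->a_team        = dup_str (tmp->a_team);
    d->reason        = dup_str (tmp->reason);
    d->city          = dup_str (tmp->city);
    d->country       = dup_str (tmp->country);
    d->neutral_field = dup_str (tmp->neutral_field);

    if (!d->h_team || !d->a_team || !d->reason ||
        !d->city || !d->country || !d->neutral_field) {
        free_data (d);      /* releases whichever copies succeeded */
        return NULL;
    }

    return d;
}
```

Because every pointer member starts as NULL, a single free_data call is safe no matter which copy failed.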

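Your store() function is where most of the decisions and validation checks take place. The purpose of your code is to parse string and then store it in filename. That should get you thinking about which parameters are required. The rest of the file handling can all be internal to store(), since the FILE is not used further in the calling function. Now, depending on how many writes you are doing, it may make perfect sense to declare and open the FILE once in main() and then pass an open (and validated) FILE* parameter, which would then require only a single fopen call and a final close in main(). For purposes here, everything is handled in store, so you can check for any stream error after each write by checking the return of fclose.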
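
For the alternative mentioned above, where main() opens the file once and passes the open stream down, the writing step might look something like this. It is a sketch only: write_record is a hypothetical function that is not part of the program below, and the struct simply repeats the data_t layout so the example stands alone.

```c
#include <stdio.h>

typedef struct {    /* same layout as data_t above */
    long date;
    int home_score,
        away_score;
    char *h_team,
        *a_team,
        *reason,
        *city,
        *country,
        *neutral_field;
} data_t;

/* Hypothetical helper: write one record to an already-open stream.
 * Returns 0 on success, -1 on a NULL argument or output error.
 * The stream is neither opened nor closed here.
 */
int write_record (FILE *fp, const data_t *d)
{
    if (fp == NULL || d == NULL)
        return -1;

    /* fprintf returns a negative value if an output error occurs */
    if (fprintf (fp, "%ld,%s,%s,%d,%d,%s,%s,%s,%s\n", d->date, d->h_team,
                 d->a_team, d->home_score, d->away_score, d->reason,
                 d->city, d->country, d->neutral_field) < 0)
        return -1;

    return 0;
}
```

The caller would then fopen once, call write_record for each record, and check the return of fclose a single time at the end.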

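Since you are allocating and storing a struct that may need to be used further in the calling function, returning a pointer to the caller (or NULL on failure) is a good choice for the return type of store(). You could do something like: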

/* parses data in string into separate values and stores data in string
 * to filename (note: use mode "a" to append instead of "w" which
 * truncates). returns pointer to fully-allocated struct on success,
 * NULL otherwise.
 */
data_t *store (const char *string, const char *filename)
{
    data_tmp_t tmp = { .date = 0 };
    data_t *d = NULL;
    FILE *output = open_output (filename);  /* no need to pass in */
                                            /* not used later in main */
    if (output == NULL) {   /* validate file open for writing */
        return NULL;
    }

    /* parse csv values with sscanf - avoids later need to convert values
     * validate all values successfully converted.
     */
    if (sscanf (string, "%ld,%29[^,],%29[^,],%d,%d,%29[^,],%29[^,]," 
                        "%29[^,],%8[^\n]",
                        &tmp.date, tmp.h_team, tmp.a_team, &tmp.home_score,
                        &tmp.away_score, tmp.reason, tmp.city, tmp.country,
                        tmp.neutral_field) != 9) {
        fprintf (stderr, "error: failed to parse string.\n");
        fclose (output);    /* close the stream before bailing out */
        return NULL;
    }

    d = alloc_data (&tmp);  /* allocate d and deep-copy tmp to d */
    if (d == NULL) {        /* validate allocation/copy succeeded */
        perror ("malloc-alloc_data");
        fclose (output);    /* close the stream before bailing out */
        return NULL;
    }

    /* output values to file */
    fprintf (output, "%ld,%s,%s,%d,%d,%s,%s,%s,%s\n", d->date, d->h_team, 
            d->a_team, d->home_score, d->away_score, d->reason, d->city, 
            d->country, d->neutral_field );

    if (fclose (output) == EOF) /* always validate close-after-write */
        perror ("stream error-output");

    return d;   /* return fully allocated/populated struct */
}
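
The sscanf format string in store() packs a lot into one line, so here is the same idea cut down to two fields. This is an illustrative sketch, and parse_team_score is a hypothetical function. The conversion %29[^,] reads up to 29 characters that are not a comma (MAXC - 1, leaving room for the nul-terminator), and the return value of sscanf is the number of successful conversions.

```c
#include <stdio.h>

#define MAXC 30

/* Hypothetical two-field example: parse "team,score" from line.
 * team must have room for MAXC chars. Returns 0 if both fields
 * were converted, -1 otherwise.
 */
int parse_team_score (const char *line, char *team, int *score)
{
    /* the field width 29 keeps sscanf from writing past team[MAXC-1] */
    if (sscanf (line, "%29[^,],%d", team, score) != 2)
        return -1;

    return 0;
}
```

A line with a missing or empty field makes sscanf stop early and return fewer conversions than expected, which is why store() compares the return against 9. One consequence is that a record with an empty field (two commas in a row) is rejected rather than parsed.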

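Your main() then need only handle the string you need to parse, the filename to write the data to, and a pointer to the fully allocated struct resulting from the parse, so it is available for further use. (It also takes the file to write to as the 1st argument to the program, or writes to "saida.txt" by default if no argument is provided), e.g.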

int main (int argc, char *argv[])
{
    char *string = "18820218,Northern Ireland,England,0,13,Friendly,"
                    "Belfast,Ireland,FALSE";
    /* filename set to 1st argument (or "saida.txt" by default) */
    char *filename = argc > 1 ? argv[1] : "saida.txt";
    data_t *d = NULL;

    d = store (string, filename);   /* store string in filename */

    if (d == NULL) {    /* validate struct returned */
        fprintf (stderr, "error: failed to store string.\n");
        return 1;
    }

    /* output struct values as confirmation of what was stored in file */
    printf ("stored: %ld,%s,%s,%d,%d,%s,%s,%s,%s\n", d->date, d->h_team, 
            d->a_team, d->home_score, d->away_score, d->reason, d->city, 
            d->country, d->neutral_field );

    free_data (d);  /* free all memory when done */

    return 0;
}

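While not mandated by the C standard, the "standard" coding style for C avoids the use of camelCase or MixedCase variable names in favor of all lower-case, while reserving upper-case names for use with macros and constants. It is a matter of style, so it is completely up to you, but failing to follow it can lead to the wrong first impression in some circles.

Putting it all together, you could do something like the following: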

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXN  9     /* if you need constants, define one (or more) */
#define MAXC 30
#define MAXL 50

typedef struct {    /* struct to hold dynamically allocated data */
    long date;      /* sized to exact number of chars required. */
    int home_score,
        away_score;
    char *h_team,
        *a_team,
        *reason,
        *city,
        *country,
        *neutral_field;
} data_t;

typedef struct {    /* temp struct to parse data from line */
    long date;      /* sized to hold largest anticipated data */
    int home_score,
        away_score;
    char h_team[MAXC],
        a_team[MAXC],
        reason[MAXC],
        city[MAXC],
        country[MAXC],
        neutral_field[MAXN];
} data_tmp_t;

/* pass filename to open, returns open file stream pointer on
 * success, NULL otherwise.
 */
FILE *open_output (const char *string)
{       
    FILE *output = NULL;

    if ((output = fopen (string, "w")) == NULL)
        fprintf (stderr, "file open failed. '%s'.\n", string);

    return output;
}

/* pass temporary struct containing data, dynamic struct allocated,
 * each member allocated to hold exact number of chars (+ terminating
 * character). pointer to allocated struct returned on success,
 * NULL otherwise.
 */
data_t *alloc_data (data_tmp_t *tmp)
{
    data_t *d = malloc (sizeof *d); /* allocate structure */

    if (d == NULL)
        return NULL;

    d->date = tmp->date;

    /* allocate each string member with strdup. if not available,
     * simply use malloc (strlen(str) + 1), and then strcpy.
     */
    if ((d->h_team = strdup (tmp->h_team)) == NULL)
        return NULL;
    if ((d->a_team = strdup (tmp->a_team)) == NULL)
        return NULL;

    d->home_score = tmp->home_score;
    d->away_score = tmp->away_score;

    if ((d->reason = strdup (tmp->reason)) == NULL)
        return NULL;
    if ((d->city = strdup (tmp->city)) == NULL)
        return NULL;
    if ((d->country = strdup (tmp->country)) == NULL)
        return NULL;
    if ((d->neutral_field = strdup (tmp->neutral_field)) == NULL)
        return NULL;

    return d;   /* return pointer to allocated struct */
}

/* frees each allocated member of d, and then d itself */
void free_data (data_t *d)
{
    free (d->h_team);
    free (d->a_team);
    free (d->reason);
    free (d->city);
    free (d->country);
    free (d->neutral_field);

    free (d);
}

/* parses data in string into separate values and stores data in string
 * to filename (note: use mode "a" to append instead of "w" which
 * truncates). returns pointer to fully-allocated struct on success,
 * NULL otherwise.
 */
data_t *store (const char *string, const char *filename)
{
    data_tmp_t tmp = { .date = 0 };
    data_t *d = NULL;
    FILE *output = open_output (filename);  /* no need to pass in */
                                            /* not used later in main */
    if (output == NULL) {   /* validate file open for writing */
        return NULL;
    }

    /* parse csv values with sscanf - avoids later need to convert values
     * validate all values successfully converted.
     */
    if (sscanf (string, "%ld,%29[^,],%29[^,],%d,%d,%29[^,],%29[^,]," 
                        "%29[^,],%8[^\n]",
                        &tmp.date, tmp.h_team, tmp.a_team, &tmp.home_score,
                        &tmp.away_score, tmp.reason, tmp.city, tmp.country,
                        tmp.neutral_field) != 9) {
        fprintf (stderr, "error: failed to parse string.\n");
        fclose (output);    /* close the stream before bailing out */
        return NULL;
    }

    d = alloc_data (&tmp);  /* allocate d and deep-copy tmp to d */
    if (d == NULL) {        /* validate allocation/copy succeeded */
        perror ("malloc-alloc_data");
        fclose (output);    /* close the stream before bailing out */
        return NULL;
    }

    /* output values to file */
    fprintf (output, "%ld,%s,%s,%d,%d,%s,%s,%s,%s\n", d->date, d->h_team, 
            d->a_team, d->home_score, d->away_score, d->reason, d->city, 
            d->country, d->neutral_field );

    if (fclose (output) == EOF) /* always validate close-after-write */
        perror ("stream error-output");

    return d;   /* return fully allocated/populated struct */
}

int main (int argc, char *argv[])
{
    char *string = "18820218,Northern Ireland,England,0,13,Friendly,"
                    "Belfast,Ireland,FALSE";
    /* filename set to 1st argument (or "saida.txt" by default) */
    char *filename = argc > 1 ? argv[1] : "saida.txt";
    data_t *d = NULL;

    d = store (string, filename);   /* store string in filename */

    if (d == NULL) {    /* validate struct returned */
        fprintf (stderr, "error: failed to store string.\n");
        return 1;
    }

    /* output struct values as confirmation of what was stored in file */
    printf ("stored: %ld,%s,%s,%d,%d,%s,%s,%s,%s\n", d->date, d->h_team, 
            d->a_team, d->home_score, d->away_score, d->reason, d->city, 
            d->country, d->neutral_field );

    free_data (d);  /* free all memory when done */

    return 0;
}

Example Use/Output

$ ./bin/store_teams dat/saida.txt
stored: 18820218,Northern Ireland,England,0,13,Friendly,Belfast,Ireland,FALSE

Verify Output File

$ cat dat/saida.txt
18820218,Northern Ireland,England,0,13,Friendly,Belfast,Ireland,FALSE

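Memory Use/Error Check

There is no need to cast the return of malloc; it is unnecessary. See: Do I cast the result of malloc?

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.

It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.

For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use: just run your program through it.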

$ valgrind ./bin/store_teams dat/saida.txt
==16038== Memcheck, a memory error detector
==16038== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==16038== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==16038== Command: ./bin/store_teams dat/saida.txt
==16038==
stored: 18820218,Northern Ireland,England,0,13,Friendly,Belfast,Ireland,FALSE
==16038==
==16038== HEAP SUMMARY:
==16038==     in use at exit: 0 bytes in 0 blocks
==16038==   total heap usage: 8 allocs, 8 frees, 672 bytes allocated
==16038==
==16038== All heap blocks were freed -- no leaks are possible
==16038==
==16038== For counts of detected and suppressed errors, rerun with: -v
==16038== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

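Always confirm that you have freed all memory you have allocated and that there are no memory errors.

Hopefully this helps you see how to fit the puzzle pieces together in a less jumbled way, how to focus on which parameters each function needs, and how to think about choosing a meaningful return type for each of your functions. Look things over and let me know if you have further questions.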