Pointer moving inside string

I'm studying this code, but I don't understand how the pointer moves inside buffer:

...

  while(fgets(buffer,buf_size,fp) != NULL){  
    read_line_p = malloc((strlen(buffer)+1)*sizeof(char));   
    strcpy(read_line_p,buffer);   
    char *string_field_in_read_line_p = strtok(read_line_p,",");
    char *integer_field_in_read_line_p = strtok(NULL,",");  

    char *string_field_1 = malloc((strlen(string_field_in_read_line_p)+1)*sizeof(char));
    char *string_field_2 = malloc((strlen(string_field_in_read_line_p)+1)*sizeof(char));  

    strcpy(string_field_1,string_field_in_read_line_p);
    strcpy(string_field_2,string_field_in_read_line_p);    
    int integer_field = atoi(integer_field_in_read_line_p);  

    struct record *record_p = malloc(sizeof(struct record));   
    record_p->string_field = string_field_1;
    record_p->integer_field = integer_field;

    ordered_array_add(array, (void*)record_p);

    free(read_line_p);
  }

...

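Here is what the source code does:

It reads millions of records from a .csv file. Each record consists of a string and an integer separated by a comma (,), and each record sits on its own line. Every record is added as a single element to a generic array that we have to sort. The generic array is represented by this struct: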
typedef struct {
  void** array; 
  unsigned long el_num; //number of elements stored (also the next free index)
  unsigned long array_capacity; //number of allocated slots
  int (*precedes)(void*,void*); //precedence relation (a function defined in main that decides which field we compare)
}OrderedArray;

In this struct, precedes is a pointer to the comparison function that decides whether the array is sorted by the string field or by the integer field.

Example of records in the csv file:

first word,10
second word,9
third word,8
etc.

So each call to ordered_array_add inserts one new element into the array.

Here is ordered_array_add:

void ordered_array_add(OrderedArray *ordered_array, void* element){
  if(element == NULL){
    fprintf(stderr,"ordered_array_add: element parameter cannot be NULL\n");
    exit(EXIT_FAILURE);
  }

  if(ordered_array->el_num >= ordered_array->array_capacity){
    ordered_array->array = realloc(ordered_array->array,2*(ordered_array->array_capacity)*sizeof(void*));
    if(ordered_array->array == NULL){
      fprintf(stderr,"ordered_array_add: unable to reallocate memory to host the new element\n");
      exit(EXIT_FAILURE);
    }
    ordered_array->array_capacity = 2*ordered_array->array_capacity;
  }

  unsigned long index = get_index_to_insert(ordered_array, element);

  insert_element(ordered_array,element,index);

  (ordered_array->el_num)++;

}

I don't understand how the first loop scans the string buffer, because I don't see any index in that loop.

I wrote a loop similar to the first one I posted, but mine stops after reading the first word from buffer, while the code I'm studying reads the whole string:

while(fgets(buffer,buf_size,fp) != NULL) {
    char *word = strtok(buffer, " ,.:");

    add(words_to_correct, word);
    words_to_correct->el_num = words_to_correct->el_num+1;
    printf("%s\n", word);

}

Your entire first loop can be rewritten as:

while(fgets(buffer,buf_size,fp) != NULL){  
    // note how sizeof() is used - that way if the type of
    // record_p changes, no changes to this code are needed
    struct record *record_p = malloc(sizeof(*record_p));   

    // no need at all for temporary copies of the strings
    record_p->string_field = strdup(strtok(buffer,","));
    record_p->integer_field = atoi(strtok(NULL,","));

    ordered_array_add(array, (void*)record_p);
}

There's no need for the multiple malloc() and strcpy() calls: each such pair can be replaced by strdup(), which is POSIX-standard and supported on Windows, so it is very widely available.

Of course, the rewritten loop still needs error checking, and it shouldn't be using atoi() at all, but as posted here it replicates your original functionality.

It also has the added benefit that you can actually tell what's going on.

Your code

while(fgets(buffer,buf_size,fp) != NULL) {
    char *word = strtok(buffer, " ,.:");

    add(words_to_correct, word);
    words_to_correct->el_num = words_to_correct->el_num+1;
    printf("%s\n", word);

}

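will only process the first word on each line. strtok() returns only one token per call, so you need to keep calling it until it returns NULL. strtok() itself stores the position where it stopped, which is why the original loop needs no index: passing NULL as the first argument tells strtok() to continue from that saved position in the same string: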

while(fgets(buffer,buf_size,fp) != NULL) {
    // trick to keep loop simple - start by using
    // buffer on the first loop iteration, then
    // set tmp to NULL so later iterations work too
    char *tmp = buffer;
    // loop until strtok() returns null
    for ( ;; )
    {
        // note use of tmp here; '\n' is included in the
        // delimiters because fgets() keeps the newline
        char *word = strtok(tmp, " ,.:\n");

        // line is fully parsed - break this loop
        // and get the next line to parse
        if (word == NULL)
        {
            break;
        }

        // now set tmp to NULL so next strtok()
        // gets a NULL first parameter
        tmp = NULL;

        add(words_to_correct, word);
        words_to_correct->el_num = words_to_correct->el_num+1;
        printf("%s\n", word);
    }

}

Also note that I'm spreading the code out instead of trying to cram as much as possible into each line. That's usually easier to read.