Tokenizing an External File

Question

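So I've been stuck on how to tokenize the FIRST token and put that value into a struct. In my case, I am trying to read lines from a file that looks like this:

TDV format: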

 TN     1424325600000   dn20t1kz0xrz    67.0    0.0  0.0     0.0    101872.0    262.5665
 TN     1422770400000   dn2dcstxsf5b    23.0    0.0  100.0   0.0    100576.0    277.8087
 TN     1422792000000   dn2sdp6pbb5b    96.0    0.0  100.0   0.0    100117.0    278.49207
 TN     1422748800000   dn2fjteh8e80    6.0     0.0  100.0   0.0    100661.0    278.28485
 TN     1423396800000   dn2k0y7ffcup    14.0    0.0  100.0   0.0    100176.0    282.02142

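As you can see, there is a TN that represents the state code. In my function below, I need to be able to recognize that a line is for a particular state and send it to the struct.

Here is the function where I am supposed to do this. I have commented the list of things I need to do in this function. I thought I was doing it right, but when I printed the result, it turned out that something completely different was happening: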

void analyze_file(FILE *file, struct climate_info **states, int num_states)
{
    const int line_sz = 100;
    char line[line_sz];
    int counter = 0;
    char *token;

    while (fgets(line, line_sz, file) != NULL)
    {
        /* TODO: We need to do a few things here:
         *
         *       * Tokenize the line.
         *       * Determine what state the line is for. This will be the state
         *         code, stored as our first token.
         *       * If our states array doesn't have a climate_info entry for
         *         this state, then we need to allocate memory for it and put it
         *         in the next open place in the array. Otherwise, we reuse the
         *         existing entry.
         *       * Update the climate_info structure as necessary.
         */
        struct climate_info *states = malloc(sizeof(struct climate_info)*num_states);
        token = strtok(line," \n");
        strcpy(states->code, token);
        //printf("token: %s\n", token);

        while(token)
        {

            printf("token: %s\n", token);
            token = strtok(NULL, " \t");

        }
    }
    printf("%d\n",counter);

}

Here is the struct I defined:

struct climate_info
{
    char code[3];
    unsigned long num_records;
    long long millitime;
    char location[13];
    double humidity;
    double snow;
    double cloud;
    double lightning;
    long double pressure;
    double temperature;
};

Here is where I print the output, and this is where my program does not seem to see what was done in the analyze_file function:

void print_report(struct climate_info *states[], int num_states)
{
    printf("States found: ");
    int i;
    for (i = 0; i < num_states; ++i)
    {
        if (states[i] != NULL)
        {
            struct climate_info *info = states[i];
            printf("%s", info->code);
        }
    }
    printf("\n");
}

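The output should look like this: States found: TN

I am able to tokenize my string and print every token of every line, but the problem comes when I try to give the struct its values. In the analyze_file line strcpy(states->code, token); I am trying to take the first token, which I know is the state code, and put it in the space I allocated for my struct. As you can see from my print_report function, it does not seem to recognize that I am sending values to climate_info. My question is how to fix my analyze_file function without changing the print_report function.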

Answer 1

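Your difficulty figuring out how to make use of "TN" seems to stem largely from trying to store all of the data read from each line in a separate struct. As mentioned in the comments, that may be fine when reading data into a database, where the database can query all records by state abbreviation, but it makes handling the data yourself somewhat awkward. Why?

When you store every record as a separate struct, there is no relationship between the state the data belongs to and the stored information other than the code member of the struct. That means that to search for or print the information for, e.g., "TN", you must traverse every struct and check whether its code member matches "TN". Consider printing: you have to loop over each state and, for each one, loop over every struct again to pick out the records for that single state.

Instead of storing every record as an element in one array of records, why not have an array of states where each state holds a pointer to the data for that state? That also makes your num_records member more meaningful. You then simply loop over the array of states, check whether (num_records > 0), and print num_records worth of information for that state while skipping every state with no stored data. That is a much more efficient approach.

For example, it takes very little effort to rearrange your structs slightly to provide a relationship between a state and the data associated with that state, e.g.: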

#include <stdio.h>
#include <stdlib.h>

/* if you need constants, either #define them or use an enum */
enum { ABRV = 2, NDATA = 8, LOC = 13, NAME = 15, MAXC = 1024 };
...
typedef struct {            /* struct holding only climate data */
    long long millitime;
    char location[LOC];
    double humidity;
    double snow;
    double cloud;
    double lightning;
    long double pressure;
    double temperature;
} climate_t;

typedef struct {
    size_t  num_allocated,  /* track of how many data are allocated */
            num_records;
    climate_t *data;        /* a pointer to allocated block for data */
} statedata_t;

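But how do you relate reading "TN" from the file to getting the data stored under the correct state? This is where a lookup table comes in. If you have another simple struct containing the state name and abbreviation, you can create a simple array of structs holding the abbreviation information, and when you read, e.g., "TN" from the file, you can simply look up the index at which "TN" sits in that array and then use that index to store the information from that line at the corresponding index in the array of statedata_t.

Since your lookup array will be constant, it can simply be a global declared const. If you are using multiple source files, you can define the array in one file and declare it extern in the remaining files that need it. So how would you define it? First declare a struct holding the information you want in your lookup (the state name and abbreviation), then declare a constant array of them, initializing the name and abbreviation for each, e.g.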

typedef struct {
    char name[NAME+1],
        abrv[ABRV+1];
} stateabrv_t;
...
const stateabrv_t state[]  =  { { "Alabama",        "AL" },
                                { "Alaska",         "AK" },
                                { "Arizona",        "AZ" },
                                { "Arkansas",       "AR" },
                                { "California",     "CA" },
                                { "Colorado",       "CO" },
                                { "Connecticut",    "CT" },
                                { "Delaware",       "DE" },
                                { "Florida",        "FL" },
                                { "Georgia",        "GA" },
                                { "Hawaii",         "HI" },
                                { "Idaho",          "ID" },
                                { "Illinois",       "IL" },
                                { "Indiana",        "IN" },
                                { "Iowa",           "IA" },
                                { "Kansas",         "KS" },
                                { "Kentucky",       "KY" },
                                { "Louisiana",      "LA" },
                                { "Maine",          "ME" },
                                { "Maryland",       "MD" },
                                { "Massachusetts",  "MA" },
                                { "Michigan",       "MI" },
                                { "Minnesota",      "MN" },
                                { "Mississippi",    "MS" },
                                { "Missouri",       "MO" },
                                { "Montana",        "MT" },
                                { "Nebraska",       "NE" },
                                { "Nevada",         "NV" },
                                { "New Hampshire",  "NH" },
                                { "New Jersey",     "NJ" },
                                { "New Mexico",     "NM" },
                                { "New York",       "NY" },
                                { "North Carolina", "NC" },
                                { "North Dakota",   "ND" },
                                { "Ohio",           "OH" },
                                { "Oklahoma",       "OK" },
                                { "Oregon",         "OR" },
                                { "Pennsylvania",   "PA" },
                                { "Rhode Island",   "RI" },
                                { "South Carolina", "SC" },
                                { "South Dakota",   "SD" },
                                { "Tennessee",      "TN" },
                                { "Texas",          "TX" },
                                { "Utah",           "UT" },
                                { "Vermont",        "VT" },
                                { "Virginia",       "VA" },
                                { "Washington",     "WA" },
                                { "West Virginia",  "WV" },
                                { "Wisconsin",      "WI" },
                                { "Wyoming",        "WY" } };

const int nstates = sizeof state / sizeof *state;
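
The multi-file arrangement mentioned above could be sketched roughly as follows. It is shown as a single listing for brevity; the comments mark what would go in a shared header and what would go in one source file. The file names states.h and states.c are assumptions for illustration, and the table is truncated to three entries.

```c
/* ---- shared header (a hypothetical states.h) ---- */
enum { ABRV = 2, NAME = 15 };

typedef struct {
    char name[NAME+1],
        abrv[ABRV+1];
} stateabrv_t;

extern const stateabrv_t state[];   /* declarations only: no storage here */
extern const int nstates;

/* ---- exactly one source file (a hypothetical states.c) ---- */
const stateabrv_t state[] = { { "Tennessee", "TN" },    /* truncated table */
                              { "Texas",     "TX" },
                              { "Utah",      "UT" } };

const int nstates = sizeof state / sizeof *state;
```

Because the extern declaration has an incomplete array type, the other files cannot apply sizeof to state, which is why nstates is exported alongside it.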

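Now you have a simple two-way lookup. Given either the state name or the abbreviation, you can return the index where it lives in the array. Further, given the name you can look up the abbreviation, or given the abbreviation you can look up the name.

A simple lookup function returning the index could be: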

/* simple lookup function, given a code s, return index for state
 * in array of statedata_t on success, -1 otherwise.
 */
int lookupabrv (const char *s)
{
    int i = 0;

    for (; i < nstates; i++)
        if (state[i].abrv[0] == s[0] && state[i].abrv[1] == s[1])
            return i;

    return -1;
}
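
The lookup above goes from abbreviation to index. The other direction mentioned earlier, from a full state name to its index (and from there to the abbreviation), might look like the sketch below. The function name lookupname is an assumption and not part of the original code, and the table is truncated to three entries to keep the example short.

```c
#include <assert.h>
#include <string.h>

enum { ABRV = 2, NAME = 15 };

typedef struct {
    char name[NAME+1],
        abrv[ABRV+1];
} stateabrv_t;

/* truncated table, for illustration only */
const stateabrv_t state[] = { { "Tennessee", "TN" },
                              { "Texas",     "TX" },
                              { "Utah",      "UT" } };

const int nstates = sizeof state / sizeof *state;

/* hypothetical helper: given a full state name, return its index
 * in the lookup table on success, -1 otherwise.
 */
int lookupname (const char *name)
{
    int i = 0;

    assert (name != NULL);  /* caller must pass a valid string */

    for (; i < nstates; i++)
        if (strcmp (state[i].name, name) == 0)
            return i;

    return -1;
}
```

With a valid index in hand, state[index].abrv gives the abbreviation and state[index].name gives the name, so either lookup function covers both directions.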

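Now that you can find the index for a given abbreviation using the global lookup table, you can put the rest of the data handling together in main() by declaring an array of 50 statedata_t, e.g.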

int main (int argc, char **argv) {

    char buf[MAXC]; /* line buffer */
    /* array of 50 statedata_t (one for each state) */
    statedata_t stdata[sizeof state / sizeof *state] = {{.num_records = 0}};

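Now you are ready to start reading from your file and calling insert_data for the proper state based on the abbreviation read from the file. A simple way to approach the read is to read "TN" into a separate array and then read the climate data into a temporary struct of type climate_t that you can pass to your insert_data function. In insert_data, you simply look up the index, allocate or reallocate data as needed, and then assign your temporary struct to the block of memory at st[index].data. For example, your insert_data function could look something like the following: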

/* insert data for state given code and climate_t containing data */
int insert_data (statedata_t *st, const char *code, climate_t *data)
{
    int index = lookupabrv (code);  /* lookup array index */

    if (index == -1)    /* handle error */
        return 0;

    if (!st[index].num_allocated) { /* allocate data if not allocated */
        st[index].data = malloc (NDATA * sizeof *st[index].data);
        if (!st[index].data) {
            perror ("malloc-st[index].data");
            return 0;
        }
        st[index].num_allocated = NDATA;
    }

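    /* check if realloc needed */
    if (st[index].num_records == st[index].num_allocated) {
        /* double the block through a temporary pointer so the original
         * block is not lost if realloc fails */
        void *tmp = realloc (st[index].data,
                    2 * st[index].num_allocated * sizeof *st[index].data);
        if (!tmp) {
            perror ("realloc-st[index].data");
            return 0;
        }
        st[index].data = tmp;
        st[index].num_allocated *= 2;
    }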

    /* add data for proper state index */
    st[index].data[st[index].num_records++] = *data;

    return 1;   /* return success */
}

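That is basically it. How you parse the information from each line is up to you, but for the purposes of my example, given your sample data, I simply used sscanf for simplicity. Putting it all together, you could do something like the following: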

#include <stdio.h>
#include <stdlib.h>

/* if you need constants, either #define them or use an enum */
enum { ABRV = 2, NDATA = 8, LOC = 13, NAME = 15, MAXC = 1024 };

typedef struct {
    char name[NAME+1],
        abrv[ABRV+1];
} stateabrv_t;

typedef struct {            /* struct holding only climate data */
    long long millitime;
    char location[LOC];
    double humidity;
    double snow;
    double cloud;
    double lightning;
    long double pressure;
    double temperature;
} climate_t;

typedef struct {
    size_t  num_allocated,  /* track of how many data are allocated */
            num_records;
    climate_t *data;        /* a pointer to allocated block for data */
} statedata_t;

const stateabrv_t state[]  =  { { "Alabama",        "AL" },
                                { "Alaska",         "AK" },
                                { "Arizona",        "AZ" },
                                { "Arkansas",       "AR" },
                                { "California",     "CA" },
                                { "Colorado",       "CO" },
                                { "Connecticut",    "CT" },
                                { "Delaware",       "DE" },
                                { "Florida",        "FL" },
                                { "Georgia",        "GA" },
                                { "Hawaii",         "HI" },
                                { "Idaho",          "ID" },
                                { "Illinois",       "IL" },
                                { "Indiana",        "IN" },
                                { "Iowa",           "IA" },
                                { "Kansas",         "KS" },
                                { "Kentucky",       "KY" },
                                { "Louisiana",      "LA" },
                                { "Maine",          "ME" },
                                { "Maryland",       "MD" },
                                { "Massachusetts",  "MA" },
                                { "Michigan",       "MI" },
                                { "Minnesota",      "MN" },
                                { "Mississippi",    "MS" },
                                { "Missouri",       "MO" },
                                { "Montana",        "MT" },
                                { "Nebraska",       "NE" },
                                { "Nevada",         "NV" },
                                { "New Hampshire",  "NH" },
                                { "New Jersey",     "NJ" },
                                { "New Mexico",     "NM" },
                                { "New York",       "NY" },
                                { "North Carolina", "NC" },
                                { "North Dakota",   "ND" },
                                { "Ohio",           "OH" },
                                { "Oklahoma",       "OK" },
                                { "Oregon",         "OR" },
                                { "Pennsylvania",   "PA" },
                                { "Rhode Island",   "RI" },
                                { "South Carolina", "SC" },
                                { "South Dakota",   "SD" },
                                { "Tennessee",      "TN" },
                                { "Texas",          "TX" },
                                { "Utah",           "UT" },
                                { "Vermont",        "VT" },
                                { "Virginia",       "VA" },
                                { "Washington",     "WA" },
                                { "West Virginia",  "WV" },
                                { "Wisconsin",      "WI" },
                                { "Wyoming",        "WY" } };

const int nstates = sizeof state / sizeof *state;

/* simple lookup function, given a code s, return index for state
 * in array of statedata_t on success, -1 otherwise.
 */
int lookupabrv (const char *s)
{
    int i = 0;

    for (; i < nstates; i++)
        if (state[i].abrv[0] == s[0] && state[i].abrv[1] == s[1])
            return i;

    return -1;
}

/* insert data for state given code and climate_t containing data */
int insert_data (statedata_t *st, const char *code, climate_t *data)
{
    int index = lookupabrv (code);  /* lookup array index */

    if (index == -1)    /* handle error */
        return 0;

    if (!st[index].num_allocated) { /* allocate data if not allocated */
        st[index].data = malloc (NDATA * sizeof *st[index].data);
        if (!st[index].data) {
            perror ("malloc-st[index].data");
            return 0;
        }
        st[index].num_allocated = NDATA;
    }

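    /* check if realloc needed */
    if (st[index].num_records == st[index].num_allocated) {
        /* double the block through a temporary pointer so the original
         * block is not lost if realloc fails */
        void *tmp = realloc (st[index].data,
                    2 * st[index].num_allocated * sizeof *st[index].data);
        if (!tmp) {
            perror ("realloc-st[index].data");
            return 0;
        }
        st[index].data = tmp;
        st[index].num_allocated *= 2;
    }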

    /* add data for proper state index */
    st[index].data[st[index].num_records++] = *data;

    return 1;   /* return success */
}

/* print states with data collected */
void print_data (statedata_t *st)
{
    int i = 0;

    for (; i < nstates; i++) {
        if (st[i].num_records) {
            size_t j = 0;
            printf ("\n%s\n", state[i].name);
            for (; j < st[i].num_records; j++)
                printf ("  %13lld  %-12s %5.1f %5.1f %5.1f %5.1f %8.1Lf "
                        "%8.4f\n",
                        st[i].data[j].millitime, st[i].data[j].location,
                        st[i].data[j].humidity, st[i].data[j].snow,
                        st[i].data[j].cloud, st[i].data[j].lightning,
                        st[i].data[j].pressure, st[i].data[j].temperature);
        }
    }
}

/* free allocated memory */
void free_data (statedata_t *st)
{
    int i = 0;

    for (; i < nstates; i++)
        if (st[i].num_records)
            free (st[i].data);
}

int main (int argc, char **argv) {

    char buf[MAXC]; /* line buffer */
    /* array of 50 statedata_t (one for each state) */
    statedata_t stdata[sizeof state / sizeof *state] = {{.num_records = 0}};
    /* read from file given as argument (or stdin if none given) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    while (fgets (buf, MAXC, fp)) {     /* read each line of data */
        char code[ABRV+1] = "";         /* declare storage for abbreviation */
        climate_t tmp = { .millitime = 0 }; /* declare temp struct for data */

        /* simple parse of data with sscanf */
        if (sscanf (buf, "%2s %lld %12s %lf %lf %lf %lf %Lf %lf", code,
            &tmp.millitime, tmp.location, &tmp.humidity, &tmp.snow,
            &tmp.cloud, &tmp.lightning, &tmp.pressure, &tmp.temperature)
            == 9) {
            if (!insert_data (stdata, code, &tmp))  /* insert data/validate */
                fprintf (stderr, "error: insert_data failed (%s).\n", code);
        }
        else    /* handle error */
            fprintf (stderr, "error: invalid format:\n%s\n", buf);
    }
    if (fp != stdin) fclose (fp);   /* close file if not stdin */

    print_data (stdata);    /* print data */
    free_data (stdata);     /* free allocated memory */

    return 0;
}
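
Since the parsing method is left open above, here is one possible alternative to sscanf that tokenizes the line with strtok, as the question attempted. This is only a minimal sketch: parse_line is a hypothetical helper, the climate_t layout is copied from the program above, and the numeric conversions use strtoll, strtod and strtold without any range or format checks.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { ABRV = 2, LOC = 13 };

typedef struct {            /* same layout as climate_t above */
    long long millitime;
    char location[LOC];
    double humidity;
    double snow;
    double cloud;
    double lightning;
    long double pressure;
    double temperature;
} climate_t;

/* hypothetical helper: split line (modified in place by strtok) into its
 * 9 whitespace-separated fields, return 1 on success, 0 otherwise.
 */
int parse_line (char *line, char *code, climate_t *c)
{
    const char *delim = " \t\r\n";
    char *field[9];
    char *tok;
    int n = 0;

    assert (line && code && c);

    /* the first call takes the buffer, later calls take NULL */
    for (tok = strtok (line, delim); tok; tok = strtok (NULL, delim)) {
        if (n == 9)         /* more fields than expected */
            return 0;
        field[n++] = tok;
    }

    /* require 9 fields that fit the fixed-size members */
    if (n != 9 || strlen (field[0]) != ABRV || strlen (field[2]) >= LOC)
        return 0;

    strcpy (code, field[0]);            /* code must hold ABRV+1 chars */
    c->millitime   = strtoll (field[1], NULL, 10);
    strcpy (c->location, field[2]);
    c->humidity    = strtod  (field[3], NULL);
    c->snow        = strtod  (field[4], NULL);
    c->cloud       = strtod  (field[5], NULL);
    c->lightning   = strtod  (field[6], NULL);
    c->pressure    = strtold (field[7], NULL);
    c->temperature = strtod  (field[8], NULL);

    return 1;
}
```

In main() the sscanf call could then be swapped for parse_line (buf, code, &tmp), keeping the rest of the loop as it is.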

Example Input File

$ cat dat/state_climate.txt
 TN     1424325600000   dn20t1kz0xrz    67.0    0.0  0.0     0.0    101872.0    262.5665
 TN     1422770400000   dn2dcstxsf5b    23.0    0.0  100.0   0.0    100576.0    277.8087
 TN     1422792000000   dn2sdp6pbb5b    96.0    0.0  100.0   0.0    100117.0    278.49207
 TN     1422748800000   dn2fjteh8e80    6.0     0.0  100.0   0.0    100661.0    278.28485
 TN     1423396800000   dn2k0y7ffcup    14.0    0.0  100.0   0.0    100176.0    282.02142

Example Use/Output

$ ./bin/state_climate <dat/state_climate.txt

Tennessee
  1424325600000  dn20t1kz0xrz  67.0   0.0   0.0   0.0 101872.0 262.5665
  1422770400000  dn2dcstxsf5b  23.0   0.0 100.0   0.0 100576.0 277.8087
  1422792000000  dn2sdp6pbb5b  96.0   0.0 100.0   0.0 100117.0 278.4921
  1422748800000  dn2fjteh8e80   6.0   0.0 100.0   0.0 100661.0 278.2849
  1423396800000  dn2k0y7ffcup  14.0   0.0 100.0   0.0 100176.0 282.0214

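Memory Use/Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.

It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.

For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.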

$ valgrind ./bin/state_climate <dat/state_climate.txt
==6157== Memcheck, a memory error detector
==6157== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==6157== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==6157== Command: ./bin/state_climate
==6157==

Tennessee
  1424325600000  dn20t1kz0xrz  67.0   0.0   0.0   0.0 101872.0 262.5665
  1422770400000  dn2dcstxsf5b  23.0   0.0 100.0   0.0 100576.0 277.8087
  1422792000000  dn2sdp6pbb5b  96.0   0.0 100.0   0.0 100117.0 278.4921
  1422748800000  dn2fjteh8e80   6.0   0.0 100.0   0.0 100661.0 278.2849
  1423396800000  dn2k0y7ffcup  14.0   0.0 100.0   0.0 100176.0 282.0214
==6157==
==6157== HEAP SUMMARY:
==6157==     in use at exit: 0 bytes in 0 blocks
==6157==   total heap usage: 1 allocs, 1 frees, 768 bytes allocated
==6157==
==6157== All heap blocks were freed -- no leaks are possible
==6157==
==6157== For counts of detected and suppressed errors, rerun with: -v
==6157== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

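Always confirm that you have freed all memory you have allocated and that there are no memory errors.

Look things over and consider why the changes to the structs make sense. Let me know if you have any questions.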
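
If print_report has to stay exactly as it is, as the question asks, the same "find the entry for this code" idea can be applied to the original array of pointers. In the question's version, the local states declared inside the loop shadows the states parameter, so the caller's array is never written, and a new block is allocated for every line. The following is a minimal sketch, assuming the caller passes an array whose elements are all initialized to NULL; the struct is trimmed to two members and the 256-byte line buffer is an arbitrary choice.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct climate_info {
    char code[3];               /* 2-character state code + '\0' */
    unsigned long num_records;
    /* remaining members from the question omitted for brevity */
};

/* sketch: fill the caller's array of pointers, one entry per state code */
void analyze_file (FILE *file, struct climate_info **states, int num_states)
{
    char line[256];

    assert (file && states);

    while (fgets (line, sizeof line, file) != NULL) {
        char *token = strtok (line, " \t\r\n");  /* first token is the code */
        int i = 0;

        if (!token || strlen (token) != 2)      /* skip blank or odd lines */
            continue;

        /* find the existing entry for this code, or the first open slot */
        while (i < num_states && states[i] &&
               strcmp (states[i]->code, token) != 0)
            i++;

        if (i == num_states)                    /* no room left in array */
            continue;

        if (!states[i]) {                       /* new state: allocate once */
            states[i] = calloc (1, sizeof *states[i]);
            if (!states[i]) {
                perror ("calloc-states[i]");
                return;
            }
            strcpy (states[i]->code, token);
        }
        states[i]->num_records++;
        /* further strtok (NULL, ...) calls would update the other members */
    }
}
```

Each entry is allocated once per state rather than once per line, and the caller remains responsible for freeing every non-NULL element when done.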
