Casting a void pointer (that is part of a struct) into another pointer data type

Question

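I'm trying to work out on my own how to parse S-expressions in C, so that I can store data and code for my own rudimentary Lisp (written as a learning exercise, not for production).

Before explaining my code and my reasoning, I should say that all I know about S-expressions comes from the introductory section of the Wikipedia article on them and from occasionally skimming Common Lisp code, so the names of my structs and variables may be a bit off.

My implementation language is C, and before defining any functions I created the following structs: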

typedef enum {
    string,
    letter,
    integer,
} atom_type;

typedef struct {
    void* blob;
    atom_type type;
} atom;

typedef struct expr {
    atom* current;
    struct expr* next;
} expr;

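Each atom is stored in a struct atom, which contains an instance of an enum (? I'm not sure of the correct jargon for this) and a void pointer to the data to be stored. Each S-expression "node" contains a pointer to an atom and a pointer to the next S-expression node.

I wrote a basic function that takes a string and parses it into an atom, as follows: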

atom* parse_term(char* str) {
    size_t len = strlen(str);
    atom* current = malloc(sizeof(atom));
    
    if(str[0] == '\'') {
        current->blob = (char*) &str[1];
        current->type = letter;
    } else if(str[0] == '\"') {
        char temp[256];
        int pos = 1;

        while(str[pos] != '\"') {
            temp[pos] = str[pos];
            pos++;
        }
        current->blob = malloc(256 * sizeof(char));
        current->blob = (char*) &temp;
        current->type = string;
    } else if(isdigit(str[0])){
        char temp[256];
        int pos = 0;

        while(str[pos] != ' ') {
            temp[pos] = str[pos];
            pos++;
        }
        int tmp = atoi(temp);
        current->blob = (int*) &tmp;
        current->type = integer;
    }
    return current;
}

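The function seems to work fine; at least, when I print out the data type it shows the correct one. Beyond that, though, I can't figure out how to print the actual 'blob': I've tried the %p format code, as well as a switch statement: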

void print_atom(atom* current) {
    switch(current->type) {
        case string:
            printf("atom%s\ttype:%d", current->blob, current->type);
        case letter:
            printf("atom%c\ttype:%d", current->blob, current->type);
        case integer:
            printf("atom%c\ttype:%d", current->blob, current->type);
    }
}

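But this doesn't work. In the string case it prints garbled text, and in every other case it prints nothing where the atom's information should be.

I imagine this is a result of my use of void* in the struct; how can I remedy this? I think I did cast correctly (although I could very well be wrong, please tell me if so). The only other option I can think of is to store a hard-coded variable for every supported data type in the 'atom' struct, but that seems wasteful of resources.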

Answer 1

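Don't use void*. Use a union. That's what unions are for.

In this example I use an "anonymous union", which means I can refer to its fields as though they were directly inside the Atom struct. (I changed the spelling of the names to suit my prejudices, so types are capitalized and constants are all caps. I also separated the typedef and the struct declaration for Atom, in case Atom turns out to be self-referential.)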

typedef enum {
    STRING,
    LETTER,
    INTEGER
} AtomType;

typedef struct Atom Atom;
struct Atom {
    union {
      char* str;
      char  let;
      int   num;
    };
    AtomType type;
};

void print_atom(Atom* current) {
    switch(current->type) {
        case STRING:
            printf("atom %s\ttype:%d", current->str, current->type);
            break;  // without break, control falls through into the next case
        case LETTER:
            printf("atom %c\ttype:%d", current->let, current->type);
            break;
        case INTEGER:
            printf("atom %d\ttype:%d", current->num, current->type);
            break;
    }
}
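
For completeness, here is a rough sketch of how the parsing side might look with the union. It is not the asker's code and only one way to do it: parse_atom and free_atom are names I made up, and I'm assuming tokens look like 'a, "text" or 42 as in the question. The main point is that the value lives inside the Atom (or in heap memory the Atom owns), whereas the question's parse_term stores the addresses of the locals temp and tmp, which stop existing as soon as the function returns.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef enum { STRING, LETTER, INTEGER } AtomType;

typedef struct Atom Atom;
struct Atom {
    union {            /* anonymous union: needs C11 or a compiler extension */
        char* str;     /* heap-allocated copy, owned by the Atom */
        char  let;
        int   num;
    };
    AtomType type;
};

/* Hypothetical helper: parse one token such as 'a, "text" or 42.
   Returns NULL on malformed input or allocation failure. */
Atom* parse_atom(const char* src) {
    assert(src != NULL);
    Atom* a = malloc(sizeof *a);
    if (a == NULL) return NULL;

    if (src[0] == '\'' && src[1] != '\0') {
        a->let = src[1];                 /* stored by value, no pointer involved */
        a->type = LETTER;
    } else if (src[0] == '"') {
        const char* end = strchr(src + 1, '"');
        if (end == NULL) { free(a); return NULL; }   /* unterminated string */
        size_t len = (size_t)(end - (src + 1));
        a->str = malloc(len + 1);        /* sized to the text, no fixed 256 buffer */
        if (a->str == NULL) { free(a); return NULL; }
        memcpy(a->str, src + 1, len);
        a->str[len] = '\0';
        a->type = STRING;
    } else if (isdigit((unsigned char)src[0])) {
        a->num = (int)strtol(src, NULL, 10);   /* stops at the first non-digit */
        a->type = INTEGER;
    } else {
        free(a);                         /* not a token this sketch understands */
        return NULL;
    }
    return a;
}

/* Release an atom and, for strings, the buffer it owns. */
void free_atom(Atom* a) {
    if (a == NULL) return;
    if (a->type == STRING) free(a->str);
    free(a);
}
```

Whoever calls parse_atom is responsible for calling free_atom on the result; the sketch ignores integer overflow and escape sequences inside strings.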

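As someone said in a comment, this is not actually what Lisp objects look like. The usual implementation combines cons cells and atoms, something like this (with CellType instead of AtomType). You will also need to add CELL to the enum.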

typedef struct Cell Cell;
struct Cell {
    union {
        char* str;
        char  let;
        int   num;
        struct {
            Cell* hd; // Historic name: car
            Cell* tl; // Historic name: cdr
        };
    };
    CellType type;
};

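Here there is an anonymous struct inside an anonymous union. Some people say this is confusing. Others (me, anyway) say it is less syntactic noise. Use your own judgement.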

The use of Cell* inside the definition of Cell is the motivation for typedef struct Cell Cell.
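
To make the Cell layout a bit more concrete, here is a minimal sketch of building and walking a list with it. The constructors make_int and cons and the function list_length are hypothetical names of mine, and I'm assuming a NULL tl marks the end of a list, which is just one possible convention.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { STRING, LETTER, INTEGER, CELL } CellType;

typedef struct Cell Cell;
struct Cell {
    union {
        char* str;
        char  let;
        int   num;
        struct {
            Cell* hd; /* Historic name: car */
            Cell* tl; /* Historic name: cdr */
        };
    };
    CellType type;
};

/* Hypothetical constructor: an integer atom. Returns NULL if malloc fails. */
Cell* make_int(int n) {
    Cell* c = malloc(sizeof *c);
    if (c != NULL) { c->num = n; c->type = INTEGER; }
    return c;
}

/* Hypothetical constructor: a cons cell pairing a head with the rest of a list. */
Cell* cons(Cell* hd, Cell* tl) {
    Cell* c = malloc(sizeof *c);
    if (c != NULL) { c->hd = hd; c->tl = tl; c->type = CELL; }
    return c;
}

/* Count the elements of a proper list; NULL is the empty list (assumption). */
size_t list_length(const Cell* list) {
    size_t n = 0;
    while (list != NULL) {
        assert(list->type == CELL);  /* only cons cells have a tl to follow */
        n++;
        list = list->tl;
    }
    return n;
}
```

With these, the list (1 2 3) could be built as cons(make_int(1), cons(make_int(2), cons(make_int(3), NULL))). Freeing the cells is left out to keep the sketch short.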

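You can play not-entirely-portable-but-usually-OK games to reduce the memory consumption of Cell, and most real implementations do. I didn't, because this is a learning experience.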
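
One well-known game of this kind is pointer tagging: small integers are packed into the pointer-sized word itself, so they need no Cell at all. Whether that is the trick meant here is my assumption, and the sketch below is only an illustration. Value and all the function names are made up, and it relies on two things the C standard does not guarantee: that object pointers have a zero low bit (true for malloc results on common platforms) and that right-shifting a negative signed integer is an arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical representation: one machine word that is either a tagged
   small integer (low bit 1) or an ordinary pointer (low bit 0). */
typedef uintptr_t Value;

static inline int is_fixnum(Value v) { return (v & 1u) != 0; }

/* Pack an integer into the upper bits; the top bit of n is lost. */
static inline Value make_fixnum(intptr_t n) {
    return ((uintptr_t)n << 1) | 1u;
}

static inline intptr_t fixnum_value(Value v) {
    assert(is_fixnum(v));
    /* Not strictly portable: assumes two's complement conversion and an
       arithmetic right shift, which restore the sign of negative values. */
    return (intptr_t)v >> 1;
}

/* Wrap a pointer; assumes at least 2-byte alignment so the low bit is free. */
static inline Value make_pointer(void* p) {
    Value v = (uintptr_t)p;
    assert((v & 1u) == 0);
    return v;
}

static inline void* pointer_value(Value v) {
    assert(!is_fixnum(v));
    return (void*)v;
}
```

The cost is one bit of integer range and a tag check on every access, which is the sort of trade-off that is easier to postpone while learning.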

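Also note that real Lisps (and many toy ones) effectively avoid most of the parsing task; the language includes reader macros (macro characters), which effectively do whatever parsing is needed (which is not much); for the most part, they can be implemented in Lisp itself (although you need some way to bootstrap).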

c

lisp

parsing