在 C 中查找 char* 的子字符串时,使用单链表的程序会出现段错误
Program that uses singly-linked list is segfaulting when finding substring of a char* in C
我正在编写一个简单的 Lisp 解释器,并且正在编写一个 S-exp 解析器。我决定逐步解析 S 表达式,即。通过迭代
字符串 S 表达式并使用字符线索(例如字符是否为 '(', ')'、'"'、''' 等)来确定当前标记的类型及其相关值。我正在解析使用单个 linked 列表将 S 表达式存入内存,也就是说,在迭代 S 表达式时,我的程序将获取令牌,将其解析为结构,然后 link 使用下一个标记。
这是我目前的代码:
#include <stdio.h>
#include <stddef.h>
#include <string.h>
#include <stdlib.h>
#define MAX_SNODE_LENGTH 256
enum styp {
/*
Enum that declares all possible types
that a token could be in my lisp S-expression.
In order, opening paranthesis, closing paranthesis,
string, integer, decimal (float), charachter, nil
*/
oparn,
cparn,
strn,
intg,
dec,
chac,
nil,
};
struct snode_t {
/*
When a S-expression is parsed every token is
parsed into this. All data is stored in a
string, but when operations need to be completed on it
they are converted based on their type (defined by enum
styp typ).
Each node points to the next one. When a S-exp is parsed,
an snode_t is returned that is the first item. THe programmer
can then iterate over the list and use the data stored in "val"
to interpret code, etc.
*/
char val[MAX_SNODE_LENGTH];
enum styp typ;
struct snode_t* nxt;
};
struct snode_t* node() {
/*
Function returns the pointer to a struct snode_t
which has had memory allocated for it. This function
is just for convenience.
*/
struct snode_t* tmp = malloc(sizeof(struct snode_t*));
tmp->nxt = malloc(sizeof(struct snode_t*));
return tmp;
}
char* sstring(char* str, int s, int e) {
/*
Takes a string, a starting index and an
end index, and allocates the memory for a substring,
creates a substring and returns it. Saves programmer
time fiddling with memory when getting a substring.
*/
size_t n = e - s + 2;
// n is the number of bytes needed to be allocated
char* sub = malloc(sizeof(char) * n);
memcpy(sub, &str[s], n-1);
sub[n] = '[=10=]';
return sub;
}
struct snode_t* psexp(char* str) {
struct snode_t* head = node();
// head is the first item in the linked list.
// head is the struct that is returned once
// the function completes.
struct snode_t* aft = node();
// aft contains the current struct.
// once the type of a token is determined,
// aft is set to the "nxt" value of the current
// struct. aft is then assigned the according values.
int p = 1;
// indicates the position in the string. Starts
// at 1 because the first index of a properly formatted
// S-exp will always be a closing bracket.
// Note that this doesn't consider trailing whitespace,
// I'm only writing functions that do the bare minimum
// at this point so that I can make it more robust
// and safe later.
if(str[0] == '(') {
head->val[0] = '(';
head->val[1] = '[=10=]';
head->typ = oparn;
aft = head->nxt;
}
while(str[p] != '[=10=]') {
if(str[p] == '(') {
head->val[0] = '(';
head->val[1] = '[=10=]';
aft->typ = oparn;
} else if(str[p] == ')') {
head->val[0] = ')';
head->val[1] = '[=10=]';
aft->typ = cparn;
} else if(str[p] == '"') {
// if the current charachter is a
// ", all the following charachters
// until the next " comprise a string
// this next set of lines finds the
// position of the next " after the current one,
// and uses that number to produce a substring
// This substring is then stored in the current
// struct.
int s = p;
int e;
p++;
while(str[p] != '"') {
p++;
}
e = p++;
char* tocp = sstring(str, s, e);
strcpy(aft->val, tocp);
aft->typ = strn;
}
aft->nxt = node();
aft = aft->nxt;
// the current struct is shifted to the next one.
}
return head;
}
int main() {
psexp("(\"hello world!\")");
return 0;
}
当我编译它时,我没有收到任何警告或错误,但是,当 运行 它时,我得到了一个段错误。我已经测试了 node 函数和 sstring 函数,所以我有理由相信 psexp 函数存在内存故障。
我不确定我分配内存的方式有什么问题,在 link 将一个结构转换为另一个结构之前,我确保分配了内存。是什么导致了我的段错误?
注意:目前 psexp 只关心括号和字符串,但是一旦我让字符串工作(我认为是导致段错误的区域),我打算添加整个 Enum 的数据类型.
这是一个错误:
struct snode_t* tmp = malloc(sizeof(struct snode_t*));
^
ups... pointer
所以当你真的想要一个“struct snode_t”时,你为“指向struct snode_t的指针”分配了内存。
更正为
struct snode_t* tmp = malloc(sizeof(struct snode_t));
或更好
struct snode_t* tmp = malloc(sizeof *tmp);
此外,这很奇怪:
tmp->nxt = malloc(sizeof(struct snode_t*));
创建新节点时你通常做的是:
tmp->nxt = NULL;
在链表中,你永远不会 malloc
指向下一个指针的任何东西。
在 node
中,您正在对节点 指针 的大小执行 malloc
而不是 [=15= 的完整大小].
修复后,程序无限循环
预分配 nxt
不好。 node
应该只分配一个新节点 而不是 两个。
在 sstring
中执行 malloc
会泄漏内存。最好将目标缓冲区作为参数传递。
对于 '('
和 ')'
的情况,p
从未增加,所以如果我们命中一个,我们将永远循环
链表代码需要修改。
这是重构后的版本:
#include <stdio.h>
#include <stddef.h>
#include <string.h>
#include <stdlib.h>
#define MAX_SNODE_LENGTH 256
enum styp {
/*
Enum that declares all possible types that a token could be in my
lisp S-expression. In order, opening paranthesis, closing paranthesis,
string, integer, decimal (float), charachter, nil */
oparn,
cparn,
strn,
intg,
dec,
chac,
nil,
};
struct snode_t {
/*
When a S-expression is parsed every token is parsed into this. All
data is stored in a string, but when operations need to be completed
on it they are converted based on their type (defined by enum styp
typ). Each node points to the next one. When a S-exp is parsed, an
snode_t is returned that is the first item. THe programmer can then
iterate over the list and use the data stored in "val" to interpret
code, etc. */
char val[MAX_SNODE_LENGTH];
enum styp typ;
struct snode_t *nxt;
};
struct snode_t *
node()
{
/*
Function returns the pointer to a struct snode_t which has had memory
allocated for it. This function is just for convenience. */
#if 0
struct snode_t *tmp = malloc(sizeof(struct snode_t *));
tmp->nxt = malloc(sizeof(struct snode_t *));
#endif
#if 0
struct snode_t *tmp = malloc(sizeof(struct snode_t));
tmp->nxt = malloc(sizeof(struct snode_t));
#endif
#if 1
struct snode_t *tmp = calloc(1,sizeof(*tmp));
#endif
return tmp;
}
char *
sstring(char *str, int s, int e, char *sub)
{
/*
Takes a string, a starting index and an end index, and allocates the
memory for a substring, creates a substring and returns it. Saves
programmer time fiddling with memory when getting a substring. */
size_t n = e - s + 2;
// n is the number of bytes needed to be allocated
#if 0
char *sub = malloc(sizeof(char) * n);
#endif
memcpy(sub, &str[s], n - 1);
sub[n] = 0;
return sub;
}
// append -- allocate a new node and append to tail of list
struct snode_t *
append(struct snode_t **head)
{
struct snode_t *cur;
struct snode_t *prev;
// find the tail of the list
prev = NULL;
for (cur = *head; cur != NULL; cur = cur->nxt)
prev = cur;
cur = node();
if (prev != NULL)
prev->nxt = cur;
else
*head = cur;
return cur;
}
struct snode_t *
psexp(char *str)
{
struct snode_t *head = NULL;
// head is the first item in the linked list.
// head is the struct that is returned once
// the function completes.
struct snode_t *aft = NULL;
// aft contains the current struct.
// once the type of a token is determined,
// aft is set to the "nxt" value of the current
// struct. aft is then assigned the according values.
int p = 1;
int s;
int e;
// indicates the position in the string. Starts
// at 1 because the first index of a properly formatted
// S-exp will always be a closing bracket.
// Note that this doesn't consider trailing whitespace,
// I'm only writing functions that do the bare minimum
// at this point so that I can make it more robust
// and safe later.
while (str[p] != 0) {
aft = append(&head);
switch (str[p]) {
case '(':
aft->val[0] = '(';
aft->val[1] = 0;
aft->typ = oparn;
p++;
break;
case ')':
aft->val[0] = ')';
aft->val[1] = 0;
aft->typ = cparn;
p++;
break;
case '"':
// if the current charachter is a
// ", all the following charachters
// until the next " comprise a string
// this next set of lines finds the
// position of the next " after the current one,
// and uses that number to produce a substring
// This substring is then stored in the current
// struct.
s = p;
p++;
while (str[p] != '"') {
p++;
}
e = p++;
sstring(str, s, e, aft->val);
aft->typ = strn;
}
#if 0
aft->nxt = node();
aft = aft->nxt;
#endif
// the current struct is shifted to the next one.
}
return head;
}
int
main()
{
psexp("(\"hello world!\")");
return 0;
}
更新:
Just as a side note - what does the #if 0 within some of the functions actually do? I've only ever really seen preprocessor directives used to make code more robust or customisable, ie. to check whether a header is defined and if not define it, or for user customisation before compilation. – Bithov Vin
我使用 #if 0
作为一种通过将预处理器 (cpp
) 获取到 include/elide 代码来“注释掉”代码的方法。它比将代码包装在 /*
和 */
对中更干净。我也用#if 1
来标记我写的“experimental/new”代码。
这里我用它[作为教学工具]来展示old/original代码与new/improved代码:
#if 0
// old code
#else
// new code
#endif
或:
#if 1
// new code
#endif
对于我自己的代码,当我确定代码正确时,我将删除 #if 0
并保留 #else
代码 [删除 #if/else/endif
行]留下 clean/simple/correct 代码。
所以,当你理解了差异后,你也可以去掉指令,只留下正确的代码。
请注意,我会使用 psexp
中的 #if 0
技巧来展示如何修复它,但是更改是如此广泛以至于使用预处理器指令会使事情变得如此 more难懂。
我正在编写一个简单的 Lisp 解释器,并且正在编写一个 S-exp 解析器。我决定逐步解析 S 表达式,即。通过迭代 字符串 S 表达式并使用字符线索(例如字符是否为 '(', ')'、'"'、''' 等)来确定当前标记的类型及其相关值。我正在解析使用单个 linked 列表将 S 表达式存入内存,也就是说,在迭代 S 表达式时,我的程序将获取令牌,将其解析为结构,然后 link 使用下一个标记。
这是我目前的代码:
#include <stdio.h>
#include <stddef.h>
#include <string.h>
#include <stdlib.h>
#define MAX_SNODE_LENGTH 256
enum styp {
/*
Enum that declares all possible types
that a token could be in my lisp S-expression.
In order, opening paranthesis, closing paranthesis,
string, integer, decimal (float), charachter, nil
*/
oparn,
cparn,
strn,
intg,
dec,
chac,
nil,
};
struct snode_t {
/*
When a S-expression is parsed every token is
parsed into this. All data is stored in a
string, but when operations need to be completed on it
they are converted based on their type (defined by enum
styp typ).
Each node points to the next one. When a S-exp is parsed,
an snode_t is returned that is the first item. THe programmer
can then iterate over the list and use the data stored in "val"
to interpret code, etc.
*/
char val[MAX_SNODE_LENGTH];
enum styp typ;
struct snode_t* nxt;
};
struct snode_t* node() {
/*
Function returns the pointer to a struct snode_t
which has had memory allocated for it. This function
is just for convenience.
*/
struct snode_t* tmp = malloc(sizeof(struct snode_t*));
tmp->nxt = malloc(sizeof(struct snode_t*));
return tmp;
}
char* sstring(char* str, int s, int e) {
/*
Takes a string, a starting index and an
end index, and allocates the memory for a substring,
creates a substring and returns it. Saves programmer
time fiddling with memory when getting a substring.
*/
size_t n = e - s + 2;
// n is the number of bytes needed to be allocated
char* sub = malloc(sizeof(char) * n);
memcpy(sub, &str[s], n-1);
sub[n] = '[=10=]';
return sub;
}
struct snode_t* psexp(char* str) {
struct snode_t* head = node();
// head is the first item in the linked list.
// head is the struct that is returned once
// the function completes.
struct snode_t* aft = node();
// aft contains the current struct.
// once the type of a token is determined,
// aft is set to the "nxt" value of the current
// struct. aft is then assigned the according values.
int p = 1;
// indicates the position in the string. Starts
// at 1 because the first index of a properly formatted
// S-exp will always be a closing bracket.
// Note that this doesn't consider trailing whitespace,
// I'm only writing functions that do the bare minimum
// at this point so that I can make it more robust
// and safe later.
if(str[0] == '(') {
head->val[0] = '(';
head->val[1] = '[=10=]';
head->typ = oparn;
aft = head->nxt;
}
while(str[p] != '[=10=]') {
if(str[p] == '(') {
head->val[0] = '(';
head->val[1] = '[=10=]';
aft->typ = oparn;
} else if(str[p] == ')') {
head->val[0] = ')';
head->val[1] = '[=10=]';
aft->typ = cparn;
} else if(str[p] == '"') {
// if the current charachter is a
// ", all the following charachters
// until the next " comprise a string
// this next set of lines finds the
// position of the next " after the current one,
// and uses that number to produce a substring
// This substring is then stored in the current
// struct.
int s = p;
int e;
p++;
while(str[p] != '"') {
p++;
}
e = p++;
char* tocp = sstring(str, s, e);
strcpy(aft->val, tocp);
aft->typ = strn;
}
aft->nxt = node();
aft = aft->nxt;
// the current struct is shifted to the next one.
}
return head;
}
int main() {
psexp("(\"hello world!\")");
return 0;
}
当我编译它时,我没有收到任何警告或错误,但是,当 运行 它时,我得到了一个段错误。我已经测试了 node 函数和 sstring 函数,所以我有理由相信 psexp 函数存在内存故障。
我不确定我分配内存的方式有什么问题,在 link 将一个结构转换为另一个结构之前,我确保分配了内存。是什么导致了我的段错误?
注意:目前 psexp 只关心括号和字符串,但是一旦我让字符串工作(我认为是导致段错误的区域),我打算添加整个 Enum 的数据类型.
这是一个错误:
struct snode_t* tmp = malloc(sizeof(struct snode_t*));
^
ups... pointer
所以当你真的想要一个“struct snode_t”时,你为“指向struct snode_t的指针”分配了内存。
更正为
struct snode_t* tmp = malloc(sizeof(struct snode_t));
或更好
struct snode_t* tmp = malloc(sizeof *tmp);
此外,这很奇怪:
tmp->nxt = malloc(sizeof(struct snode_t*));
创建新节点时你通常做的是:
tmp->nxt = NULL;
在链表中,你永远不会 malloc
指向下一个指针的任何东西。
在 node
中,您正在对节点 指针 的大小执行 malloc
而不是 [=15= 的完整大小].
修复后,程序无限循环
预分配 nxt
不好。 node
应该只分配一个新节点 而不是 两个。
在 sstring
中执行 malloc
会泄漏内存。最好将目标缓冲区作为参数传递。
对于 '('
和 ')'
的情况,p
从未增加,所以如果我们命中一个,我们将永远循环
链表代码需要修改。
这是重构后的版本:
#include <stdio.h>
#include <stddef.h>
#include <string.h>
#include <stdlib.h>
#define MAX_SNODE_LENGTH 256
enum styp {
/*
Enum that declares all possible types that a token could be in my
lisp S-expression. In order, opening paranthesis, closing paranthesis,
string, integer, decimal (float), charachter, nil */
oparn,
cparn,
strn,
intg,
dec,
chac,
nil,
};
struct snode_t {
/*
When a S-expression is parsed every token is parsed into this. All
data is stored in a string, but when operations need to be completed
on it they are converted based on their type (defined by enum styp
typ). Each node points to the next one. When a S-exp is parsed, an
snode_t is returned that is the first item. THe programmer can then
iterate over the list and use the data stored in "val" to interpret
code, etc. */
char val[MAX_SNODE_LENGTH];
enum styp typ;
struct snode_t *nxt;
};
struct snode_t *
node()
{
/*
Function returns the pointer to a struct snode_t which has had memory
allocated for it. This function is just for convenience. */
#if 0
struct snode_t *tmp = malloc(sizeof(struct snode_t *));
tmp->nxt = malloc(sizeof(struct snode_t *));
#endif
#if 0
struct snode_t *tmp = malloc(sizeof(struct snode_t));
tmp->nxt = malloc(sizeof(struct snode_t));
#endif
#if 1
struct snode_t *tmp = calloc(1,sizeof(*tmp));
#endif
return tmp;
}
char *
sstring(char *str, int s, int e, char *sub)
{
/*
Takes a string, a starting index and an end index, and allocates the
memory for a substring, creates a substring and returns it. Saves
programmer time fiddling with memory when getting a substring. */
size_t n = e - s + 2;
// n is the number of bytes needed to be allocated
#if 0
char *sub = malloc(sizeof(char) * n);
#endif
memcpy(sub, &str[s], n - 1);
sub[n] = 0;
return sub;
}
// append -- allocate a new node and append to tail of list
struct snode_t *
append(struct snode_t **head)
{
struct snode_t *cur;
struct snode_t *prev;
// find the tail of the list
prev = NULL;
for (cur = *head; cur != NULL; cur = cur->nxt)
prev = cur;
cur = node();
if (prev != NULL)
prev->nxt = cur;
else
*head = cur;
return cur;
}
struct snode_t *
psexp(char *str)
{
struct snode_t *head = NULL;
// head is the first item in the linked list.
// head is the struct that is returned once
// the function completes.
struct snode_t *aft = NULL;
// aft contains the current struct.
// once the type of a token is determined,
// aft is set to the "nxt" value of the current
// struct. aft is then assigned the according values.
int p = 1;
int s;
int e;
// indicates the position in the string. Starts
// at 1 because the first index of a properly formatted
// S-exp will always be a closing bracket.
// Note that this doesn't consider trailing whitespace,
// I'm only writing functions that do the bare minimum
// at this point so that I can make it more robust
// and safe later.
while (str[p] != 0) {
aft = append(&head);
switch (str[p]) {
case '(':
aft->val[0] = '(';
aft->val[1] = 0;
aft->typ = oparn;
p++;
break;
case ')':
aft->val[0] = ')';
aft->val[1] = 0;
aft->typ = cparn;
p++;
break;
case '"':
// if the current charachter is a
// ", all the following charachters
// until the next " comprise a string
// this next set of lines finds the
// position of the next " after the current one,
// and uses that number to produce a substring
// This substring is then stored in the current
// struct.
s = p;
p++;
while (str[p] != '"') {
p++;
}
e = p++;
sstring(str, s, e, aft->val);
aft->typ = strn;
}
#if 0
aft->nxt = node();
aft = aft->nxt;
#endif
// the current struct is shifted to the next one.
}
return head;
}
int
main()
{
psexp("(\"hello world!\")");
return 0;
}
更新:
Just as a side note - what does the #if 0 within some of the functions actually do? I've only ever really seen preprocessor directives used to make code more robust or customisable, ie. to check whether a header is defined and if not define it, or for user customisation before compilation. – Bithov Vin
我使用 #if 0
作为一种通过将预处理器 (cpp
) 获取到 include/elide 代码来“注释掉”代码的方法。它比将代码包装在 /*
和 */
对中更干净。我也用#if 1
来标记我写的“experimental/new”代码。
这里我用它[作为教学工具]来展示old/original代码与new/improved代码:
#if 0
// old code
#else
// new code
#endif
或:
#if 1
// new code
#endif
对于我自己的代码,当我确定代码正确时,我将删除 #if 0
并保留 #else
代码 [删除 #if/else/endif
行]留下 clean/simple/correct 代码。
所以,当你理解了差异后,你也可以去掉指令,只留下正确的代码。
请注意,我会使用 psexp
中的 #if 0
技巧来展示如何修复它,但是更改是如此广泛以至于使用预处理器指令会使事情变得如此 more难懂。