Unexpected behaviour while extracting substrings from a string
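I am trying to print all the substrings contained in a larger string, each separated by a '/' character. My function does not work as I expected, and I cannot see what is wrong with it. This is the function I wrote: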
void print_serial_list(char *serial_list) {
    char *iter = serial_list;
    while (*iter != '\0') { // Traverse the whole string
        char *tmp_fn;
        tmp_fn = strtok(iter,"/");
        printf("Extracted entry: '%s'\n", tmp_fn);
        iter = iter + sizeof(tmp_fn);
    }
}
Passing the string directly
If I run the function like this:
char *string = "Lorem.ipsum/dolor-sit-amet/consectetur/adipiscing.elit/";
printf("%s\n", string);
print_serial_list(string);
I get a segmentation fault:
Lorem.ipsum/dolor-sit-amet/consectetur/adipiscing.elit/
Segmentation fault (core dumped)
Using the get_string() function
On the other hand, if I run this:
char *string = get_string();
printf("%s\n", string);
print_serial_list(string);
I get the following output (still wrong):
Lorem.ipsum/dolor-sit-amet/consectetur/adipiscing.elit/
Extracted entry: 'Lorem.ipsum'
Extracted entry: 'sum'
Extracted entry: 'r-sit-amet'
Extracted entry: 'et'
Extracted entry: 'ctetur'
Extracted entry: 'dipiscing.elit'
Extracted entry: 'g.elit'
Extracted entry: '�'
Extracted entry: 'x[�V'
Extracted entry: 'x[�V'
Expected
To be clear, I expect the output in both cases to be:
Lorem.ipsum/dolor-sit-amet/consectetur/adipiscing.elit/
Extracted entry: 'Lorem.ipsum'
Extracted entry: 'dolor-sit-amet'
Extracted entry: 'consectetur'
Extracted entry: 'adipiscing.elit'
(Note: I hope the code of get_string() is not needed to understand the problem... I wanted to keep the post from getting too long.)
Edit
Following some suggestions in the comments, I edited the function this way:
    char *iter = serial_list;
    bool first = true;
    while (*iter != '\0') { // Traverse the whole string
        char *tmp_fn;
        if (first)
            tmp_fn = strtok(iter, "/");
        else
            tmp_fn = strtok(NULL, "/");
        size_t tmp_size = strlen(tmp_fn);
        printf("Extracted entry: '%s' - size = %zu\n", tmp_fn, tmp_size);
        iter = iter + tmp_size;
        first = false;
    }
The output I get still has some problems, but it is closer to what I want!
Lorem.ipsum/dolor-sit-amet/consectetur/adipiscing.elit/
Extracted entry: 'Lorem.ipsum' - size = 11
If I run this function like this, I get a segmentation fault:
char *string = "Lorem.ipsum/dolor-sit-amet/consectetur/adipiscing.elit/";
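Your program has undefined behavior because it attempts to modify a string literal: you are passing a string literal to strtok(), which writes into the string it tokenizes.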
char * strtok ( char * str, const char * delimiters );
Split string into tokens
A sequence of calls to this function split str into tokens, which are sequences of contiguous characters separated by any of the characters that are part of delimiters.
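string is a pointer to a string literal, whose contents must not be modified. Attempting to modify them through the pointer is undefined behavior.
To fix this, you can simply declare an array instead, which holds a modifiable copy of the literal: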
char string[] = "Lorem.ipsum/dolor-sit-amet/consectetur/adipiscing.elit/";
           ^^
In the print_serial_list() function, you can do:
void print_serial_list(char *serial_list) {
    char *iter = serial_list;

    if (serial_list == NULL)
        return;

    char *tmp_fn = strtok(iter, "/");
    while (tmp_fn != NULL)
    {
        printf ("Extracted entry: '%s'\n", tmp_fn);
        tmp_fn = strtok(NULL, "/");
    }
}
Output of print_serial_list() (for the input string Lorem.ipsum/dolor-sit-amet/consectetur/adipiscing.elit/):
Extracted entry: 'Lorem.ipsum'
Extracted entry: 'dolor-sit-amet'
Extracted entry: 'consectetur'
Extracted entry: 'adipiscing.elit'
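As a side note on the truncated entries ('sum', 'r-sit-amet', ...) seen in the get_string() case: they are consistent with the original loop advancing by sizeof(tmp_fn), which is the size of the pointer (commonly 8 bytes on 64-bit platforms) rather than the length of the token, and then restarting strtok() from the middle of the string. The sketch below only illustrates that difference; show_sizes is a hypothetical helper, not part of the original code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reports the two quantities the original loop mixed up.
   ptr_size receives the size of the pointer variable itself,
   text_len receives the number of characters in the token. */
void show_sizes(const char *token, size_t *ptr_size, size_t *text_len) {
    *ptr_size = sizeof(token);   /* size of a pointer, independent of the text */
    *text_len = strlen(token);   /* characters before the terminating '\0' */
    printf("sizeof(pointer) = %zu, strlen(token) = %zu\n", *ptr_size, *text_len);
}
```

Calling show_sizes("Lorem.ipsum", &a, &b) stores the platform's pointer size in a and 11 in b. With an 8-byte pointer, skipping 8 characters of "Lorem.ipsum" lands on "sum", which matches the second entry in the faulty output.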
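One thing to note here is that print_serial_list() modifies the string `string` when it passes it to strtok(). If you do not want the input string to be modified by the call to print_serial_list(), make a copy of it inside print_serial_list() and tokenize the copy.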
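A minimal sketch of that copy-first approach, assuming it is acceptable to allocate a temporary buffer with malloc(). The name print_serial_list_copy and its return value (the number of entries printed, or -1 on failure) are illustrative choices, not part of the original code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant: tokenizes a private copy so that the caller's
   string (even a string literal) is never modified.
   Returns the number of entries printed, or -1 on failure. */
int print_serial_list_copy(const char *serial_list) {
    if (serial_list == NULL)
        return -1;

    size_t len = strlen(serial_list) + 1;   /* include the terminating '\0' */
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, serial_list, len);

    int count = 0;
    /* First call gets the buffer, later calls pass NULL to continue. */
    for (char *tok = strtok(copy, "/"); tok != NULL; tok = strtok(NULL, "/")) {
        printf("Extracted entry: '%s'\n", tok);
        count++;
    }

    free(copy);
    return count;
}
```

Because only the copy is written to, this version can be called with either `char *string = "..."` or `char string[] = "..."`.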
From the strtok() man page:
Bugs
Be cautious when using these functions. If you do use them, note that:
* These functions modify their first argument.
* These functions cannot be used on constant strings.
* The identity of the delimiting byte is lost.
Thanks to @David C. Rankin for sharing these strtok() bugs in the comments.
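If those caveats are a concern, one alternative (a different technique from the strtok() loop above, named here plainly) is to scan the string with strcspn() and print each segment with a length-limited %.*s, which never writes to the input. This is only a sketch; print_serial_list_ro and its return value are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical read-only variant: walks the string with strcspn()
   instead of strtok(), so the input is never modified.
   Empty entries (consecutive '/') are skipped, as strtok() would do.
   Returns the number of entries printed, or -1 if the input is NULL. */
int print_serial_list_ro(const char *serial_list) {
    if (serial_list == NULL)
        return -1;

    int count = 0;
    const char *p = serial_list;
    while (*p != '\0') {
        size_t n = strcspn(p, "/");   /* length up to the next '/' or the end */
        if (n > 0) {
            printf("Extracted entry: '%.*s'\n", (int)n, p);
            count++;
        }
        p += n;
        if (*p == '/')
            p++;                      /* step over the delimiter */
    }
    return count;
}
```

Since nothing is written through the pointer, this version also works on string literals and keeps the '/' delimiters intact.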