Unexpected behaviour while extracting substrings from a string

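I'm trying to print all the substrings contained in a larger string, each separated by a '/' character. My function doesn't work as I expected, but I can't see what's wrong with it. Here is the function I wrote: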

void print_serial_list(char *serial_list) {
    char *iter = serial_list;
    while (*iter != '\0') { // Traverse the whole string
        char *tmp_fn;
        tmp_fn = strtok(iter,"/");
        printf("Extracted entry: '%s'\n", tmp_fn);
        iter = iter + sizeof(tmp_fn);
    }
}

Passing the string directly

If I run the function like this:

char *string = "Lorem.ipsum/dolor-sit-amet/consectetur/adipiscing.elit/";
printf("%s\n", string);
print_serial_list(string);

I get a segmentation fault:

Lorem.ipsum/dolor-sit-amet/consectetur/adipiscing.elit/
Segmentation fault (core dumped)

Using the get_string() function

On the other hand, if I run this:

char *string = get_string();
printf("%s\n", string);
print_serial_list(string);

I get the following output (still wrong):

Lorem.ipsum/dolor-sit-amet/consectetur/adipiscing.elit/
Extracted entry: 'Lorem.ipsum'
Extracted entry: 'sum'
Extracted entry: 'r-sit-amet'
Extracted entry: 'et'
Extracted entry: 'ctetur'
Extracted entry: 'dipiscing.elit'
Extracted entry: 'g.elit'
Extracted entry: '�'
Extracted entry: 'x[�V'
Extracted entry: 'x[�V'

Expected

To be clear, I expect the output in both cases to be:

Lorem.ipsum/dolor-sit-amet/consectetur/adipiscing.elit/
Extracted entry: 'Lorem.ipsum'
Extracted entry: 'dolor-sit-amet'
Extracted entry: 'consectetur'
Extracted entry: 'adipiscing.elit'

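(Note: I hope the code of get_string() is not needed to understand the problem... I wanted to keep the post from getting too long.)

Edit

Following some suggestions in the comments, I edited the function this way: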

char *iter = serial_list;
bool first = true;
while (*iter != '\0') { // Traverse the whole string
    char *tmp_fn;
    if (first)
        tmp_fn = strtok(iter, "/");
    else
        tmp_fn = strtok(NULL, "/");
    size_t tmp_size = strlen(tmp_fn);
    printf("Extracted entry: '%s' - size = %zu\n", tmp_fn, tmp_size);
    iter = iter + tmp_size;
    first = false;
}

The output I get still has some problems, but it is closer to what I want!

Lorem.ipsum/dolor-sit-amet/consectetur/adipiscing.elit/
Extracted entry: 'Lorem.ipsum' - size = 11

If I run this function like this, I get a segmentation fault:

char *string = "Lorem.ipsum/dolor-sit-amet/consectetur/adipiscing.elit/";

Your program has undefined behavior because it tries to modify a string literal: you are passing a string literal to strtok().

char * strtok ( char * str, const char * delimiters );

Split string into tokens

A sequence of calls to this function split str into tokens, which are sequences of contiguous characters separated by any of the characters that are part of delimiters.

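string is a pointer to a string literal, whose contents must not be modified. Attempting to modify them through the pointer is undefined behavior.

To fix this, you can simply do the following: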

char string[] = "Lorem.ipsum/dolor-sit-amet/consectetur/adipiscing.elit/";
           ^^
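To make the difference concrete, here is a minimal sketch (the helper name length_after_first_strtok is hypothetical and not part of the original code). It tokenizes a writable array once and then measures the array with strlen(), which shows that strtok() really did overwrite the first '/' with a terminator:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: runs strtok() once on a writable array and returns
   the array's length as seen by strlen() afterwards. */
size_t length_after_first_strtok(void) {
    /* An array initialized from a literal is a private, writable copy. */
    char writable[] = "Lorem.ipsum/dolor-sit-amet/";

    /* By contrast, `char *p = "...";` points at the literal itself;
       writing through p (as strtok() would) is undefined behavior. */

    char *first = strtok(writable, "/");  /* overwrites the first '/' with '\0' */
    printf("first token: '%s'\n", first);

    /* The array now appears to end where the first delimiter used to be. */
    return strlen(writable);
}
```

Calling length_after_first_strtok() returns 11, the length of "Lorem.ipsum", rather than the length of the full initializer.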

In the print_serial_list() function, you can do:

void print_serial_list(char *serial_list) {
    char *iter = serial_list;

    if (serial_list == NULL)
        return;

    char *tmp_fn = strtok(iter, "/");
    while (tmp_fn != NULL) {
        printf("Extracted entry: '%s'\n", tmp_fn);
        tmp_fn = strtok(NULL, "/");
    }
}

The output of print_serial_list() (for the input string Lorem.ipsum/dolor-sit-amet/consectetur/adipiscing.elit/) is:

Extracted entry: 'Lorem.ipsum'
Extracted entry: 'dolor-sit-amet'
Extracted entry: 'consectetur'
Extracted entry: 'adipiscing.elit'

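One thing to note here is that print_serial_list() modifies string, because it passes it to strtok(). If you do not want the input string to be modified by a call to print_serial_list(), make a copy of it inside print_serial_list() and tokenize the copy.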
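A minimal sketch of that copying approach follows. The function name print_serial_list_copy and its int return value (the number of entries, or -1 on error) are assumptions added for illustration; the original function returns void.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Prints each '/'-separated entry without modifying serial_list.
   Returns the number of entries printed, or -1 on NULL input or
   allocation failure. */
int print_serial_list_copy(const char *serial_list) {
    if (serial_list == NULL)
        return -1;

    /* Duplicate the input, including its terminating '\0'. */
    size_t len = strlen(serial_list) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, serial_list, len);

    /* strtok() only ever writes to the private copy. */
    int count = 0;
    for (char *tok = strtok(copy, "/"); tok != NULL; tok = strtok(NULL, "/")) {
        printf("Extracted entry: '%s'\n", tok);
        count++;
    }

    free(copy);
    return count;
}
```

Because the parameter is const char * and only the copy is tokenized, this version can also be called with a string literal.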


From the strtok man page:

BUGS

Be cautious when using these functions. If you do use them, note that:

* These functions modify their first argument.

* These functions cannot be used on constant strings.

* The identity of the delimiting byte is lost.

Thanks to @David C.Rankin for sharing these strtok() bugs in the comments.
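If the caveats quoted above are a concern, one alternative is to avoid strtok() altogether. The sketch below is not from the original answer: it walks the string with strcspn() and prints each entry with a precision-limited %s, so the input is never written to. The name print_entries_readonly and the returned entry count are assumptions for illustration. Like strtok(), it skips empty entries.

```c
#include <stdio.h>
#include <string.h>

/* Prints each non-empty '/'-separated entry of s without writing to s.
   Returns the number of entries printed. */
int print_entries_readonly(const char *s) {
    int count = 0;
    while (*s != '\0') {
        size_t len = strcspn(s, "/");  /* length up to the next '/' or the end */
        if (len > 0) {
            /* %.*s prints at most len characters, so no terminator is needed. */
            printf("Extracted entry: '%.*s'\n", (int)len, s);
            count++;
        }
        s += len;                      /* move to the delimiter (or the end) */
        if (*s == '/')
            s++;                       /* step over the delimiter itself */
    }
    return count;
}
```

Since nothing is modified, this works on string literals directly, and the delimiter is still present in the input after the call.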