How to restore a string after using strtok()

Question

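I have a project where I need to sort multiple lines of text by the second, third, etc. word in each line, rather than by the first word. For example,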

this line is first

but this line is second

finally there is this line

If you then choose to sort by the second word, it becomes

this line is first

finally there is this line

but this line is second

(because "line" comes before "there", which comes before "this")

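I have an array of char pointers, one per line. So far, I've used strtok() to split each line down to its second word, but that turns the whole string into just that word, which I then store in my array. My tokenizing code looks like this: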

 for (i = 0; i < numLines; i++) {
   char* token = strtok(labels[i], " ");
   token = strtok(NULL, " ");
   labels[i] = token;
 }

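This gives me the second word of each line, because I call strtok twice. Then I sort those words (line, this, there). However, I need to put the strings back together in their original form. I know strtok overwrites the delimiters with '\0', but I have yet to find a way to get the original string back.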

I'm sure the answer involves pointers, but I'm confused about what exactly I need to do next.

I should mention that I'm reading the lines from an input file like this:

for (i = 0; i < numLines && fgets(buffer, sizeof(buffer), fp) != 0; i++) {
  labels[i] = strdup(buffer);
}

Edit: my find_offset function

size_t find_offset(const char *s, int n) {
  size_t len;
  while (n > 0) {
     len = strspn(s, " ");
     s += len;
  }

  return len;
}

Edit 2: the relevant sorting code

//Getting the line and offset
for (i = 0; i < numLines && fgets(buffer, sizeof(buffer), fp) != 0; i++) {
   labels[i].line = strdup(buffer);
   labels[i].offset = find_offset(labels[i].line, nth);
}


int n = sizeof(labels) / sizeof(labels[0]);
qsort(labels, n, sizeof(*labels), myCompare);
for (i = 0; i < numLines; i++)
  printf("%d: %s", i, labels[i].line); //Print the sorted lines


int myCompare(const void* a, const void* b) { //Compare function
  xline *xlineA = (xline *)a;
  xline *xlineB = (xline *)b;

  return strcmp(xlineA->line + xlineA->offset, xlineB->line + xlineB->offset);
}

Answer 1

Perhaps, rather than messing with strtok(), use strspn() and strcspn() to parse the tokens in the string. Then the original string can even be const.

#include <stdio.h>
#include <string.h>

int main(void) {
  const char str[] = "this line is first";
  const char *s = str;
  while (*(s += strspn(s, " ")) != '\0') {
    size_t len = strcspn(s, " ");

    // Instead of printing, use the nth parsed token for key sorting
    printf("<%.*s>\n", (int) len, s);

    s += len;
  }
}

Output

<this>
<line>
<is>
<first>

Or

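Don't sort the lines themselves; sort structs that pair each untouched line with the offset of its sort key. Reading lines from stdin and sorting them by the nth word (counting from 1):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINES 1000

typedef struct {
  char *line;     // untouched copy of the input line
  size_t offset;  // where the sort key (the nth word) begins in line
} xline;

// qsort() comparator: compare each line from its sort key onward
int fcmp(const void *pa, const void *pb) {
  const xline *a = pa;
  const xline *b = pb;
  return strcmp(a->line + a->offset, b->line + b->offset);
}

// Offset of the nth word (n >= 1), found with strspn()/strcspn() as above.
// Returns the offset of the terminating '\0' if the line has fewer than n words.
size_t find_offset_of_nth_word(const char *s, int n) {
  size_t off = strspn(s, " ");       // skip leading spaces
  while (--n > 0 && s[off] != '\0') {
    off += strcspn(s + off, " ");    // skip the current word
    off += strspn(s + off, " ");     // skip the spaces after it
  }
  return off;
}

int main(void) {
  int nth = 2;                       // sort by the 2nd word
  static xline labels[MAX_LINES];
  char buffer[1000];
  size_t i;

  for (i = 0; i < MAX_LINES && fgets(buffer, sizeof buffer, stdin) != NULL; i++) {
    labels[i].line = strdup(buffer);  // strdup() is POSIX (standard C since C23)
    if (labels[i].line == NULL)
      return EXIT_FAILURE;
    labels[i].offset = find_offset_of_nth_word(labels[i].line, nth);
  }

  qsort(labels, i, sizeof *labels, fcmp);  // i = number of lines actually read

  for (size_t j = 0; j < i; j++) {
    fputs(labels[j].line, stdout);
    free(labels[j].line);
  }
  return EXIT_SUCCESS;
}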

Or

After reading each line, find the nth token with strspn()/strcspn() and rotate the line from "aaa bbb ccc ddd \n" to "ccc ddd \naaa bbb ", sort, then rotate the lines back.

In any case, don't use strtok(): it loses too much information.

Answer 2

I need to put the string back together in its original form. I'm aware that strtok turns the tokens into '\0', but I've yet to find a way to get the original string back.

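If you want to preserve the original strings, you are far better off not damaging them in the first place, and especially not losing your pointers to them. If it is safe to assume that every line contains at least three words, and that the second word is separated from the first and the third by exactly one space on each side, then you can undo strtok()'s replacement of those delimiters with string terminators. But once the pointer to the start of the whole string is lost, there is no safe or reliable way to recover it.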

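I suggest creating an auxiliary array in which you record information about the second word of each sentence, obtained without damaging the original sentences, and then co-sorting the auxiliary array and the sentence array. The information recorded in the aux array could be copies of the sentences' second words, their offsets and lengths, or something similar.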
