Remove all non-alphabetic characters from a string in C: possible compiler issue?

I'm writing a function in C that takes a string and removes every character that is not a lowercase letter. This is the code I have so far:

void strclean(char* str) {
   while (*str) {
      if (!(*str >= 'a' && *str <= 'z')) {
         strcpy(str, str + 1);
         str--;
      }
      str++;
   }
}

When I pass it the string "hello[][]world", the function seems to mostly work, except that the output is:

hellowoldd

When I make it print the string each time it enters the if statement, this is the output I get:

hello][]woldd
hello[]woldd
hello]woldd
hellowoldd

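It looks really close, but I can't work out why it produces this output! The strangest part is that I gave the code to two friends and it runs fine on their computers. We are all running the same version of Linux (Ubuntu 14.04.3) and all compiling with gcc.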

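I'm not sure whether there is a problem in the code that would cause inconsistent output, or whether a compiler issue is to blame. Could it have something to do with strcpy on my machine compared to theirs?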

The strcpy function is not guaranteed to work when the ranges overlap, as they do in your case. From C11 7.24.2.3 The strcpy function /2 (the last sentence is the relevant one):

The strcpy function copies the string pointed to by s2 (including the terminating null character) into the array pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.

You can use something like memmove, which does handle overlapping ranges, as per C11 7.24.2.2 The memmove function /2:

The memmove function copies n characters from the object pointed to by s2 into the object pointed to by s1. Copying takes place as if the n characters from the object pointed to by s2 are first copied into a temporary array of n characters that does not overlap the objects pointed to by s1 and s2, and then the n characters from the temporary array are copied into the object pointed to by s1.
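As a rough sketch of what that means for this problem, deleting one character in place is a one-byte left shift of the tail, which is exactly an overlapping copy. The helper name remove_char_at is hypothetical (it does not appear in the question), and it assumes the index is within the string:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the question): delete the character at
 * index i of the NUL-terminated string s by shifting the tail left by one.
 * Assumes i < strlen(s). */
void remove_char_at(char *s, size_t i) {
    /* Source (s + i + 1) and destination (s + i) overlap, so memmove is
     * needed here; strcpy would be undefined behaviour. The "+ 1" in the
     * length makes the copy include the terminating '\0'. */
    memmove(s + i, s + i + 1, strlen(s + i + 1) + 1);
}
```

Calling remove_char_at(buf, 5) on a buffer holding "hello[world" leaves "helloworld" in it.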


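But there is a better solution, one that is O(n) rather than O(n^2) in time complexity (every strcpy or memmove call has to shift the whole remainder of the string) while still being overlap-safe: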

void strclean (char* src) {
    // Run two pointers in parallel.

    char *dst = src;

    // Process every source character.

    while (*src) {
        // Only copy (and update destination pointer) if suitable.
        // Update source pointer always.

        // The cast avoids undefined behaviour when char is signed and negative.
        if (islower((unsigned char)*src)) *dst++ = *src;
        src++;
    }

    // Finalise destination string.

    *dst = '\0';
}

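You'll notice I also use islower() (from ctype.h) to detect lowercase letters. This is more portable since the C standard does not require alphabetic characters to have consecutive code points (the digits are the only characters guaranteed to be consecutive).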

There is also no need to check isalpha() separately since, as per C11 7.4.1.2 The isalpha function /2, islower() == true implies isalpha() == true:

The isalpha function tests for any character for which isupper or islower is true, or ...
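If the goal ever changes from "lowercase only" to "any letter" (as the title suggests), the same two-pointer loop can take the test as a parameter. This is only a sketch: strfilter and its keep parameter are hypothetical names, and it assumes keep behaves like the ctype.h predicates (takes an unsigned char value as int, returns nonzero to keep the character):

```c
#include <ctype.h>

/* Hypothetical generalisation of the two-pointer approach: keep only the
 * characters for which keep() returns nonzero, compacting the string in
 * place. keep is assumed to be a <ctype.h>-style predicate. */
void strfilter(char *src, int (*keep)(int)) {
    char *dst = src;                 /* next position to write to */

    for (; *src; src++) {
        /* Cast to unsigned char: the ctype functions are only defined
         * for unsigned char values and EOF. */
        if (keep((unsigned char)*src)) *dst++ = *src;
    }

    *dst = '\0';                     /* terminate the compacted string */
}
```

With this, strfilter(buf, islower) turns "hello[][]world" into "helloworld", and strfilter(buf, isalpha) turns "Hello[][]World42" into "HelloWorld".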

From N1256 7.21.2.3 The strcpy function:

If copying takes place between objects that overlap, the behavior is undefined.

memmove can be used even when the regions overlap:

void strclean(char* str) {
   while (*str) {
      if (!islower((unsigned char)*str)) { /* needs ctype.h for islower, string.h for memmove and strlen */
         /* strlen(str + 1) + 1 (the + 1 covers the terminating null character) equals strlen(str) */
         memmove(str, str + 1, strlen(str));
      } else {
         str++;
      }
   }
}

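Since decrementing a pointer so that it points before the start of the array is undefined behavior (which the original str-- does whenever the very first character is removed), I also restructured how str is updated.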
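One possible refinement of this approach, offered as a sketch rather than a necessary change: track the number of bytes left so that strlen is not called again on every removal. The name strclean_memmove and the variable remaining are made up for this illustration. The pointer still only ever moves forward, so a string that starts with a non-lowercase character is handled without stepping before the array:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustrative variant (name chosen here, not from the thread): remove
 * every character that is not a lowercase letter, shifting with memmove. */
void strclean_memmove(char *str) {
    /* Bytes from str up to and including the terminating '\0'. */
    size_t remaining = strlen(str) + 1;

    while (*str) {
        if (!islower((unsigned char)*str)) {
            remaining--;                       /* the dropped character */
            memmove(str, str + 1, remaining);  /* shift tail, '\0' included */
            /* str stays put: a new character now sits at this position. */
        } else {
            str++;
            remaining--;
        }
    }
}
```

This is still quadratic in the worst case because each removal moves the rest of the string; it only saves the repeated length scans.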