C 中的 Anagram 测试器

Anagram tester in C

我正在尝试用 C 语言实现一个变位词测试器。在调用该程序时,用户在双引号中输入两个单词,例如 "listen" 和 "silent"。 我几乎已经开始工作了,但是我在使用我编写的用于消除两个输入单词中的空格的辅助函数时遇到了一些麻烦。这是此函数的代码:

void noSpaces(char word[100]) {
    /*
    This is a function to get rid of spaces in a word
    It does this by scanning for a space and shifting the
    array elements at indices > where the space is
    down by 1 as long as there is still a space
    there. 
    */
    for (int i = 0; i < 100; i++) {
        while (word[i] == ' ') {
            for (int j = i; j < 100; j++) {
                word[j] = word[j+1];
            }
        }
    }
}

现在,当我将来自 main 函数的输入词传递给这个助手时,它工作正常。问题是第二次调用这个函数。当我在第二个输入上调用此函数时,如果 k 是第一个输入中的空格数,则该函数将擦除第二个输入的前 k 个字母。例如,键入 ./anagram " banana" "banana" 会给我一个假阴性,如果我添加一个打印语句以查看 noSpaces 之后的输入发生了什么 被调用,我得到以下信息:

banana
anana

完整程序的代码如下:

#include <stdio.h>

int main(int argc, char *argv[]) {
    //this if statement checks for empty entry
    if (isEmpty(argv[1]) == 0 || isEmpty(argv[2]) == 0) {
        //puts("one of these strings is empty");
        return 1;
    }
    //call to noSpaces to eliminate spaces in each word
    noSpaces(argv[1]);
    noSpaces(argv[2]);
    //call to sortWords
    sortWords(argv[1]);
    sortWords(argv[2]);
    int result = compare(argv[1], argv[2]);
    /*
    if (result == 1) {
        puts("Not anagrams");
    } else {
        puts("Anagrams");
    }
    */
    return result;
}

int compare(char word1[100], char word2[100]) {
    /*
    This is a function that accepts two sorted 
    char arrays (see 'sortWords' below) and
    returns 1 if it finds a different character
    at entry i in either array, or 0 if at no 
    index the arrays have a different character.
    */
    int counter = 0;
    while (word1[counter] != '[=12=]' && word2[counter] != '[=12=]') {
        if (word1[counter] != word2[counter]) {
            //printf("not anagrams\n");
            return 1;
        }
        counter++;
    }
    // printf("anagrams\n");
    return 0;
}

void sortWords(char word[100]) {
    /*
    This is a function to sort the input char arrays
    it's a simple bubble sort on the array elements.
    'sortWords' function accepts a char array and returns void,
    sorting the entries in alphabetical order
    being careful about ignoring the 'special character'
    '[=12=]'.
    */
    for (int j = 0; j < 100; j++) {
        int i = 0;
        while (word[i + 1] != '[=12=]') {
            if (word[i] > word[i + 1]) {
                char dummy = word[i + 1];
                word[i + 1] = word[i];
                word[i] = dummy;
            }
            i++;
        }
    }
}

void noSpaces(char word[100]) {
    /*
    This is a function to get rid of spaces in a word
    It does this by scanning for a space and shifting the
    array elements at indices > where the space is
    down by 1 as long as there is still a space there. 
    */
    for (int i = 0; i < 100; i++) {
        while (word[i] == ' ') {
            for (int j = i; j < 100; j++) {
                word[j] = word[j + 1];
            }
        }
    }
}

int isEmpty(char word[100]) {
    // if a word consists of the empty character, it's empty
    //otherwise, it isn't
    if (word[0] == '[=12=]') {
        return 0;
    }
    return 1;
}

我知道有一个库可以处理字符串,但我真的 想避免使用它。我已经走到这一步而不需要它了,我觉得问题基本上已经解决了,但对于一件我看不到的小事。

我来自 java 背景,我是 C 的新手,如果这能解释我犯的任何错误。

您的辅助函数中存在逻辑错误。您从 word[j] 开始复制,而不是从第二个单词的开头开始,因此您将删除与前导空格一样多的前导字符,就像您在输出中看到的那样。

请注意,j=ii 计算外循环中前导空格的数量。

顺便说一句,你应该只有两个循环。将 while 条件放在第一个 for 循环中,如下所示:for (int i = 0; i<100 && word[i]==' '; i++).

要修复您的逻辑错误,您需要在最内层循环中使用另一个初始化为零的迭代器k,并使用word[k] = word[j+1]。我认为这会起作用。

在 argv[1] 缓冲区长度小于 100 的情况下,argv[1] 和 argv[2] 上的缓冲区溢出有问题。所以我认为你应该使用 for 循环和 strlen(word) 即足够。当您在 for 循环中使用静态长度为 100 时,有时该词会从另一个内存位置获取数据并使您的程序处于未定义的行为中。其他功能也有同样的问题。我的意思是 sortWordscompare 函数。

这是我对您的 noSpaces 函数所做的修改,它应该可以工作。

void noSpaces(char word [100]){
    /*
    This is a function to get rid of spaces in a word
    It does this by scanning for a space and shifting the
    array elements at indices > where the space is
    down by 1 as long as there is still a space
    there.
    */
    for(int i =0; i<strlen(word)-1; i++){
        while(word[i]==' '){
            for(int j = i ; j<strlen(word); j++){
                word[j] = word [j+1];
            }
        }
    }
}

而不是尝试删除空格和排序,这是 O(N lg N) 运行 时间。您可以通过构建一个表示单词中每个字母的计数的数组来执行 O(N) 操作。并且在执行此操作时忽略空格。

// Iterate over each character in the string
// For each char in string, increment the count of that character
// in the lettercount array.
// Return the number of unfiltered letters that were counted
int fillLetterCountTable(const char* string, int* lettercount)
{
    int len = strlen(string);
    int valid = 0;

    for (int i = 0; i < len; i++)
    {
       unsigned char index = (unsigned char)(string1[i]);
       if (index ==  ' ')  // ignore spaces
       {
           continue;
       }
       counts[index] += 1;
       valid++;
    }

    return valid;
}

// compare if two strings are anagrams of each other
// return true if string1 and string2 are anagrams, false otherwise
bool compare(const char* string1, const char* string2)
{
    int lettercount1[256] = {0};
    int lettercount2[256] = {0};

    int valid1 = fillLetterCountTable(string1, lettercount1);
    int valid2 = fillLetterCountTable(string2, lettercount2);

    if (valid1 != valid2)
        return false;

    // memcmp(lettercount1, lettercount2, sizeof(lettercount1));
    for (int i = 0; i < 256; i++)
    {
        if (counts1[i] != counts2[i])
            return false;
    }
    return true;
}

在C 语言中,字符串是char 的数组,带有空终止符,即值为0 的字节通常表示为'[=16=]'。您不应假定任何特定长度,例如 100。实际上,函数原型参数中的数组大小会被编译器忽略。您可以通过扫描空终止符来确定长度,这是 strlen() 有效的做法,或者您可以编写代码以避免多次扫描,在空终止符处停止。您应该确保您的函数适用于空字符串,这是一个具有单个空字节的数组。以下是您的代码中的问题:

在函数 noSpaces 中,您迭代到字符串末尾之外,修改可能属于下一个字符串的内存。该程序有未定义的行为。

你应该在字符串的末尾停止。还使用 2 个索引变量以线性时间执行:

void noSpaces(char word[]) {
    /*
    This is a function to get rid of spaces in a word
    It does this by scanning for a space and shifting the
    array elements at indices > where the space is
    down by 1 as long as there is still a space
    there. 
    */
    int i, j;
    for (i = j = 0; word[i] != '[=10=]'; i++) {
        if (word[i] != ' ') {
            word[j++] = word[i];
        }
    }
    word[j] = '[=10=]';
}

您可以简化 compare 以平均使用三分之一的测试:

int compare(const char word1[], const char word2[]) {
    /*
    This is a function that accepts two sorted 
    char arrays (see 'sortWords' below) and
    returns 1 if it finds a different character
    at entry i in either array, or 0 if at no 
    index the arrays have a different character.
    */
    for (int i = 0; word1[i] == word2[i]; i++) {
        if (word1[i]) == '[=11=]')
            //printf("anagrams\n");
            return 0;
        }
    }
    // printf("not anagrams\n");
    return 1;
}

sortWords 对于空字符串有未定义的行为,因为您在索引 1 处读取了 char,超出了数组的末尾。这是更正后的版本:

void sortWords(char word[]) {
    /*
    This is a function to sort the input char arrays
    it's a simple bubble sort on the array elements.
    'sortWords' function accepts a char array and returns void,
    sorting the entries in alphabetical order
    being careful about ignoring the 'special character'
    '[=12=]'.
    */
    for (int j = 0; word[j] != '[=12=]'; j++) {
        for (int i = 1; i < j; i++) {
            if (word[i - 1] > word[i]) {
                char dummy = word[i - 1];
                word[i - 1] = word[i];
                word[i] = dummy;
            }
        }
    }
}

您应该在使用前声明函数,或者在使用前定义它们。您的代码可以编译,因为编译器接受旧式 C,其中尚未见过的函数的原型是从第一个调用站点传递的参数推断出来的。这种做法容易出错且已过时。

您的排序函数具有二次时间复杂度,对于很长的字符串可能会非常慢,但单词不应太大,所以这不是问题。

最好不要修改参数字符串。您可以使用具有相同时间复杂度的字符串之一的副本来执行测试。

这里有一个直接的方法:

#include <stdio.h>

int check_anagrams(const char word1[], const char word2[]) {
    /*
       This function accepts two strings and returns 1 if they
       are anagrams of one another, ignoring spaces.
       The strings are not modified.
     */
    int i, j, len1, letters1, letters2;

    /* compute the length and number of letters of word1 */
    for (len1 = letters1 = 0; word1[len1] != '[=13=]'; len1++) {
        if (word1[len1] != ' ')
            letters1++;
    }

    /* create a copy of word1 in automatic storage */
    char copy[len1];    /* this is an array, not a string */
    for (i = 0; i < len1; i++)
        copy[i] = word1[i];

    for (j = letters2 = 0; word2[j] != '[=13=]'; j++) {
        char temp = word2[j];
        if (temp != ' ') {
            letters2++;
            for (i = 0; i < len1; i++) {
                if (copy[i] == temp) {
                    copy[i] = '[=13=]';
                    break;
                }
            }
            if (i == len1) {
                /* letter was not found */
                return 0;
            }
        }
    }
    if (letters1 != letters2)
        return 0;
    return 1;
}

int main(int argc, char *argv[]) {
    const char *s1 = " listen";
    const char *s2 = "silent   ";
    if (argc >= 3) {
        s1 = argv[1];
        s2 = argv[2];
    }
    int result = check_anagrams(s1, s2);
    if (result == 0) {
        printf("\"%s\" and \"%s\" are not anagrams\n", s1, s2);
    } else {
        printf("\"%s\" and \"%s\" are anagrams\n", s1, s2);
    }
    return result;
}