二进制搜索单词字典

Question

在此代码中：

我读取文件的内容~/usr/share/dict/word并将它们存储在数组中。
然后开始对该数组进行二分查找算法，但问题是在第 62 行将数组传递给二分查找函数并尝试将其与 binary_search(string* dictionary, string key) 方法中的键进行比较。
我发现它正在将 key 与这个未知字符串 "��tudes" 进行比较，原因我不知道。
不过我确信该数组包含正确的数据。

代码：

#include <stdio.h>
#include <cs50.h>
#include <string.h>

#define MAX 99171

// Prototype //
int binary_search(string*, string);

int main(int argc, string argv[])
{
    // Attributes // 
    string dictionary[MAX];
    FILE* dictionaryFile = fopen("words", "r");
    char output[256];
    string key = argv[1];

    // Check if their is a problem while reading the file //
    if (dictionaryFile == NULL)
    {
        // If everything got fouled up then close the file // 
        fclose(dictionaryFile);
        printf("couldn't read the file!!!\n");
        return 1;
    }

    // storing the information into an array to make it easy to read //
    for(int i = 0; i < MAX; i++)
    { 
        fgets(output, sizeof(output), dictionaryFile); 
        dictionary[i] = output;
    }

    // Binary Search a word //
    if(binary_search(dictionary, key) == 1)
    {
        printf("word was found !!!\n");
    }
    else if(binary_search == 0)
    {
        printf("word was not found !!!\n");
    }

    // If Everything goes just fine close the file //
    fclose(dictionaryFile);
    return 0;
}


// implementing prototype //

/**
    @arag dictionary 
        a string of english words 

    @arg key 
        a key we looking for

    @return 
        0 if didn't find the key and 1 otherwise
*/
int binary_search(string* dictionary, string key)
{
    // pointer to the start and the end of the array //
    int start = 0;
    int end = MAX - 1;
    int mid;

    // while end is greater than the start //
    while (end > start)
    {
        // Get The Middle Element //
        mid = (start + end) / 2;
        printf("%s\n", dictionary[mid]);

        // Check if the middle elemenet //
        if (strcmp(key, dictionary[mid]) == 0)
        {
            return 1;
        }

        // Check the left half //
        else if(strcmp(key, dictionary[mid]) < 0)
        {
            end = mid - 1;
        }

        // Check the right half //
        else if (strcmp(key, dictionary[mid]) > 0)
        {
            start = mid + 1;
        }
    }
    // didn't find the key //
    return 0;

}

注意：cs50.h 库是哈佛大学为像我这样的初学者制作的训练轮，我在我的代码中使用它，这是 link 到它的 reference .

Answer 1

The cs50.h library is made by Harvard as a training wheel for beginners.

如果是这样，这些训练轮是倒置安装的并且不接触地面。我无法从你的 link 看出，但我认为

typedef char *string;

是 cs50 套件的一部分。但是 C 中没有字符串；该表达式泛指以空字符结尾的字符数组，'[=19=]'.

上面string的定义让你相信string是一个合适的类型，它的内存是自动处理的。它不是。在你的程序中有一个字符串的位置，即 array

char output[256];

你字典里的"strings"只是指针；它们应该指向现有的 char 数组或者指向 NULL。通过赋值

dictionary[i] = output;

您使字典中的所有字符串都等于临时缓冲区 output。该缓冲区在您阅读的每一行中都会被覆盖，并且只会包含您阅读的最后一行，可能 "zulu".

您可以在阅读字典后打印出来确认这一点。你应该在一个单独的循环中打印它，而不是在你阅读它以查看效果的同一个循环中。

您可以通过将指针数组声明为 char 数组来解决此问题：

char dictionary[MAX][LEN];

其中 LEN 是单词的合适最大长度，比如 24。（这里的问题可能是分配的内存，MAX * LEN 字节可能不适合堆栈。在那种情况下，您必须使用 malloc 在堆上分配内存。我不会在这里打开那个蠕虫罐头。如果您立即遇到分段冲突，请尝试减少 MAX，但代价是只读取一个字典的一部分。)

看字时一定要复制内容：

fgets(output, sizeof(output), dictionaryFile); 
strncpy(dictionary[i], output, sizeof(dictionary[i]);

或者，更好的是，直接将下一个单词读入字典：

fgets(dictionary[i], sizeof(dictionary[i]), dictionaryFile);

不幸的是，fgets 保留了末尾的换行符，因此它显示为 "word\n" 而不是 "word"。您必须删除换行符，否则字符串将与输入不匹配，输入来自命令行 argv，没有尾随换行符。

有几种方法可以去掉不需要的换行符。一个简单的方法是用换行符作为分隔符来标记字符串：

strtok(dictionary[i], "\n");

另一个问题是 dictionary 的新定义，您对 binary_search 的签名是错误的。您不再有一个指向 char 的指针数组，您有一个 24 个（大约是固定数字）字符的数组。将其更改为：

int binary_search(char dictionary[][LEN], const char *key)

在 C 中，如果您有数组的数组（甚至是数组的数组），除了最顶层的维度之外的所有维度都必须知道，以便编译器可以布置内存。

还有其他（相当小的）问题：

如果文件无法打开，您尝试fclose文件。 Bu 当文件为 NULL 时，您没有要关闭的打开文件；退出。
你应该强制至少有一个参数，否则你可能会循环一个空键，当你尝试比较它时，这将导致未定义的行为（即很可能崩溃）。
读字时，不要依赖 hard-coded 字数。您不知道文件中有多少字。检查fgets的return值；它 returns NULL 当文件有运行时。 MAX 是估计字数的好方法，但您应该将实际读取的字数保存在一个变量中。确保您访问的字数不超过您阅读的字数，并确保您写的字数不超过您分配的内存，即不读超过 MAX 个字数。
如果您没有 hard-coded 字数统计，您应该将其作为 binary_search 函数的参数。
在"not found" beanch中，你的测试是else if(binary_search == 0)。首先，else aleady 意味着二进制搜索没有 return 1（这是 else 所指的条件）并且二进制搜索可以 return 只有 0 和 1 ，所以不需要另一个条件。第二，binary_search只是函数的地址，不是结果；上面写的条件永远是真的。
二进制搜索函数中的 strcmp 调用也是如此：您进行三个比较。您检查的结果是互斥的，因此最后一个条件可以只是 else。（因为 strcmp 每次都进行 character-by-character 比较，所以可能值得每个单词调用一次 strcmp 并存储结果。）

cs50 header 中的 string 数据类型旨在提供一种无需关心内存即可读取字符串的简单方法。一旦开始创建更复杂的（又名 real-life）数据结构，最好使用 char 数组和指针。无论如何都没有办法解决这个问题，你会看到每条数据是什么。

很抱歉，我的回答看起来像是一堆错误清单。 C 的字符串处理对于初学者来说不是很容易，特别是如果您已经使用过 higher-level 语言。好处是，当你了解 C 字符串时，你已经知道很多关于 C 中一般情况下是如何完成的事情。

二进制搜索单词字典

Binary searching a dictionary of words

c

arrays

string

binary-search

cs50