SuperFastHash returns 相同字符串的不同哈希值，但前提是由不同的函数调用确定

Question

所以，我的 SFH 函数：

/*  
 * Hash function (found at: 'http://www.azillionmonkeys.com/qed/hash.html')  
 */ 
int32_t SuperFastHash(const char * data, int len)  {
    uint32_t hash = len, tmp;
    int rem;

    if (len <= 0 || data == NULL) return 0;

    rem = len & 3;
    len >>= 2;

    /* Main loop */
    for (;len > 0; len--) {
        hash  += get16bits (data);
        tmp    = (get16bits (data+2) << 11) ^ hash;
        hash   = (hash << 16) ^ tmp;
        data  += 2*sizeof (uint16_t);
        hash  += hash >> 11;
    }

    /* Handle end cases */
    switch (rem) {
        case 3: hash += get16bits (data);
                hash ^= hash << 16;
                hash ^= ((signed char)data[sizeof (uint16_t)]) << 18;
                hash += hash >> 11;
                break;
        case 2: hash += get16bits (data);
                hash ^= hash << 11;
                hash += hash >> 17;
                break;
        case 1: hash += (signed char)*data;
                hash ^= hash << 10;
                hash += hash >> 1;
    }

    /* Force "avalanching" of final 127 bits */
    hash ^= hash << 3;
    hash += hash >> 5;
    hash ^= hash << 4;
    hash += hash >> 17;
    hash ^= hash << 25;
    hash += hash >> 6;

    // Limits hashes to be within the hash table    
    return hash % HT_LENGTH; 
}

看起来它工作正常，（而且应该是因为除最后一行之外的所有内容我都没有动过）。

这是我将字典加载到散列中的函数 table，它似乎也能正常工作。

bool load(const char* dictionary)
{
    // declares file pointer
    FILE* dictptr = fopen(dictionary, "r");

    // declare temp index
    uint32_t index = 0;

    // read words, one by one
    while(true)
    {

        // malloc node
        node* new_node = malloc(node_size);

        // insert word into node, if fscanf couldn't scan word; we're done
        if (fscanf(dictptr, "%s", new_node->word) != 1)
        {
            return true;
        }

        // hash word - HASH FUNCTION CALL -
        index = SuperFastHash(&new_node->word[0], sizeof(new_node->word));

        // check if head node has been assigned with value
        if (!strcmp(hashtable[index].word,""))
        {
            // declare hashtable[index] to new_node
            hashtable[index] = *new_node;

            //increment size
            hashtablesize++;
        }

        else
        {
            // if node is initialized, insert after head 
            new_node->next = hashtable[index].next;
            hashtable[index].next = new_node;

            //increment size
            hashtablesize++;
        }
    } 
}

最后，我的检查函数根据散列检查单词table。

bool check(const char* keyword)
{

    // gets index from SFH
    uint32_t index = SuperFastHash(keyword, sizeof(keyword));

    // declares head pointer to the pointer of the index'd element of hashtable
    node* head = &hashtable[index];

    // if word of head is equal to keyword, return true 
    // else continue down chain till head is null or key is found
    while (head != NULL)
    {
        if (!strcmp(head->word, keyword))
        {
            return true;
        }
        head = head->next;
    }
    return false;
}

注意：当使用不同的散列函数时，一切正常，所以我怀疑问题出在 len 参数或实际的 SFH 函数中。

我已经用 lldb 检查返回的索引，比如说，"cat" 不等于 "cat" 驻留在散列 table 中的索引。即，加载中函数调用返回的索引。

Answer 1

一些事情...

正如评论者所提到的，使用 sizeof() 不会为您提供正确的字符串长度。例如，更改

index = SuperFastHash(&new_node->word[0], sizeof(new_node->word));

到

index = SuperFastHash(&new_node->word[0], strlen(new_node->word));

您在阅读字典文件后未能调用 fclose()。如果 fopen() 成功，你应该调用 fclose().

下面的代码看起来有点可疑：

// check if head node has been assigned with value
if (!strcmp(hashtable[index].word,""))
{
    // declare hashtable[index] to new_node
    hashtable[index] = *new_node;

    //increment size
    hashtablesize++;
}

如果hashtable一开始就完全初始化了，还需要自增hashtablesize吗？如果散列 table 未完全初始化，则在尚未初始化的条目上调用 strcmp() 是潜在的麻烦。您没有显示声明或初始化代码，因此不能 100% 清楚这是否真的是一个问题，但可能需要仔细检查一下。

SuperFastHash returns 相同字符串的不同哈希值，但前提是由不同的函数调用确定

SuperFastHash returns different hashes for equal strings, but only if determined by different function calls

c

hash-function

hashtable