SuperFastHash returns 相同字符串的不同哈希值,但前提是由不同的函数调用确定

SuperFastHash returns different hashes for equal strings, but only if determined by different function calls

所以,我的 SFH 函数:

/*  
 * Hash function (found at: 'http://www.azillionmonkeys.com/qed/hash.html')  
 */ 
int32_t SuperFastHash(const char * data, int len)  {
    uint32_t hash = len, tmp;
    int rem;

    if (len <= 0 || data == NULL) return 0;

    rem = len & 3;
    len >>= 2;

    /* Main loop */
    for (;len > 0; len--) {
        hash  += get16bits (data);
        tmp    = (get16bits (data+2) << 11) ^ hash;
        hash   = (hash << 16) ^ tmp;
        data  += 2*sizeof (uint16_t);
        hash  += hash >> 11;
    }

    /* Handle end cases */
    switch (rem) {
        case 3: hash += get16bits (data);
                hash ^= hash << 16;
                hash ^= ((signed char)data[sizeof (uint16_t)]) << 18;
                hash += hash >> 11;
                break;
        case 2: hash += get16bits (data);
                hash ^= hash << 11;
                hash += hash >> 17;
                break;
        case 1: hash += (signed char)*data;
                hash ^= hash << 10;
                hash += hash >> 1;
    }

    /* Force "avalanching" of final 127 bits */
    hash ^= hash << 3;
    hash += hash >> 5;
    hash ^= hash << 4;
    hash += hash >> 17;
    hash ^= hash << 25;
    hash += hash >> 6;

    // Limits hashes to be within the hash table    
    return hash % HT_LENGTH; 
}

看起来它工作正常,(而且应该是因为除最后一行之外的所有内容我都没有动过)。

这是我将字典加载到散列中的函数 table,它似乎也能正常工作。

bool load(const char* dictionary)
{
    // declares file pointer
    FILE* dictptr = fopen(dictionary, "r");

    // declare temp index
    uint32_t index = 0;

    // read words, one by one
    while(true)
    {

        // malloc node
        node* new_node = malloc(node_size);

        // insert word into node, if fscanf couldn't scan word; we're done
        if (fscanf(dictptr, "%s", new_node->word) != 1)
        {
            return true;
        }

        // hash word - HASH FUNCTION CALL -
        index = SuperFastHash(&new_node->word[0], sizeof(new_node->word));

        // check if head node has been assigned with value
        if (!strcmp(hashtable[index].word,""))
        {
            // declare hashtable[index] to new_node
            hashtable[index] = *new_node;

            //increment size
            hashtablesize++;
        }

        else
        {
            // if node is initialized, insert after head 
            new_node->next = hashtable[index].next;
            hashtable[index].next = new_node;

            //increment size
            hashtablesize++;
        }
    } 
}

最后,我的检查函数根据散列检查单词table。

bool check(const char* keyword)
{

    // gets index from SFH
    uint32_t index = SuperFastHash(keyword, sizeof(keyword));

    // declares head pointer to the pointer of the index'd element of hashtable
    node* head = &hashtable[index];

    // if word of head is equal to keyword, return true 
    // else continue down chain till head is null or key is found
    while (head != NULL)
    {
        if (!strcmp(head->word, keyword))
        {
            return true;
        }
        head = head->next;
    }
    return false;
}

注意:当使用不同的散列函数时,一切正常,所以我怀疑问题出在 len 参数或实际的 SFH 函数中。

我已经用 lldb 检查返回的索引,比如说,"cat" 不等于 "cat" 驻留在散列 table 中的索引。即,加载中函数调用返回的索引。

一些事情...

  1. 正如评论者所提到的,使用 sizeof() 不会为您提供正确的字符串长度。例如,更改

    index = SuperFastHash(&new_node->word[0], sizeof(new_node->word));
    

    index = SuperFastHash(&new_node->word[0], strlen(new_node->word));
    
  2. 您在阅读字典文件后未能调用 fclose()。如果 fopen() 成功,你应该调用 fclose().

  3. 下面的代码看起来有点可疑:

    // check if head node has been assigned with value
    if (!strcmp(hashtable[index].word,""))
    {
        // declare hashtable[index] to new_node
        hashtable[index] = *new_node;
    
        //increment size
        hashtablesize++;
    }
    

如果hashtable一开始就完全初始化了,还需要自增hashtablesize吗?如果散列 table 未完全初始化,则在尚未初始化的条目上调用 strcmp() 是潜在的麻烦。您没有显示声明或初始化代码,因此不能 100% 清楚这是否真的是一个问题,但可能需要仔细检查一下。