SuperFastHash returns 相同字符串的不同哈希值,但前提是由不同的函数调用确定
SuperFastHash returns different hashes for equal strings, but only if determined by different function calls
所以,我的 SFH 函数:
/*
* Hash function (found at: 'http://www.azillionmonkeys.com/qed/hash.html')
*/
int32_t SuperFastHash(const char * data, int len) {
uint32_t hash = len, tmp;
int rem;
if (len <= 0 || data == NULL) return 0;
rem = len & 3;
len >>= 2;
/* Main loop */
for (;len > 0; len--) {
hash += get16bits (data);
tmp = (get16bits (data+2) << 11) ^ hash;
hash = (hash << 16) ^ tmp;
data += 2*sizeof (uint16_t);
hash += hash >> 11;
}
/* Handle end cases */
switch (rem) {
case 3: hash += get16bits (data);
hash ^= hash << 16;
hash ^= ((signed char)data[sizeof (uint16_t)]) << 18;
hash += hash >> 11;
break;
case 2: hash += get16bits (data);
hash ^= hash << 11;
hash += hash >> 17;
break;
case 1: hash += (signed char)*data;
hash ^= hash << 10;
hash += hash >> 1;
}
/* Force "avalanching" of final 127 bits */
hash ^= hash << 3;
hash += hash >> 5;
hash ^= hash << 4;
hash += hash >> 17;
hash ^= hash << 25;
hash += hash >> 6;
// Limits hashes to be within the hash table
return hash % HT_LENGTH;
}
看起来它工作正常,(而且应该是因为除最后一行之外的所有内容我都没有动过)。
这是我将字典加载到散列中的函数 table,它似乎也能正常工作。
bool load(const char* dictionary)
{
// declares file pointer
FILE* dictptr = fopen(dictionary, "r");
// declare temp index
uint32_t index = 0;
// read words, one by one
while(true)
{
// malloc node
node* new_node = malloc(node_size);
// insert word into node, if fscanf couldn't scan word; we're done
if (fscanf(dictptr, "%s", new_node->word) != 1)
{
return true;
}
// hash word - HASH FUNCTION CALL -
index = SuperFastHash(&new_node->word[0], sizeof(new_node->word));
// check if head node has been assigned with value
if (!strcmp(hashtable[index].word,""))
{
// declare hashtable[index] to new_node
hashtable[index] = *new_node;
//increment size
hashtablesize++;
}
else
{
// if node is initialized, insert after head
new_node->next = hashtable[index].next;
hashtable[index].next = new_node;
//increment size
hashtablesize++;
}
}
}
最后,我的检查函数根据散列检查单词table。
bool check(const char* keyword)
{
// gets index from SFH
uint32_t index = SuperFastHash(keyword, sizeof(keyword));
// declares head pointer to the pointer of the index'd element of hashtable
node* head = &hashtable[index];
// if word of head is equal to keyword, return true
// else continue down chain till head is null or key is found
while (head != NULL)
{
if (!strcmp(head->word, keyword))
{
return true;
}
head = head->next;
}
return false;
}
注意:当使用不同的散列函数时,一切正常,所以我怀疑问题出在 len 参数或实际的 SFH 函数中。
我已经用 lldb 检查返回的索引,比如说,"cat" 不等于 "cat" 驻留在散列 table 中的索引。即,加载中函数调用返回的索引。
一些事情...
正如评论者所提到的,使用 sizeof()
不会为您提供正确的字符串长度。例如,更改
index = SuperFastHash(&new_node->word[0], sizeof(new_node->word));
到
index = SuperFastHash(&new_node->word[0], strlen(new_node->word));
您在阅读字典文件后未能调用 fclose()
。如果 fopen()
成功,你应该调用 fclose()
.
下面的代码看起来有点可疑:
// check if head node has been assigned with value
if (!strcmp(hashtable[index].word,""))
{
// declare hashtable[index] to new_node
hashtable[index] = *new_node;
//increment size
hashtablesize++;
}
如果hashtable一开始就完全初始化了,还需要自增hashtablesize
吗?如果散列 table 未完全初始化,则在尚未初始化的条目上调用 strcmp()
是潜在的麻烦。您没有显示声明或初始化代码,因此不能 100% 清楚这是否真的是一个问题,但可能需要仔细检查一下。
所以,我的 SFH 函数:
/*
* Hash function (found at: 'http://www.azillionmonkeys.com/qed/hash.html')
*/
int32_t SuperFastHash(const char * data, int len) {
uint32_t hash = len, tmp;
int rem;
if (len <= 0 || data == NULL) return 0;
rem = len & 3;
len >>= 2;
/* Main loop */
for (;len > 0; len--) {
hash += get16bits (data);
tmp = (get16bits (data+2) << 11) ^ hash;
hash = (hash << 16) ^ tmp;
data += 2*sizeof (uint16_t);
hash += hash >> 11;
}
/* Handle end cases */
switch (rem) {
case 3: hash += get16bits (data);
hash ^= hash << 16;
hash ^= ((signed char)data[sizeof (uint16_t)]) << 18;
hash += hash >> 11;
break;
case 2: hash += get16bits (data);
hash ^= hash << 11;
hash += hash >> 17;
break;
case 1: hash += (signed char)*data;
hash ^= hash << 10;
hash += hash >> 1;
}
/* Force "avalanching" of final 127 bits */
hash ^= hash << 3;
hash += hash >> 5;
hash ^= hash << 4;
hash += hash >> 17;
hash ^= hash << 25;
hash += hash >> 6;
// Limits hashes to be within the hash table
return hash % HT_LENGTH;
}
看起来它工作正常,(而且应该是因为除最后一行之外的所有内容我都没有动过)。
这是我将字典加载到散列中的函数 table,它似乎也能正常工作。
bool load(const char* dictionary)
{
// declares file pointer
FILE* dictptr = fopen(dictionary, "r");
// declare temp index
uint32_t index = 0;
// read words, one by one
while(true)
{
// malloc node
node* new_node = malloc(node_size);
// insert word into node, if fscanf couldn't scan word; we're done
if (fscanf(dictptr, "%s", new_node->word) != 1)
{
return true;
}
// hash word - HASH FUNCTION CALL -
index = SuperFastHash(&new_node->word[0], sizeof(new_node->word));
// check if head node has been assigned with value
if (!strcmp(hashtable[index].word,""))
{
// declare hashtable[index] to new_node
hashtable[index] = *new_node;
//increment size
hashtablesize++;
}
else
{
// if node is initialized, insert after head
new_node->next = hashtable[index].next;
hashtable[index].next = new_node;
//increment size
hashtablesize++;
}
}
}
最后,我的检查函数根据散列检查单词table。
bool check(const char* keyword)
{
// gets index from SFH
uint32_t index = SuperFastHash(keyword, sizeof(keyword));
// declares head pointer to the pointer of the index'd element of hashtable
node* head = &hashtable[index];
// if word of head is equal to keyword, return true
// else continue down chain till head is null or key is found
while (head != NULL)
{
if (!strcmp(head->word, keyword))
{
return true;
}
head = head->next;
}
return false;
}
注意:当使用不同的散列函数时,一切正常,所以我怀疑问题出在 len 参数或实际的 SFH 函数中。
我已经用 lldb 检查返回的索引,比如说,"cat" 不等于 "cat" 驻留在散列 table 中的索引。即,加载中函数调用返回的索引。
一些事情...
正如评论者所提到的,使用
sizeof()
不会为您提供正确的字符串长度。例如,更改index = SuperFastHash(&new_node->word[0], sizeof(new_node->word));
到
index = SuperFastHash(&new_node->word[0], strlen(new_node->word));
您在阅读字典文件后未能调用
fclose()
。如果fopen()
成功,你应该调用fclose()
.下面的代码看起来有点可疑:
// check if head node has been assigned with value if (!strcmp(hashtable[index].word,"")) { // declare hashtable[index] to new_node hashtable[index] = *new_node; //increment size hashtablesize++; }
如果hashtable一开始就完全初始化了,还需要自增hashtablesize
吗?如果散列 table 未完全初始化,则在尚未初始化的条目上调用 strcmp()
是潜在的麻烦。您没有显示声明或初始化代码,因此不能 100% 清楚这是否真的是一个问题,但可能需要仔细检查一下。