哈希函数 C

Question

我在为哈希 table 实现哈希函数时遇到问题。

我想散列我的话，使得 A = 1、B = 2、C = 3，依此类推。字母在单词中的位置无关紧要，因为我们将考虑单词的排列。而且，字母的大小写在这个问题中也是无关紧要的，所以a的值= A的值= 1.

对于字符串，abc = 1 + 2 + 3 = 6，bc = 2 + 3 = 5，等等

对于 ab = 3 和 aaa = 3 的情况，我已经有办法处理这种情况。现在我只想获取哈希值。

我现在遇到的问题是aaa给我1，ab给我2。

下面是我的代码：

int hash(char *word)
{
   int h = 1;
   int i, j;

   char *A;
   char *a;
   // an array of 26 slots for 26 uppercase letters in the alphabet
   A = (char *)malloc(26 * sizeof(char));
   // an array of 26 slots for 26 lowercase letters in the alphabet
   a = (char *)malloc(26 * sizeof(char));

   for (i = 0; i < 26; i++) {
      A[i] = (char)(i + 65); // fill the array from A to Z
      a[i] = (char)(i + 97); // fill the array from a to z
   }

   for (i = 0; i < strlen(word); i++) {
      //printf("HIT\n");
      for (j = 0; j < 26; j++) {
         // upper and lower case have the same hash value
         if (word[i] == A[j] || word[i] == a[j]) {
            h = h + j; // get the hash value of the word
            //printf("HIT 2\n");
            break;
         }
      }
   }

   printf("H: %d\n", h);

   return h;
}

Answer 1

我认为改变

int h = 1;

至

int h = 0;

和

h = h + j;

至

h = h + j + 1;

将解决问题。

另一个问题是您忘记释放 malloced 内存。还有，C里有no need to cast the result of malloc(and family)

这个

for (i = 0; i < strlen(word); i++) {

将在循环的每次迭代中调用 strlen。这会降低程序的性能。使用

int len = strlen(word);
for (i = 0; i < len; i++) {

相反，这要快得多，因为 strlen 不会在每次迭代中都被调用。最后，sizeof(char)是1，所以可以省略。

Answer 2

将h=h+j改为h=h+j+1 h=1 到 h=0.

您还应该释放分配的内存，以便在 return:

之前包含这些行

free(A);
free(a);

但是我不明白为什么要为这么简单的任务编写如此复杂的代码。

可以编写更简单的代码：

int hash(char *word)
{
   int sum=0;
   while(*word != '[=11=]')
   {

       if(*word >='A' && *word < 'A'+26)
            sum=sum+(*word -'A' + 1);
       else if(*word >='a' && *word < 'a'+26)
            sum=sum+(*word -'a' + 1);
       else
            return -1;

       word++;
   }

   return sum;
}

Answer 3

多个问题：

您仍然没有释放分配的数组
h 的初始值 1 没有意义
您将索引添加到哈希中。 'A' 和 'a' 位于索引 0，所以在这种情况下你要添加 0（所以无论你给代码多少 'a' 都会 return 1）

为什么是动态数组？你知道大小，它不会改变。你可以使用

char A[26];
char a[26]; // you can also add initialisation, e.g. = {'a', 'b', ...

为什么首先是数组？

因此，here 是快速修复，紧跟您的代码。

考虑到以上所有因素，您可以简化为：

int hash(char const * string) {
  int h = 0;
  for (; *string; ++string) {
    int index = tolower(*string) - 'a' + 1;
    if ((index > 0) && (index < 27)) {
      h += index;
    }
  }
  return h;
}

Live

当只对非特殊字符的单词进行哈希处理时，您需要以某种方式处理调用者中忽略的单词。

char hash(char const * string, int * h) {
  *h = 0;
  for (; *string; ++string) {
    int index = tolower(*string) - 'a' + 1;
    if ((index > 0) && (index < 27)) {
      *h += index;
    } else {
      return 0;
    }
  }
  return 1;
}

这样您就可以使用 return 值来测试是否应忽略该词。

哈希函数 C

Hash function C

c

ascii

hashtable