Potential hashing problems when using hashCode and Arrays.equals
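As the comments in my code explain, the task is to find the number of pairs of strings in a given input file that are permutations of each other. For example, "ABCD" and "BCDA" are permutations of each other, so together they count as one pair.
The main part of my program is as follows: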
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Arrays;
import java.util.HashMap;
/**
 * Finds the number of pairs of strings that are permutations of each other.
 *
 * A hash map is created whose key is the hash code of the array produced by the getFrequency
 * method, and whose value is a Pair containing that frequency array and the number of times a
 * permutation of that particular array has been found.
 *
 * If a permutation is already in the hash table, increment its counter.
 */
public static int findPairs(String fileName) {
    try {
        //Sets up the necessary file readers
        FileReader dataFile = new FileReader(fileName);
        BufferedReader bufferedDataFile = new BufferedReader(dataFile);
        String line = bufferedDataFile.readLine();
        //Finds the number of entries in the file
        int num = Integer.parseInt(line);
        int counter = 0;
        int accumulator = 0;
        HashMap<Integer, Pair> store = new HashMap<>();
        for (int i = 0; i < num; i++) {
            String current = bufferedDataFile.readLine();
            int[] currentArr = getFrequency(current);
            int currHashCode = Arrays.hashCode(currentArr);
            if (store.containsKey(currHashCode)) {
                Pair pairToCheck = store.get(currHashCode);
                int[] arrToCheck = pairToCheck.getArr();
                //Double checking, in case there is a collision and unequal arrays
                //have the same hashCode
                if (Arrays.equals(currentArr, arrToCheck)) {
                    counter = pairToCheck.getCount();
                    pairToCheck.updateCount();
                } else {
                    //if the current bucket is not empty, and not a permutation of the input string,
                    //continue to conduct a linear probe
                    while (pairToCheck != null && !Arrays.equals(currentArr, arrToCheck)) {
                        currHashCode++;
                        pairToCheck = store.get(currHashCode);
                        //store.get returns null for an empty bucket, so only read the array when a
                        //Pair is present (calling getArr() on null throws a NullPointerException)
                        arrToCheck = (pairToCheck == null) ? null : pairToCheck.getArr();
                    }
                    //if the current bucket is empty, add the new pair into the position
                    if (pairToCheck == null) {
                        counter = 0;
                    //otherwise, a permutation has been found later in the linear probe!
                    } else {
                        counter = pairToCheck.getCount();
                        pairToCheck.updateCount();
                    }
                }
            //no such permutation in the hash table yet!
            } else {
                counter = 0;
            }
            //Updates the accumulator using the counter. If there were already other strings
            //which are permutations of the current string, the current string will be able to
            //form a pair with each of these strings.
            accumulator += counter;
            //Updates the hash map only if the permutation has not been stored previously
            if (counter == 0) {
                Pair newPair = new Pair(currentArr, 1);
                store.put(currHashCode, newPair);
            }
        }
        //Close the file reader
        bufferedDataFile.close();
        return accumulator;
    } catch (Exception e) {
        System.out.println(e);
    }
    //In the event of an error, return -1
    return -1;
}
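The accumulator logic in findPairs adds, for each new string, the number of earlier strings with the same frequency array. For a group of k mutual permutations this sums to 0 + 1 + ... + (k - 1) = k(k - 1)/2, which is exactly the number of unordered pairs in the group. The small sketch below (PairCountCheck, incremental and closedForm are hypothetical names, for illustration only) checks that identity and shows that the total can outgrow an int:

```java
// Hypothetical helpers for illustration only; not part of the program above.
class PairCountCheck {
    // Incremental rule used by findPairs: the n-th member of a group (0-based)
    // pairs with the n members seen before it.
    static long incremental(int groupSize) {
        long total = 0;
        for (int seen = 0; seen < groupSize; seen++) {
            total += seen;
        }
        return total;
    }

    // Closed form: number of unordered pairs among k items, k choose 2.
    static long closedForm(int k) {
        return (long) k * (k - 1) / 2;
    }

    public static void main(String[] args) {
        System.out.println(incremental(4) + " " + closedForm(4)); // 6 6
        // 100,000 mutual permutations already exceed Integer.MAX_VALUE pairs
        System.out.println(closedForm(100_000) > Integer.MAX_VALUE); // true
    }
}
```

Whether any of the private tests is large enough to overflow an int accumulator cannot be told from the question; the sketch uses long so the count stays exact either way.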
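What potential problems could this way of using Java's hashCode and the Arrays implementation cause? I ask because I was given some private test cases to pass, and although I pass some of them, there is one that I keep failing. I suspect it has to do with the way I handle collisions... but even after checking many times, I'm still not sure where the error might be. Any help is much appreciated!
Edit: as requested, here is my getFrequency method: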
public static int[] getFrequency(String s) {
    //ASCII defines 128 characters (codes 0-127); any character outside that range
    //would cause an ArrayIndexOutOfBoundsException below
    int[] charArr = new int[128];
    //Iterate through the given string, and increment the count for a character using its
    //ASCII value to locate its position in the array
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        int ascii = (int) c;
        charArr[ascii] += 1;
    }
    return charArr;
}
Edit 2: and the Pair class:
public class Pair {
    private int[] m_arr;
    private int m_count;
    public Pair(int[] arr, int count) {
        this.m_arr = arr;
        this.m_count = count;
    }
    public int[] getArr() {
        return this.m_arr;
    }
    public int getCount() {
        return this.m_count;
    }
    public void updateCount() {
        this.m_count++;
    }
}
Finding anagrams is a well-known problem. The usual solution is to sort each string and compare the sorted strings: when sorted, "ABCD" and "BCDA" both become "ABCD".
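A minimal sketch of the sorting step, assuming strings are compared as sequences of Java chars; SortedKey and sortedKey are hypothetical names:

```java
import java.util.Arrays;

class SortedKey {
    // Canonical form of a string: its chars in ascending order.
    // Two strings are permutations of each other (as char sequences)
    // exactly when their sorted forms are equal.
    static String sortedKey(String s) {
        char[] chars = s.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(sortedKey("BCDA"));                           // ABCD
        System.out.println(sortedKey("ABCD").equals(sortedKey("BCDA"))); // true
        System.out.println(sortedKey("BB").equals(sortedKey("AC")));     // false
    }
}
```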
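Storing the sorted strings in a collection makes it easy to find matches. Make a class that holds each string in both its sorted and unsorted form, so that the unsorted version is easy to retrieve.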
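One possible shape for that class and collection is sketched below, assuming Java 9+ (for List.of); Word and AnagramGroups are hypothetical names. Strings with the same sorted form end up in the same list, and the original spellings are kept:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class: keeps a string together with its sorted form.
final class Word {
    final String original;
    final String sorted;

    Word(String original) {
        this.original = original;
        char[] chars = original.toCharArray();
        Arrays.sort(chars);
        this.sorted = new String(chars);
    }
}

class AnagramGroups {
    // Groups the input by sorted form; each list holds mutual permutations
    // in their original (unsorted) spelling, in input order.
    static Map<String, List<String>> group(List<String> input) {
        Map<String, List<String>> groups = new HashMap<>();
        for (String s : input) {
            Word w = new Word(s);
            groups.computeIfAbsent(w.sorted, k -> new ArrayList<>()).add(w.original);
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = group(List.of("ABCD", "BCDA", "BB", "AC"));
        System.out.println(g.get("ABCD")); // [ABCD, BCDA]
    }
}
```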
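Your hash function is weak because different inputs can share a hash code; for example, a hash that simply sums character codes gives "BB" the same value as "AC" (66 + 66 = 65 + 67). Use a better hash function on the sorted version of the string.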
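A minimal sketch of counting pairs with the sorted string as the map key, assuming the lines have already been read into a List (file handling omitted); AnagramPairCounter, sortedKey and countPairs are hypothetical names. Because the key is the sorted String itself rather than an Integer hash code, HashMap resolves any hash collisions internally via String.equals, so no manual probing is needed:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AnagramPairCounter {
    static String sortedKey(String s) {
        char[] chars = s.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // Counts unordered pairs of strings that are permutations of each other.
    // Returns long because k mutual permutations form k*(k-1)/2 pairs,
    // which can exceed the int range.
    static long countPairs(List<String> lines) {
        Map<String, Integer> seen = new HashMap<>();
        long pairs = 0;
        for (String s : lines) {
            String key = sortedKey(s);
            int earlier = seen.getOrDefault(key, 0);
            pairs += earlier;            // pairs with every earlier permutation
            seen.put(key, earlier + 1);
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(countPairs(List.of("ABCD", "BCDA", "DCBA", "BB", "AC"))); // 3
    }
}
```

Sorting costs O(L log L) per string of length L; the asker's frequency array could serve as the key instead, but only if wrapped in a type whose equals and hashCode compare contents, since a raw int[] uses identity equality.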