Why and how does HashMap have its own internal implementation of hashCode() called hash()?
According to this blog entry, HashMap calls its own implementation of hashCode() (called hash()) on the hash code it has already retrieved:
If the key is not null, it will call the hash function on the key object; see line 4 in the above method, i.e. key.hashCode(). After key.hashCode() returns hashValue, line 4 looks like
int hash = hash(hashValue)
and now it applies the returned hashValue to its own hashing function.
We might wonder why we are calculating the hash value again using hash(hashValue). The answer is that it defends against poor-quality hash functions.
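Put concretely, here is a minimal sketch of the two steps the quote describes (the hash(int) helper, the key, and the 16-slot table are illustrative assumptions, not the JDK source):

public class RehashSketch {
    // Not the JDK source: a hash(int) helper that applies the same
    // bit-spreading as JDK 8's HashMap.hash() to a pre-computed hashCode().
    static int hash(int hashValue) {
        return hashValue ^ (hashValue >>> 16);
    }

    public static void main(String[] args) {
        Object key = "someKey";
        int hashValue = key.hashCode();        // step 1: the key's own hashCode()
        int hash = hash(hashValue);            // step 2: HashMap's supplemental hash
        int index = (16 - 1) & hash;           // step 3: bucket index in a 16-slot table
        System.out.println("hashCode=" + hashValue + ", hash=" + hash + ", index=" + index);
    }
}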
Can HashMap accurately reassign hash codes? HashMap can store objects, but it has no access to the logic that assigns each object its hashCode. For example, hash() cannot possibly incorporate the logic behind the following hashCode() implementation:
public class Employee {
    protected long employeeId;
    protected String firstName;
    protected String lastName;

    public int hashCode() {
        return (int) employeeId;
    }
}
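To see the limit of that defense with a made-up example: two employeeId values that differ only in their upper 32 bits produce the same hashCode(), and hash() cannot recover the difference, because it only ever sees the int it is handed:

public class PoorHashDemo {
    public static void main(String[] args) {
        long idA = 7L;
        long idB = 7L + (1L << 32);          // differs from idA only in the high 32 bits
        int hashA = (int) idA;               // Employee.hashCode() logic: 7
        int hashB = (int) idB;               // also 7 -- the cast discards the high bits
        // hash() only mixes the int it is given, so both keys still collide:
        System.out.println((hashA ^ (hashA >>> 16)) == (hashB ^ (hashB >>> 16))); // true
    }
}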
hash() derives an "improved" hash code from the actual hash code, so equal inputs always yield equal outputs (from jdk1.8.0_51):
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
As for why the hash code needs improving, see the method's javadoc:
Computes key.hashCode() and spreads (XORs) higher bits of hash to lower. Because the table uses power-of-two masking, sets of hashes that vary only in bits above the current mask will always collide. (Among known examples are sets of Float keys holding consecutive whole numbers in small tables.) So we apply a transform that spreads the impact of higher bits downward. There is a tradeoff between speed, utility, and quality of bit-spreading. Because many common sets of hashes are already reasonably distributed (so don't benefit from spreading), and because we use trees to handle large sets of collisions in bins, we just XOR some shifted bits in the cheapest possible way to reduce systematic lossage, as well as to incorporate impact of the highest bits that would otherwise never be used in index calculations because of table bounds.
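The Float example from the javadoc can be reproduced directly. The spread() helper below copies the XOR from hash() so its effect can be printed, and the 16-bucket table matches HashMap's default capacity; this is an illustration, not the map's internals:

public class SpreadDemo {
    // Same supplemental hash as JDK 8's HashMap.hash(), reproduced here
    // so the effect can be printed; the real method is package-private.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int mask = 16 - 1;                   // 16 buckets, i.e. power-of-two masking
        for (float f = 64.0f; f < 72.0f; f++) {
            int h = Float.hashCode(f);       // the float's bits; the low 16 bits are all zero here
            System.out.printf("%.1f  raw index=%d  spread index=%d%n",
                    f, h & mask, spread(h) & mask);
        }
        // Every raw index is 0 (all eight keys collide in one bucket),
        // while the spread indexes are 0, 2, 4, ..., 14 (all distinct).
    }
}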