Apache Commons Lang HashCodeBuilder collision

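I am getting a collision when using Apache Commons Lang HashCodeBuilder, version 3.4. I am hashing a Route object, which contains two Cell objects, start and end. An example where the collision occurs is given at the end. Both classes override the hashCode and equals methods. First the Cell class: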

import org.apache.commons.lang3.builder.EqualsBuilder;
import org.apache.commons.lang3.builder.HashCodeBuilder;

public class Cell {
    private int east;
    private int south;

    public Cell(int east, int south) {
        this.east = east;
        this.south = south;
    }

    public int getEast() {
        return east;
    }

    public void setEast(int east) {
        this.east = east;
    }

    public int getSouth() {
        return south;
    }

    public void setSouth(int south) {
        this.south = south;
    }

    /**
     * Compute hash code by using Apache Commons Lang HashCodeBuilder.
     */
    @Override
    public int hashCode() {
        return new HashCodeBuilder(17, 31)
                .append(this.south)
                .append(this.east)
                .toHashCode();
    }

    /**
     * Compute equals by using Apache Commons Lang EqualsBuilder.
     */
    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Cell))
            return false;
        if (obj == this)
            return true;

        Cell cell = (Cell) obj;
        return new EqualsBuilder()
                .append(this.south, cell.south)
                .append(this.east, cell.east)
                .isEquals();
    }
}

The Route class:

import org.apache.commons.lang3.builder.EqualsBuilder;
import org.apache.commons.lang3.builder.HashCodeBuilder;

import java.util.*;

public class Route {
    private Cell startCell;
    private Cell endCell;

    public Route(Cell startCell, Cell endCell) {
        this.startCell = startCell;
        this.endCell = endCell;
    }

    public Cell getStartCell() {
        return startCell;
    }

    public void setStartCell(Cell startCell) {
        this.startCell = startCell;
    }

    public Cell getEndCell() {
        return endCell;
    }

    public void setEndCell(Cell endCell) {
        this.endCell = endCell;
    }


    @Override
    public int hashCode() {
        return new HashCodeBuilder(43, 59)
                .append(this.startCell)
                .append(this.endCell)
                .toHashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Route))
            return false;
        if (obj == this)
            return true;

        Route route = (Route) obj;
        return new EqualsBuilder()
                .append(this.startCell, route.startCell)
                .append(this.endCell, route.endCell)
                .isEquals();
    }
}

Collision example:

public class Collision {
    public static void main(String[] args) {
        Route route1 = new Route(new Cell(154, 156), new Cell(154, 156));
        Route route2 = new Route(new Cell(153, 156), new Cell(151, 158));

        System.out.println(route1.hashCode() + " " + route2.hashCode());
    }
}

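The output is 1429303 1429303. If I change the initial odd number and the multiplier odd number so that both classes use the same pair, this example no longer collides. But the HashCodeBuilder documentation explicitly states: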

Two randomly chosen, odd numbers must be passed in. Ideally these should be different for each class, however this is not vital.

Ideally, if possible, I would like a perfect hash function (an injective function) for my example.
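The collision in this example can be reproduced by hand. The sketch below assumes that, for `int` values and non-null, non-array objects, `HashCodeBuilder.append` updates its running total as `total * multiplier + value`. It does not use Commons Lang itself, and the names `CollisionByHand`, `cellHash`, and `routeHash` are hypothetical.

```java
public class CollisionByHand {
    // Mirrors Cell.hashCode(): new HashCodeBuilder(17, 31).append(south).append(east)
    static int cellHash(int east, int south) {
        int total = 17;
        total = total * 31 + south;
        total = total * 31 + east;
        return total;
    }

    // Mirrors Route.hashCode(): new HashCodeBuilder(43, 59).append(startCell).append(endCell)
    static int routeHash(int startHash, int endHash) {
        int total = 43;
        total = total * 59 + startHash;
        total = total * 59 + endHash;
        return total;
    }

    public static void main(String[] args) {
        int a = cellHash(154, 156); // 21327
        int b = cellHash(153, 156); // 21326
        int c = cellHash(151, 158); // 21386
        System.out.println(routeHash(a, a)); // 1429303 (route1)
        System.out.println(routeHash(b, c)); // 1429303 (route2)
    }
}
```

The start hash of route2 is one less than that of route1, which lowers the total by 59, while its end hash is 59 greater, which cancels the difference exactly. Because the cell coordinates are small, the cell hashes are close together, so this kind of cancellation against the multiplier 59 is easy to hit.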

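In Java, hash codes are limited to the int (32-bit) range, so collisions are guaranteed once you have more than 2^32 distinct objects (even with an ideal distribution). In practice, collisions happen more often because hash codes are not ideally distributed.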
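For a sense of scale, the number of collisions an ideally distributed 32-bit hash would produce can be estimated. This is a rough sketch, assuming each hash is drawn uniformly and independently from m = 2^32 values, so that the expected number of distinct values among n hashes is m(1 - (1 - 1/m)^n). The names `IdealCollisions` and `expectedCollisions` are hypothetical.

```java
public class IdealCollisions {
    // Expected number of hashes that repeat an earlier value when n hashes
    // are drawn uniformly at random from m possible values.
    static double expectedCollisions(double n, double m) {
        // distinct = m * (1 - (1 - 1/m)^n), computed with log1p/expm1
        // so that precision is not lost for large m.
        double distinct = -m * Math.expm1(n * Math.log1p(-1.0 / m));
        return n - distinct;
    }

    public static void main(String[] args) {
        double m = Math.pow(2, 32); // number of possible int hash codes
        double n = 6_250_000;       // e.g. every ordered pair of 2,500 cells
        // Roughly n^2 / (2m), i.e. about 4,500 under these assumptions.
        System.out.printf("%.0f%n", expectedCollisions(n, m));
    }
}
```

Under these assumptions, even an ideal hash collides a few thousand times on a few million objects, which gives a baseline for judging a real implementation.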

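You can distribute the generated hash codes better by adding more parameters when generating the hash code (this is independent of the Apache Commons library). In this example, you could precompute one or more properties of the Route class and use them when generating the hash code. For example, compute the slope of the line between the two Cell objects: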

double slope = startCell.getEast() - endCell.getEast();
if (slope == 0) { // prevent division by 0
    slope = startCell.getSouth() - endCell.getSouth();
} else {
    slope = (startCell.getSouth() - endCell.getSouth()) / slope;
}

return new HashCodeBuilder(43, 59)
   .append(this.startCell)
   .append(this.endCell)
   .append(slope)
   .toHashCode();

With your example this produces 83091911 83088489. Alternatively (or in addition), use the distance between the two Cell objects:

double length = Math.sqrt(Math.pow(startCell.getSouth() - endCell.getSouth(), 2) + Math.pow(startCell.getEast() - endCell.getEast(), 2));
return new HashCodeBuilder(43, 59)
   .append(this.startCell)
   .append(this.endCell)
   .append(length)
   .toHashCode();

Used on its own with your example, this results in 83091911 -486891382.

And to test whether this prevents collisions:

List<Cell> cells = new ArrayList<Cell>();
for ( int i = 0; i < 50; i++ ){
    for ( int j = 0; j < 50; j++ ){
        Cell c = new Cell(i,j);
        cells.add(c);

    }
}
System.out.println(cells.size() + " cells generated");
System.out.println("Testing " + (cells.size()*cells.size()) + " number of Routes");
Set<Integer> set = new HashSet<Integer>();
int collisions = 0;
for ( int i = 0; i < cells.size(); i++ ){
    for ( int j = 0; j < cells.size(); j++ ){
        Route r = new Route(cells.get(i), cells.get(j));
        if ( set.contains(r.hashCode() ) ){
            collisions++;
        }
        set.add(r.hashCode());
    }
}
System.out.println(collisions);

Out of the 6,250,000 routes generated:

  1. Without length and slope: 6,155,919 collisions
  2. With length and slope: 873,047 collisions
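The first figure can be checked without Commons Lang. The sketch below assumes the original `hashCode` implementations of `Cell` and `Route` and the `total * multiplier + value` update rule for `append` on `int` and object values. The names `BaselineCollisions`, `cellHash`, `routeHash`, and `countCollisions` are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

public class BaselineCollisions {
    // Same arithmetic as new HashCodeBuilder(17, 31).append(south).append(east)
    static int cellHash(int east, int south) {
        return (17 * 31 + south) * 31 + east;
    }

    // Same arithmetic as new HashCodeBuilder(43, 59).append(start).append(end)
    static int routeHash(int startHash, int endHash) {
        return (43 * 59 + startHash) * 59 + endHash;
    }

    // Counts routes whose hash was already produced by an earlier route,
    // over all ordered pairs of cells on a size x size grid.
    static int countCollisions(int size) {
        int[] cellHashes = new int[size * size];
        for (int east = 0; east < size; east++) {
            for (int south = 0; south < size; south++) {
                cellHashes[east * size + south] = cellHash(east, south);
            }
        }
        Set<Integer> seen = new HashSet<>();
        int collisions = 0;
        for (int start : cellHashes) {
            for (int end : cellHashes) {
                if (!seen.add(routeHash(start, end))) {
                    collisions++;
                }
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        System.out.println(countCollisions(50)); // 6155919
    }
}
```

On a 50 x 50 grid the cell hashes cover only 1,569 consecutive values, so every route hash falls within a window of 94,081 integers. With 6,250,000 routes, at least 6,250,000 - 94,081 = 6,155,919 of them must repeat an earlier hash, which is exactly the count reported without length and slope.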