Java: Is it possible to optimize double comparisons/multiplications?

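My physics engine has had one very worrying problem for a long time: I cannot compute thousands of collisions at once. I have already optimized the method responsible for telling me about collisions. It creates 0 objects and at this point is nothing but multiplication/comparison, but it still isn't fast enough!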

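*Note: please don't lecture me on physics engine structure. My project currently adds physics to Minecraft, a game made of millions of cubes. As you can imagine, that brings some unique challenges for a simulation like this -.-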

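For context, a Polygon is an array of 8 vectors, and a Vector is just a vector... Also, the dot product of 2 vectors is (v1.x*v2.x + v1.y*v2.y + v1.z*v2.z). Anyway, here is the code that currently uses 10% of all processing time!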

public class ReusableCollisionObject{

public boolean seperated;
public double movMaxFixMin,movMinFixMax;
private static double maxPlayer,minPlayer,maxBlock,minBlock,dot;

public void generateCollision(Polygon movable_,Polygon stationary,Vector axes){
    maxPlayer = minPlayer = axes.dot(movable_.vertices[0]);
    dot = axes.dot(movable_.vertices[1]);
    if(dot>maxPlayer){
        maxPlayer = dot;
    }
    if(dot<minPlayer){
        minPlayer = dot;
    }
    dot = axes.dot(movable_.vertices[2]);
    if(dot>maxPlayer){
        maxPlayer = dot;
    }
    if(dot<minPlayer){
        minPlayer = dot;
    }
    dot = axes.dot(movable_.vertices[3]);
    if(dot>maxPlayer){
        maxPlayer = dot;
    }
    if(dot<minPlayer){
        minPlayer = dot;
    }
    dot = axes.dot(movable_.vertices[4]);
    if(dot>maxPlayer){
        maxPlayer = dot;
    }
    if(dot<minPlayer){
        minPlayer = dot;
    }
    dot = axes.dot(movable_.vertices[5]);
    if(dot>maxPlayer){
        maxPlayer = dot;
    }
    if(dot<minPlayer){
        minPlayer = dot;
    }
    dot = axes.dot(movable_.vertices[6]);
    if(dot>maxPlayer){
        maxPlayer = dot;
    }
    if(dot<minPlayer){
        minPlayer = dot;
    }
    dot = axes.dot(movable_.vertices[7]);
    if(dot>maxPlayer){
        maxPlayer = dot;
    }
    if(dot<minPlayer){
        minPlayer = dot;
    }

    maxBlock = minBlock = axes.dot(stationary.vertices[0]);
    dot = axes.dot(stationary.vertices[1]);
    if(dot>maxBlock){
        maxBlock = dot;
    }
    if(dot<minBlock){
        minBlock = dot;
    }
    dot = axes.dot(stationary.vertices[2]);
    if(dot>maxBlock){
        maxBlock = dot;
    }
    if(dot<minBlock){
        minBlock = dot;
    }
    dot = axes.dot(stationary.vertices[3]);
    if(dot>maxBlock){
        maxBlock = dot;
    }
    if(dot<minBlock){
        minBlock = dot;
    }
    dot = axes.dot(stationary.vertices[4]);
    if(dot>maxBlock){
        maxBlock = dot;
    }
    if(dot<minBlock){
        minBlock = dot;
    }
    dot = axes.dot(stationary.vertices[5]);
    if(dot>maxBlock){
        maxBlock = dot;
    }
    if(dot<minBlock){
        minBlock = dot;
    }
    dot = axes.dot(stationary.vertices[6]);
    if(dot>maxBlock){
        maxBlock = dot;
    }
    if(dot<minBlock){
        minBlock = dot;
    }
    dot = axes.dot(stationary.vertices[7]);
    if(dot>maxBlock){
        maxBlock = dot;
    }
    if(dot<minBlock){
        minBlock = dot;
    }
    seperated = minPlayer>maxBlock||maxPlayer<minBlock;
}

}

Is it possible to make raw math calculations like these run any faster?

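Edit: Thanks to the answers I got, I reorganized the operations to improve performance and converted all the doubles to floats. Here is the new, optimized class.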

public class ReusableCollisionObject{

public boolean seperated;
public double movMaxFixMin,movMinFixMax;
private static double maxPlayer,minPlayer,maxBlock,minBlock;
private static final float[] cachemovable_ = new float[16];

public void generateCollision(Polygon movable_,Polygon stationary,Vector axes){
    cachemovable_[0] = axes.X*movable_.vertices[0].X+axes.Y*movable_.vertices[0].Y+axes.Z*movable_.vertices[0].Z;
    cachemovable_[1] = axes.X*movable_.vertices[1].X+axes.Y*movable_.vertices[1].Y+axes.Z*movable_.vertices[1].Z;
    cachemovable_[2] = axes.X*movable_.vertices[2].X+axes.Y*movable_.vertices[2].Y+axes.Z*movable_.vertices[2].Z;
    cachemovable_[3] = axes.X*movable_.vertices[3].X+axes.Y*movable_.vertices[3].Y+axes.Z*movable_.vertices[3].Z;
    cachemovable_[4] = axes.X*movable_.vertices[4].X+axes.Y*movable_.vertices[4].Y+axes.Z*movable_.vertices[4].Z;
    cachemovable_[5] = axes.X*movable_.vertices[5].X+axes.Y*movable_.vertices[5].Y+axes.Z*movable_.vertices[5].Z;
    cachemovable_[6] = axes.X*movable_.vertices[6].X+axes.Y*movable_.vertices[6].Y+axes.Z*movable_.vertices[6].Z;
    cachemovable_[7] = axes.X*movable_.vertices[7].X+axes.Y*movable_.vertices[7].Y+axes.Z*movable_.vertices[7].Z;
    cachemovable_[8] = axes.X*stationary.vertices[0].X+axes.Y*stationary.vertices[0].Y+axes.Z*stationary.vertices[0].Z;
    cachemovable_[9] = axes.X*stationary.vertices[1].X+axes.Y*stationary.vertices[1].Y+axes.Z*stationary.vertices[1].Z;
    cachemovable_[10] = axes.X*stationary.vertices[2].X+axes.Y*stationary.vertices[2].Y+axes.Z*stationary.vertices[2].Z;
    cachemovable_[11] = axes.X*stationary.vertices[3].X+axes.Y*stationary.vertices[3].Y+axes.Z*stationary.vertices[3].Z;
    cachemovable_[12] = axes.X*stationary.vertices[4].X+axes.Y*stationary.vertices[4].Y+axes.Z*stationary.vertices[4].Z;
    cachemovable_[13] = axes.X*stationary.vertices[5].X+axes.Y*stationary.vertices[5].Y+axes.Z*stationary.vertices[5].Z;
    cachemovable_[14] = axes.X*stationary.vertices[6].X+axes.Y*stationary.vertices[6].Y+axes.Z*stationary.vertices[6].Z;
    cachemovable_[15] = axes.X*stationary.vertices[7].X+axes.Y*stationary.vertices[7].Y+axes.Z*stationary.vertices[7].Z;

    maxPlayer = minPlayer = cachemovable_[0];
    maxBlock = minBlock = cachemovable_[8];

    if(cachemovable_[1]>maxPlayer){
        maxPlayer = cachemovable_[1];
    }
    if(cachemovable_[1]<minPlayer){
        minPlayer = cachemovable_[1];
    }
    if(cachemovable_[2]>maxPlayer){
        maxPlayer = cachemovable_[2];
    }
    if(cachemovable_[2]<minPlayer){
        minPlayer = cachemovable_[2];
    }
    if(cachemovable_[3]>maxPlayer){
        maxPlayer = cachemovable_[3];
    }
    if(cachemovable_[3]<minPlayer){
        minPlayer = cachemovable_[3];
    }
    if(cachemovable_[4]>maxPlayer){
        maxPlayer = cachemovable_[4];
    }
    if(cachemovable_[4]<minPlayer){
        minPlayer = cachemovable_[4];
    }
    if(cachemovable_[5]>maxPlayer){
        maxPlayer = cachemovable_[5];
    }
    if(cachemovable_[5]<minPlayer){
        minPlayer = cachemovable_[5];
    }
    if(cachemovable_[6]>maxPlayer){
        maxPlayer = cachemovable_[6];
    }
    if(cachemovable_[6]<minPlayer){
        minPlayer = cachemovable_[6];
    }
    if(cachemovable_[7]>maxPlayer){
        maxPlayer = cachemovable_[7];
    }
    if(cachemovable_[7]<minPlayer){
        minPlayer = cachemovable_[7];
    }
    if(cachemovable_[9]>maxBlock){
        maxBlock = cachemovable_[9];
    }
    if(cachemovable_[9]<minBlock){
        minBlock = cachemovable_[9];
    }
    if(cachemovable_[10]>maxBlock){
        maxBlock = cachemovable_[10];
    }
    if(cachemovable_[10]<minBlock){
        minBlock = cachemovable_[10];
    }
    if(cachemovable_[11]>maxBlock){
        maxBlock = cachemovable_[11];
    }
    if(cachemovable_[11]<minBlock){
        minBlock = cachemovable_[11];
    }
    if(cachemovable_[12]>maxBlock){
        maxBlock = cachemovable_[12];
    }
    if(cachemovable_[12]<minBlock){
        minBlock = cachemovable_[12];
    }
    if(cachemovable_[13]>maxBlock){
        maxBlock = cachemovable_[13];
    }
    if(cachemovable_[13]<minBlock){
        minBlock = cachemovable_[13];
    }
    if(cachemovable_[14]>maxBlock){
        maxBlock =  cachemovable_[14];
    }
    if(cachemovable_[14]<minBlock){
        minBlock = cachemovable_[14];
    }
    if(cachemovable_[15]>maxBlock){
        maxBlock = cachemovable_[15];
    }
    if(cachemovable_[15]<minBlock){
        minBlock = cachemovable_[15];
    }
    seperated = minPlayer>maxBlock||maxPlayer<minBlock;
}

}

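Call that method less often. This will give you a far better big-O scalability gain than just making the method a bit faster. When a profiler says a method is slow, there are two ways to fix it: make it faster, or call it less.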

How?

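Suppose you check 1000 objects for collisions. I assume your current code checks every combination for a collision, so about 500000 combinations (A-B, A-C, A-D, ..., B-C, B-D, ..., C-D, ...), which means a lot of calls to that method.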
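To make the "about 500000" figure concrete, here is a minimal sketch that counts how many pair checks a brute-force loop performs. The class and method names are hypothetical and not part of the question's code.

```java
final class PairCount {
    // Counts the unordered pairs a brute-force narrow phase would test.
    // For n objects this is n * (n - 1) / 2.
    static long bruteForcePairs(int n) {
        long pairs = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                pairs++; // one collision-method call per pair
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(bruteForcePairs(1000)); // prints 499500
    }
}
```

The count grows quadratically, so doubling the number of objects roughly quadruples the number of calls.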

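What if you could know in advance which combinations can never collide? In 1-dimensional space, a NavigableMap (which is plain Java) can help you. In multi-dimensional space, you need something like a kd-map (or just apply the idea to 1 dimension, which is already a good gain).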

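For example, if we look at just 1 dimension: given an object A at position 137.4 (in that dimension) with a speed of 20.3, it can end up anywhere between 117.1 and 157.7. So let's put the smallest number in the map: navigableMap.put(117.1, A). Now, given a B that can end up anywhere between 50.4 and 70.4, we can ask for navigableMap.headMap(70.4, true), which contains neither A (its key is 117.1) nor any other element whose smallest number is higher than 70.4.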
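The one-dimensional filter can be sketched with `TreeMap`, the standard `NavigableMap` implementation, and `headMap(toKey, inclusive)`, which returns the entries with keys up to a bound. The class name, the `String` payload, and the extra object C are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

final class OneAxisBroadPhase {
    // Returns every object whose lowest reachable position is <= queryMax.
    // Objects keyed above queryMax cannot overlap the query range on this axis.
    static List<String> candidates(NavigableMap<Double, String> byMin, double queryMax) {
        return new ArrayList<>(byMin.headMap(queryMax, true).values());
    }

    public static void main(String[] args) {
        NavigableMap<Double, String> byMin = new TreeMap<>();
        byMin.put(117.1, "A"); // A can end up anywhere in [117.1, 157.7]
        byMin.put(60.0, "C");  // hypothetical third object starting at 60.0

        // B can end up anywhere in [50.4, 70.4], so only keys <= 70.4 matter.
        System.out.println(candidates(byMin, 70.4)); // prints [C]
    }
}
```

This only prunes objects that start above B's range. Each remaining candidate still needs its upper bound compared against 50.4, and a real map would need to handle several objects sharing the same key.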

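I'm assuming you call the method as rarely as possible. If that isn't the case, look at Geoffrey De Smet's answer and call it less often! Even if you think you already call it as rarely as possible, check his answer first, just in case!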

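Also, I'm assuming you are not using strictfp; if you are, don't. You don't need strictly IEEE 754 compliant floating point for this. It usually isn't the default, but I wanted to mention it just in case. (On Java 17 and later this no longer matters: all floating point arithmetic is strict and the modifier has no effect.)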

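Here are some ways to reduce the cost of floating point code that don't involve redesigning the algorithm (though if a redesign is possible, these contribute far less than it would):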

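First, precompute and store as much of the calculation as you can, since memory is usually more readily available than CPU time. In many cases you can save at least part of a calculation.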
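As one illustration of saving part of the calculation: if the stationary block is an axis-aligned box (an assumption, not something the question states), its projection onto any axis follows from a stored center and half-extents, with no per-vertex dot products. The names below are hypothetical.

```java
final class BoxProjection {
    // Writes the projection interval of an axis-aligned box onto the axis
    // (ax, ay, az) into out[0] (min) and out[1] (max).
    // (cx, cy, cz) is the box center, (hx, hy, hz) are its half-extents.
    static void project(float cx, float cy, float cz,
                        float hx, float hy, float hz,
                        float ax, float ay, float az,
                        float[] out) {
        float center = ax * cx + ay * cy + az * cz;
        float radius = Math.abs(ax) * hx + Math.abs(ay) * hy + Math.abs(az) * hz;
        out[0] = center - radius;
        out[1] = center + radius;
    }

    public static void main(String[] args) {
        float[] interval = new float[2]; // reused, so no per-call allocation
        // Unit cube spanning (0,0,0) to (1,1,1), projected onto axis (1, 1, 0).
        project(0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 1f, 1f, 0f, interval);
        System.out.println(interval[0] + " " + interval[1]); // prints 0.0 2.0
    }
}
```

This replaces eight dot products and sixteen comparisons with two short sums. It gives the same interval as projecting all eight corners, but only while the box stays axis-aligned.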

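Also, profile with floats instead of doubles. Floats mean less memory to move around. Java tends to build tree-like object graphs that are scattered across memory, rather than flat objects that pack everything into one small, CPU-friendly cache line, so shrinking the data types increases the amount of data that fits in the CPU cache.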
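A minimal sketch of a flatter layout, assuming the eight vertices can be stored as packed `float` triples instead of eight separate `Vector` objects. `PackedPolygon` and its methods are hypothetical names and not the question's `Polygon` class.

```java
final class PackedPolygon {
    // 8 vertices stored contiguously as x0,y0,z0,x1,y1,z1,... (24 floats, 96 bytes).
    private final float[] xyz = new float[24];

    void setVertex(int i, float x, float y, float z) {
        xyz[3 * i] = x;
        xyz[3 * i + 1] = y;
        xyz[3 * i + 2] = z;
    }

    // Writes the min and max projection onto (ax, ay, az) into out[0] and out[1].
    void project(float ax, float ay, float az, float[] out) {
        float min = Float.POSITIVE_INFINITY;
        float max = Float.NEGATIVE_INFINITY;
        for (int i = 0; i < xyz.length; i += 3) {
            float d = ax * xyz[i] + ay * xyz[i + 1] + az * xyz[i + 2];
            if (d < min) min = d;
            if (d > max) max = d;
        }
        out[0] = min;
        out[1] = max;
    }

    // Builds a unit cube whose lowest corner is at (ox, oy, oz).
    static PackedPolygon unitCubeAt(float ox, float oy, float oz) {
        PackedPolygon p = new PackedPolygon();
        for (int i = 0; i < 8; i++) {
            p.setVertex(i, ox + (i & 1), oy + ((i >> 1) & 1), oz + ((i >> 2) & 1));
        }
        return p;
    }

    public static void main(String[] args) {
        float[] out = new float[2];
        unitCubeAt(2f, 0f, 0f).project(1f, 0f, 0f, out);
        System.out.println(out[0] + " " + out[1]); // prints 2.0 3.0
    }
}
```

Whether this beats the object-per-vertex layout depends on the JVM and the access pattern, so it is worth measuring rather than assuming.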

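Try to get the code to auto-vectorize across your eight calculations. Modern CPUs can use the SSE/SSE2/SSE3/AVX extensions to perform several floating point calculations per instruction (4 single-precision values per 128-bit SSE register, 8 per 256-bit AVX register), alongside ordinary floating point work on the older 80x87-compatible floating point architecture. If your Java JIT/JVM supports auto-vectorization, you may need to rewrite the code so that the JIT/JVM recognizes it can be vectorized. If your JIT/JVM does not support auto-vectorization, consider using a third-party library that exposes native block vector operations: compute all the dot products in one operation, load them into a list, then use min/max to find the limits.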
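One shape that tends to be friendlier to a vectorizing JIT is a simple counted loop over separate coordinate arrays, followed by min/max reductions. The sketch below assumes such a structure-of-arrays layout, uses hypothetical names, and makes no promise that a particular JVM will actually vectorize it.

```java
final class SoaProjection {
    // out[i] = axis . vertex[i], with coordinates held in separate x/y/z arrays.
    static void dots(float[] xs, float[] ys, float[] zs,
                     float ax, float ay, float az, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = ax * xs[i] + ay * ys[i] + az * zs[i];
        }
    }

    static float min(float[] v) {
        float m = v[0];
        for (int i = 1; i < v.length; i++) {
            m = Math.min(m, v[i]);
        }
        return m;
    }

    static float max(float[] v) {
        float m = v[0];
        for (int i = 1; i < v.length; i++) {
            m = Math.max(m, v[i]);
        }
        return m;
    }

    // Same test as the question's last line: the intervals do not overlap.
    static boolean separated(float[] a, float[] b) {
        return min(a) > max(b) || max(a) < min(b);
    }

    public static void main(String[] args) {
        float[] zeros = new float[8];
        float[] playerXs = {0, 1, 0, 1, 0, 1, 0, 1};
        float[] blockXs = {3, 4, 3, 4, 3, 4, 3, 4};
        float[] player = new float[8];
        float[] block = new float[8];
        dots(playerXs, zeros, zeros, 1f, 0f, 0f, player);
        dots(blockXs, zeros, zeros, 1f, 0f, 0f, block);
        System.out.println(separated(player, block)); // prints true
    }
}
```

With only eight elements per polygon the gain may be small. Batching the vertices of many polygons into longer arrays gives such loops more work per call, and profiling is the only way to confirm any benefit.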