Does leaving redundant variables etc. in Java code for readability impact performance?

I am developing an optimized Java library, and I am wondering whether having things like
    int rX = rhsOffset;
    int rY = rhsOffset + 1;
    int rZ = rhsOffset + 2;
    int rW = rhsOffset + 3;

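where the local variable rX is redundant but makes the code more readable, costs anything. Does rX in this case just get compiled out, either in the Java bytecode or at JIT execution time?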

I have also seen libraries do this:

 m[offset + 0] = f / aspect;
 m[offset + 1] = 0.0f;
 m[offset + 2] = 0.0f;
 m[offset + 3] = 0.0f;

where the "+ 0" is there only to improve the look of the code.

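I would like to do these things too, but I want to be sure I am not hurting performance. I do not know of a good way to determine whether memory is allocated or arithmetic is actually performed in these cases. In Android Studio you can use the memory profiler, which lets you capture all allocations and inspect them, but IntelliJ does not seem to offer that feature, and I assume I cannot rely on the optimizations done by Android's build system also being done for a normal (non-Android) Java project.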

I wrote some code to investigate this experimentally; see my repository on GitHub.

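Summary: I ran some experiments on my 64-bit Ubuntu computer using Oracle JDK 9. As far as I can tell from these particular experiments, (i) redundant variables do not seem to affect the running time in practice and (ii) whether or not you add a redundant 0 does not seem to matter. My advice is not to worry about the kind of performance issues you mention: the just-in-time compiler is probably smart enough to handle these things, and bad performance may never become a problem.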

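For the first question, I ran my experiment with both the Oracle JDK 9 javac compiler and the embeddable Janino compiler. I got similar results with both, which suggests that most optimizations are probably performed by the JIT.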

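I would advise you to run your own experiments on your JVM with toy examples that you believe are representative of what you are doing. Or measure directly in your actual code, in case bad performance does become a problem.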

Details about my experiments follow below.

Question 1: Does introducing redundant variables affect the execution time?

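I introduced a parameter, let's call it n, that controls the degree of redundant assignment, and wrote a code generator that generates code for a nonsense computation and introduces redundant assignments based on the value of n. For example, for n=0 it produces this code: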

public static double eval0(double[] X, double[] Y) {
  double sum = 0.0;
  assert(X.length == Y.length);
  int iters = X.length/3;
  for (int i = 0; i < iters; i++) {
    int at = 3*i;
    double x0 = X[at + 0];
    double x1 = X[at + 1];
    double x2 = X[at + 2];
    double y0 = Y[at + 0];
    double y1 = Y[at + 1];
    double y2 = Y[at + 2];
    double x1y2 = x1*y2;
    double x2y1 = x2*y1;
    double a = x1y2-x2y1;
    double x2y0 = x2*y0;
    double x0y2 = x0*y2;
    double b = x2y0-x0y2;
    double x0y1 = x0*y1;
    double x1y0 = x1*y0;
    double c = x0y1-x1y0;
    sum += a + b + c;
  }
  return sum;
}

And for n=3, for example, it produces this code:

public static double eval3(double[] X, double[] Y) {
  double sum = 0.0;
  assert(X.length == Y.length);
  int iters = X.length/3;
  for (int i = 0; i < iters; i++) {
    int at = 3*i;
    double x0 = X[at + 0];
    double x1 = X[at + 1];
    double x2 = X[at + 2];
    double y0 = Y[at + 0];
    double y1 = Y[at + 1];
    double y2 = Y[at + 2];
    double x1y2_28 = x1*y2;
    double x1y2_29 = x1y2_28;
    double x1y2_30 = x1y2_29;
    double x1y2 = x1y2_30;
    double x2y1_31 = x2*y1;
    double x2y1_32 = x2y1_31;
    double x2y1_33 = x2y1_32;
    double x2y1 = x2y1_33;
    double a_34 = x1y2-x2y1;
    double a_35 = a_34;
    double a_36 = a_35;
    double a = a_36;
    double x2y0_37 = x2*y0;
    double x2y0_38 = x2y0_37;
    double x2y0_39 = x2y0_38;
    double x2y0 = x2y0_39;
    double x0y2_40 = x0*y2;
    double x0y2_41 = x0y2_40;
    double x0y2_42 = x0y2_41;
    double x0y2 = x0y2_42;
    double b_43 = x2y0-x0y2;
    double b_44 = b_43;
    double b_45 = b_44;
    double b = b_45;
    double x0y1_46 = x0*y1;
    double x0y1_47 = x0y1_46;
    double x0y1_48 = x0y1_47;
    double x0y1 = x0y1_48;
    double x1y0_49 = x1*y0;
    double x1y0_50 = x1y0_49;
    double x1y0_51 = x1y0_50;
    double x1y0 = x1y0_51;
    double c_52 = x0y1-x1y0;
    double c_53 = c_52;
    double c_54 = c_53;
    double c = c_54;
    sum += a + b + c;
  }
  return sum;
}

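These two functions perform exactly the same computation, but one has more redundant assignments. Finally, I also generate a dispatch function: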

public double eval(int n, double[] X, double[] Y) {
  switch (n) {
    case 0: return eval0(X, Y);
    case 1: return eval1(X, Y);
    case 2: return eval2(X, Y);
    case 3: return eval3(X, Y);
    case 4: return eval4(X, Y);
    case 5: return eval5(X, Y);
    case 8: return eval8(X, Y);
    case 11: return eval11(X, Y);
    case 15: return eval15(X, Y);
    case 21: return eval21(X, Y);
    case 29: return eval29(X, Y);
    case 40: return eval40(X, Y);
    case 57: return eval57(X, Y);
    case 79: return eval79(X, Y);
    case 111: return eval111(X, Y);
    case 156: return eval156(X, Y);
    case 218: return eval218(X, Y);
    case 305: return eval305(X, Y);
  }
  assert(false);
  return -1;
}

All the generated code is in my repo here
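To illustrate the idea behind the code generator described above, here is a minimal sketch of how one assignment could be routed through n redundant intermediate variables. This is not the actual generator from the repository: the class name RedundantAssignGen, the method assign, and the counter starting at 0 are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the generator from the repository.
final class RedundantAssignGen {
    // Source of unique suffixes for the intermediate variable names.
    private int counter = 0;

    // Returns source lines that assign expr to name, passing the value
    // through n redundant intermediate variables first.
    public List<String> assign(String name, String expr, int n) {
        List<String> lines = new ArrayList<>();
        String rhs = expr;
        for (int i = 0; i < n; i++) {
            String tmp = name + "_" + (counter++);
            lines.add("double " + tmp + " = " + rhs + ";");
            rhs = tmp; // the next assignment just copies this variable
        }
        lines.add("double " + name + " = " + rhs + ";");
        return lines;
    }

    public static void main(String[] args) {
        RedundantAssignGen gen = new RedundantAssignGen();
        for (String line : gen.assign("x1y2", "x1*y2", 3)) {
            System.out.println(line);
        }
    }
}
```

With n=0 this emits the plain assignment only, and with n=3 it emits a chain of the same shape as in eval3 above, apart from the numbering of the suffixes.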

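I then benchmarked all these functions for different values of n on X and Y arrays of size 10000 filled with random data. I did this using both the Oracle JDK 9 javac compiler and the embeddable Janino compiler. My benchmarking code also lets the JIT warm up a bit. Running the benchmark produces this output: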

------ USING JAVAC
n = 0
"Elapsed time: 0.067189 msecs"
   Result= -9.434172113697462
n = 1
"Elapsed time: 0.05514 msecs"
   Result= -9.434172113697462
n = 2
"Elapsed time: 0.04627 msecs"
   Result= -9.434172113697462
n = 3
"Elapsed time: 0.041316 msecs"
   Result= -9.434172113697462
n = 4
"Elapsed time: 0.038673 msecs"
   Result= -9.434172113697462
n = 5
"Elapsed time: 0.036372 msecs"
   Result= -9.434172113697462
n = 8
"Elapsed time: 0.203788 msecs"
   Result= -9.434172113697462
n = 11
"Elapsed time: 0.031491 msecs"
   Result= -9.434172113697462
n = 15
"Elapsed time: 0.032673 msecs"
   Result= -9.434172113697462
n = 21
"Elapsed time: 0.030722 msecs"
   Result= -9.434172113697462
n = 29
"Elapsed time: 0.039271 msecs"
   Result= -9.434172113697462
n = 40
"Elapsed time: 0.030785 msecs"
   Result= -9.434172113697462
n = 57
"Elapsed time: 0.032382 msecs"
   Result= -9.434172113697462
n = 79
"Elapsed time: 0.033021 msecs"
   Result= -9.434172113697462
n = 111
"Elapsed time: 0.029978 msecs"
   Result= -9.434172113697462
n = 156
"Elapsed time: 18.003687 msecs"
   Result= -9.434172113697462
n = 218
"Elapsed time: 24.163828 msecs"
   Result= -9.434172113697462
n = 305
"Elapsed time: 33.479853 msecs"
   Result= -9.434172113697462
------ USING JANINO
n = 0
"Elapsed time: 0.032084 msecs"
   Result= -9.434172113697462
n = 1
"Elapsed time: 0.032022 msecs"
   Result= -9.434172113697462
n = 2
"Elapsed time: 0.029989 msecs"
   Result= -9.434172113697462
n = 3
"Elapsed time: 0.034251 msecs"
   Result= -9.434172113697462
n = 4
"Elapsed time: 0.030606 msecs"
   Result= -9.434172113697462
n = 5
"Elapsed time: 0.030186 msecs"
   Result= -9.434172113697462
n = 8
"Elapsed time: 0.032132 msecs"
   Result= -9.434172113697462
n = 11
"Elapsed time: 0.030109 msecs"
   Result= -9.434172113697462
n = 15
"Elapsed time: 0.031009 msecs"
   Result= -9.434172113697462
n = 21
"Elapsed time: 0.032625 msecs"
   Result= -9.434172113697462
n = 29
"Elapsed time: 0.031489 msecs"
   Result= -9.434172113697462
n = 40
"Elapsed time: 0.030665 msecs"
   Result= -9.434172113697462
n = 57
"Elapsed time: 0.03146 msecs"
   Result= -9.434172113697462
n = 79
"Elapsed time: 0.031599 msecs"
   Result= -9.434172113697462
n = 111
"Elapsed time: 0.029998 msecs"
   Result= -9.434172113697462
n = 156
"Elapsed time: 17.579771 msecs"
   Result= -9.434172113697462
n = 218
"Elapsed time: 24.561065 msecs"
   Result= -9.434172113697462
n = 305
"Elapsed time: 33.357928 msecs"
   Result= -9.434172113697462

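From the output above, it seems that javac and Janino both produce code that performs about equally well, and that for low values of n the value does not seem to matter. At n=156, however, we observe a dramatic increase in running time. I do not know exactly why that is, but I suspect it has to do with the limit on the number of local variables on the JVM, so that the Java compiler (javac/Janino) has to use workarounds to overcome that limit, and that those workarounds are harder for the JIT to optimize. (That is only my suspicion; maybe someone can shed some light on this...)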
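For readers who want to repeat this kind of measurement, a warm-up-then-measure loop could look roughly like the sketch below. It is not the benchmark code from the repository: the class WarmupBench, the dot workload, and the warm-up and run counts are assumptions chosen for illustration. For serious measurements a dedicated harness such as JMH is the safer choice.

```java
import java.util.Random;
import java.util.function.DoubleSupplier;

// Hypothetical sketch, not the benchmark code from the repository.
final class WarmupBench {
    // Toy workload (an assumption): a dot product over two arrays.
    static double dot(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    // Calls the task `warmup` times untimed so the JIT has a chance to
    // compile it, then returns the fastest of `runs` timed calls in msecs.
    static double bestMillis(DoubleSupplier task, int warmup, int runs) {
        double sink = 0.0; // accumulate results so the calls are not dead code
        for (int i = 0; i < warmup; i++) {
            sink += task.getAsDouble();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink += task.getAsDouble();
            best = Math.min(best, System.nanoTime() - start);
        }
        if (Double.isNaN(sink)) {
            System.out.println("unexpected NaN"); // only here to use sink
        }
        return best / 1e6;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double[] x = rng.doubles(10000).toArray();
        double[] y = rng.doubles(10000).toArray();
        double ms = bestMillis(() -> dot(x, y), 1000, 20);
        System.out.println("Elapsed time: " + ms + " msecs");
    }
}
```

Taking the fastest of several runs is one design choice among many; it filters out one-off pauses but hides variance, so reporting all samples, as in the output above, is equally reasonable.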

Question 2: Does a superfluous addition of 0 affect performance?

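I wrote a class to experiment with that. The class has two static methods that both perform exactly the same computation, except that in apply0 we also add 0 when computing the array indices: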

public class Mul2d {
    public static double[] apply0(double angle, double[] X) {
        int n = X.length/2;
        double[] Y = new double[2*n];
        double cosv = Math.cos(angle);
        double sinv = Math.sin(angle);
        for (int i = 0; i < n; i++) {
            int at = 2*i;
            Y[at + 0] = cosv*X[at + 0] - sinv*X[at + 1];
            Y[at + 1] = sinv*X[at + 0] + cosv*X[at + 1];
        }
        return Y;
    }

    public static double[] apply(double angle, double[] X) {
        int n = X.length/2;
        double[] Y = new double[2*n];
        double cosv = Math.cos(angle);
        double sinv = Math.sin(angle);
        for (int i = 0; i < n; i++) {
            int at = 2*i;
            Y[at] = cosv*X[at] - sinv*X[at + 1];
            Y[at + 1] = sinv*X[at] + cosv*X[at + 1];
        }
        return Y;
    }
}

Running a benchmark on a large array suggests that it does not matter whether you add 0 or not. Here is the output of the benchmark:

With adding '+ 0'
"Elapsed time: 0.247315 msecs"
"Elapsed time: 0.235471 msecs"
"Elapsed time: 0.240675 msecs"
"Elapsed time: 0.251799 msecs"
"Elapsed time: 0.267139 msecs"
"Elapsed time: 0.250735 msecs"
"Elapsed time: 0.251697 msecs"
"Elapsed time: 0.238652 msecs"
"Elapsed time: 0.24872 msecs"
"Elapsed time: 1.274368 msecs"
Without adding '+ 0'
"Elapsed time: 0.239371 msecs"
"Elapsed time: 0.233459 msecs"
"Elapsed time: 0.228619 msecs"
"Elapsed time: 0.389649 msecs"
"Elapsed time: 0.238742 msecs"
"Elapsed time: 0.23459 msecs"
"Elapsed time: 0.23452 msecs"
"Elapsed time: 0.241013 msecs"
"Elapsed time: 0.356035 msecs"
"Elapsed time: 0.260892 msecs"

The running times look nearly identical, and any differences seem to be drowned in noise.
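One way to make a comparison like this less sensitive to outliers is to look at the median of each series rather than at individual samples. The sketch below does that for the timings printed above; the class TimingSummary and the method median are hypothetical names, and this step was not part of the original experiment.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the original experiment.
final class TimingSummary {
    // Median of the samples; the input array is left unmodified.
    static double median(double[] samples) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return (sorted.length % 2 == 1)
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        // Timings in msecs, copied from the benchmark output above.
        double[] withZero = {
            0.247315, 0.235471, 0.240675, 0.251799, 0.267139,
            0.250735, 0.251697, 0.238652, 0.24872, 1.274368
        };
        double[] withoutZero = {
            0.239371, 0.233459, 0.228619, 0.389649, 0.238742,
            0.23459, 0.23452, 0.241013, 0.356035, 0.260892
        };
        System.out.println("median with '+ 0':    " + median(withZero));
        System.out.println("median without '+ 0': " + median(withoutZero));
    }
}
```

Unlike a mean, the median is not pulled upward by one-off spikes such as the 1.274368 msecs sample.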

Conclusion: Regarding Question 1, I cannot observe any negative impact on performance for this particular toy problem.

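Regarding Question 2, it does not seem to matter whether you add + 0 or not. Even if the JIT does not optimize away the + 0, the other computations in the loop most likely dominate the total time, which means that any small extra cost of adding + 0 will drown in the noise.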