Why does JDK code style use a variable assignment and read on the same line, e.g. (i=2) < max

Question

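I noticed that the JDK source code, and more specifically the collections framework, prefers to assign variables right before reading them inside an expression. Is this just a simple preference, or is there something more important that I am not aware of? One reason I can think of is that the variable is used only in that expression.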

Since I am not used to this style, I find it hard to read. The code is very condensed. Below is an example taken from java.util.HashMap.getNode():

Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
if ((tab = table) != null && (n = tab.length) > 0 && ...) {
   ...
}
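
For readers new to the idiom: in Java an assignment is itself an expression whose value is the value just stored, so `(n = tab.length) > 0` both stores the length and tests it. The following is a minimal sketch of that equivalence, not JDK code; the class `AssignInExpression` and its methods are hypothetical, and an int array stands in for HashMap's node table.

```java
// Hypothetical demo class; the names are illustrative and not from the JDK.
class AssignInExpression
{
    private int[] table = { 10, 20, 30 };

    // Compact form: each parenthesized assignment stores a value
    // and then yields that same value to the comparison.
    int lastCompact()
    {
        int[] tab; int n;
        if ((tab = table) != null && (n = tab.length) > 0) {
            return tab[n - 1];
        }
        return -1;
    }

    // Spelled-out form with the same behavior.
    int lastExpanded()
    {
        int[] tab = table;
        if (tab != null) {
            int n = tab.length;
            if (n > 0) {
                return tab[n - 1];
            }
        }
        return -1;
    }

    public static void main(String[] args)
    {
        AssignInExpression d = new AssignInExpression();
        System.out.println(d.lastCompact());  // prints 30
        System.out.println(d.lastExpanded()); // prints 30
    }
}
```

Because `&&` short-circuits, `n` is only read after `tab` has been assigned and found non-null, which is why the compiler accepts the uninitialized declarations.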

Answer 1

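As already mentioned in the comments: Doug Lea, who is one of the main authors of the collections framework and the concurrency packages, tends to do optimizations that may look confusing (or even counterintuitive) to mere mortals.

(One "famous" example here is copying fields to local variables in order to minimize the size of the bytecode, which is actually also done with the table field and the local tab variable in the example that you referred to!)

For very simple tests, it does not seem to make a difference (in terms of the resulting bytecode size) whether the accesses are "inlined" or not. So I tried to create an example that roughly resembles the structure of the getNode method that you mentioned: an access to a field that is an array, a length check, an access to a field of one array element...

The testSeparate method does the assignments and the checks separately
The testInlined method uses the assignment-in-if style
The testRepeated method (as a counterexample) repeats every access

The code: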

class Node
{
    int k;
    int j;
}

public class AssignAndUseTestComplex
{
    public static void main(String[] args)
    {
        AssignAndUseTestComplex t = new AssignAndUseTestComplex();
        t.testSeparate(1);
        t.testInlined(1);
        t.testRepeated(1);
    }

    private Node table[] = new Node[] { new Node() };

    int testSeparate(int value)
    {
        Node[] tab = table;
        if (tab != null)
        {
            int n = tab.length;
            if (n > 0)
            {
                Node first = tab[(n-1)];
                if (first != null)
                {
                    return first.k+first.j;
                }
            }
        } 
        return 0;
    }

    int testInlined(int value)
    {
        Node[] tab; Node first, e; int n;
        if ((tab = table) != null && (n = tab.length) > 0 && 
            (first = tab[(n - 1)]) != null) {
            return first.k+first.j;
        }
        return 0;
    }

    int testRepeated(int value)
    {
        if (table != null)
        {
            if (table.length > 0)
            {
                if (table[(table.length-1)] != null)
                {
                    return table[(table.length-1)].k+table[(table.length-1)].j;
                }
            }
        } 
        return 0;
    }

}

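And the resulting bytecode: the testSeparate method occupies 42 bytes of bytecode (the numbers in the listing are byte offsets, and the last instruction sits at offset 41):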

  int testSeparate(int);
    Code:
       0: aload_0
       1: getfield      #15                 // Field table:[LWhosebug/Node;
       4: astore_2
       5: aload_2
       6: ifnull        40
       9: aload_2
      10: arraylength
      11: istore_3
      12: iload_3
      13: ifle          40
      16: aload_2
      17: iload_3
      18: iconst_1
      19: isub
      20: aaload
      21: astore        4
      23: aload         4
      25: ifnull        40
      28: aload         4
      30: getfield      #37                 // Field Whosebug/Node.k:I
      33: aload         4
      35: getfield      #41                 // Field Whosebug/Node.j:I
      38: iadd
      39: ireturn
      40: iconst_0
      41: ireturn

The testInlined method is indeed slightly smaller, at 40 bytes (last offset 39):

  int testInlined(int);
    Code:
       0: aload_0
       1: getfield      #15                 // Field table:[LWhosebug/Node;
       4: dup
       5: astore_2
       6: ifnull        38
       9: aload_2
      10: arraylength
      11: dup
      12: istore        5
      14: ifle          38
      17: aload_2
      18: iload         5
      20: iconst_1
      21: isub
      22: aaload
      23: dup
      24: astore_3
      25: ifnull        38
      28: aload_3
      29: getfield      #37                 // Field Whosebug/Node.k:I
      32: aload_3
      33: getfield      #41                 // Field Whosebug/Node.j:I
      36: iadd
      37: ireturn
      38: iconst_0
      39: ireturn

Finally, the testRepeated method takes a whopping 64 bytes (last offset 63):

  int testRepeated(int);
    Code:
       0: aload_0
       1: getfield      #15                 // Field table:[LWhosebug/Node;
       4: ifnull        62
       7: aload_0
       8: getfield      #15                 // Field table:[LWhosebug/Node;
      11: arraylength
      12: ifle          62
      15: aload_0
      16: getfield      #15                 // Field table:[LWhosebug/Node;
      19: aload_0
      20: getfield      #15                 // Field table:[LWhosebug/Node;
      23: arraylength
      24: iconst_1
      25: isub
      26: aaload
      27: ifnull        62
      30: aload_0
      31: getfield      #15                 // Field table:[LWhosebug/Node;
      34: aload_0
      35: getfield      #15                 // Field table:[LWhosebug/Node;
      38: arraylength
      39: iconst_1
      40: isub
      41: aaload
      42: getfield      #37                 // Field Whosebug/Node.k:I
      45: aload_0
      46: getfield      #15                 // Field table:[LWhosebug/Node;
      49: aload_0
      50: getfield      #15                 // Field table:[LWhosebug/Node;
      53: arraylength
      54: iconst_1
      55: isub
      56: aaload
      57: getfield      #41                 // Field Whosebug/Node.j:I
      60: iadd
      61: ireturn
      62: iconst_0
      63: ireturn

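So it seems that this "obscure" way of writing the queries and assignments can indeed save a few bytes of bytecode, and (given the justification in the linked answer about storing fields in local variables) this may have been the reason for using this style.

But...

In any case: after the method has been executed a few times, the JIT will kick in, and the resulting machine code will have "nothing" to do with the original bytecode. I am pretty sure that all three versions will eventually be compiled to the same machine code.

So the bottom line is: do not use this style. Instead, just write dumb code that is easy to read and maintain. You will know when it is your turn to use "optimizations" like these.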

EDIT: A short addendum...

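I did a further test and compared the testSeparate and testInlined methods with regard to the actual machine code that the JIT generates.

I modified the main method a bit, to prevent unrealistic over-optimizations or other shortcuts that the JIT might take, but the actual methods were left unmodified.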

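As expected: when the methods are called a few thousand times on a HotSpot JVM with the disassembler available and the flags -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:+PrintAssembly, the actual machine code of the two methods is identical.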
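
The modified main method is not shown in the answer. The following is only a sketch of what such a warm-up harness might look like: the class name `JitWarmup`, the `run` helper, the iteration count, and the way the input is varied are all assumptions. The two test methods mirror the ones above, minus the unused parameter and local.

```java
// Sketch only: the answer does not show its modified main method.
// Class name, loop count and input handling are assumptions.
class JitWarmup
{
    static class Node { int k; int j; }

    private Node[] table = new Node[] { new Node() };

    int testSeparate()
    {
        Node[] tab = table;
        if (tab != null) {
            int n = tab.length;
            if (n > 0) {
                Node first = tab[n - 1];
                if (first != null) {
                    return first.k + first.j;
                }
            }
        }
        return 0;
    }

    int testInlined()
    {
        Node[] tab; Node first; int n;
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[n - 1]) != null) {
            return first.k + first.j;
        }
        return 0;
    }

    // Calls both methods repeatedly and returns the accumulated results,
    // so that the calls are less likely to be optimized away as dead code.
    static long[] run(int iterations)
    {
        JitWarmup t = new JitWarmup();
        long sumSeparate = 0;
        long sumInlined = 0;
        for (int i = 0; i < iterations; i++) {
            // Vary the data so the result is not a constant.
            t.table[0].k = i;
            t.table[0].j = 1;
            sumSeparate += t.testSeparate();
            sumInlined += t.testInlined();
        }
        return new long[] { sumSeparate, sumInlined };
    }

    public static void main(String[] args)
    {
        long[] sums = run(100_000);
        System.out.println(sums[0] + " " + sums[1]);
    }
}
```

Running it with the flags quoted above prints the compiled code of the hot methods only if the hsdis disassembler library is installed for the JVM in use; without it, HotSpot cannot print the assembly.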

So once more, the JIT does its job well, and the programmer can focus on writing readable code (whatever that means).

... and a minor correction/clarification:

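I did not test the third method, testRepeated, because it is not equivalent to the other methods (and thus cannot result in the same machine code). This, by the way, is another small advantage of the strategy of storing fields in local variables: it offers a (very limited, but sometimes handy) form of "thread safety". The method keeps working on one and the same array, so the length it checked (like that of the tab array in the getNode method of HashMap) cannot change while the method is being executed, even if the field is reassigned in the meantime.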
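
The difference can be illustrated with a single-threaded simulation. In the sketch below, everything is hypothetical: the class `SnapshotDemo`, its methods, and the `Runnable` hook, which stands in for another thread replacing the field between the length check and the element access. It does not demonstrate real concurrency or memory visibility, only that a local copy stays consistent within one call.

```java
// Hypothetical demo; the hook simulates another thread replacing the
// field between the length check and the element access.
class SnapshotDemo
{
    private int[] table = { 1, 2, 3 };

    // Re-reads the field on every access, in the spirit of testRepeated.
    int thirdRepeated(Runnable betweenCheckAndUse)
    {
        if (table != null && table.length > 2) {
            betweenCheckAndUse.run();
            return table[2]; // may now be a different, shorter array
        }
        return -1;
    }

    // Reads the field once into a local, in the spirit of testInlined.
    int thirdSnapshot(Runnable betweenCheckAndUse)
    {
        int[] tab;
        if ((tab = table) != null && tab.length > 2) {
            betweenCheckAndUse.run();
            return tab[2]; // still the array whose length was checked
        }
        return -1;
    }

    // Replaces the field with a shorter array.
    void shrink() { table = new int[] { 9 }; }

    public static void main(String[] args)
    {
        SnapshotDemo a = new SnapshotDemo();
        System.out.println(a.thirdSnapshot(a::shrink)); // prints 3

        SnapshotDemo b = new SnapshotDemo();
        try {
            b.thirdRepeated(b::shrink);
        } catch (ArrayIndexOutOfBoundsException ex) {
            System.out.println("repeated access failed: " + ex.getMessage());
        }
    }
}
```

The snapshot variant returns an element of the old array, which may be stale, but it never indexes past the length it checked.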


java

java-collections-api
