Java final fields: is "taint" behavior possible with the current JLS?

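I am currently trying to understand this JLS section on final fields.

To understand the text in the JLS better, I am also reading The Java Memory Model by Jeremy Manson (one of the creators of the JMM).

The paper contains an example that got me interested: if an object o with final fields is made visible to another thread t twice (first "improperly", before o's constructor finishes, and then "properly", after o's constructor finishes), then t can see a semi-constructed o even when it accesses o only through the "properly" published path.

Here is the relevant part of the paper: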

Figure 7.3: Example of Simple Final Semantics

f1 is a final field; its default value is 0

Thread 1            Thread 2              Thread 3
o.f1 = 42;          r1 = p;               r3 = q;
p = o;              i = r1.f1;            j = r3.f1;
freeze o.f1;        r2 = q;
q = o;              if (r2 == r1)
                        k = r2.f1;



We assume r1, r2 and r3 do not see the value null. i and k can be 0 or 42, and j must be 42.


Consider Figure 7.3. We will not start out with the complications of multiple writes to final fields; a freeze, for the moment, is simply what happens at the end of a constructor. Although r1, r2 and r3 can see the value null, we will not concern ourselves with that; that just leads to a null pointer exception.

...

What about the read of q.f1 in Thread 2? Is that guaranteed to see the correct value for the final field? A compiler could determine that p and q point to the same object, and therefore reuse the same value for both p.f1 and q.f1 for that thread. We want to allow the compiler to remove redundant reads of final fields wherever possible, so we allow k to see the value 0.

One way to conceptualize this is by thinking of an object being “tainted” for a thread if that thread reads an incorrectly published reference to the object. If an object is tainted for a thread, the thread is never guaranteed to see the object’s correctly constructed final fields. More generally, if a thread t reads an incorrectly published reference to an object o, thread t forever sees a tainted version of o without any guarantees of seeing the correct value for the final fields of o.
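A rough Java rendering of Figure 7.3 may make the three threads easier to follow. This is only a sketch: the class and method names (`Tainted`, `thread1`, `thread2`, `thread3`, `Figure73`) are hypothetical and do not come from the paper, and `-1` is used here as a stand-in for "saw null". The `main` method runs the readers after `join()`, which orders everything, so it cannot reproduce the racy outcome; the comments mark which reads the memory model leaves unguaranteed when the threads really run concurrently.

```java
// Hypothetical names throughout; this mirrors Figure 7.3, it is not code from the paper.
final class Tainted {
    static Tainted p;   // written inside the constructor: "improper" publication
    static Tainted q;   // written after the constructor returns: "proper" publication

    final int f1;

    Tainted() {
        f1 = 42;        // o.f1 = 42
        p = this;       // p = o  (the reference escapes before the freeze)
    }                   // freeze o.f1 happens here, when the constructor exits

    static void thread1() {
        q = new Tainted();   // q = o
    }

    // Thread 2 reads through both references.
    static int[] thread2() {
        Tainted r1 = p;
        int i = (r1 == null) ? -1 : r1.f1;               // under the JMM: 0 or 42
        Tainted r2 = q;
        int k = (r2 != null && r2 == r1) ? r2.f1 : -1;   // under the JMM: also 0 or 42
        return new int[] { i, k };
    }

    // Thread 3 reads only through q.
    static int thread3() {
        Tainted r3 = q;
        return (r3 == null) ? -1 : r3.f1;                // 42 whenever r3 is non-null
    }
}

class Figure73 {
    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(Tainted::thread1);
        t1.start();
        t1.join();   // join() orders all of Thread 1 before the reads below
        int[] ik = Tainted.thread2();
        System.out.println("i=" + ik[0] + " k=" + ik[1] + " j=" + Tainted.thread3());
    }
}
```

The point of the figure is the read of `k`: it goes through `q`, the properly published reference, yet it is not guaranteed because the same thread has already read `p`.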

I tried to find anything in the current JLS that explicitly allows or forbids such behavior, but all I found is:

An object is considered to be completely initialized when its constructor finishes. A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly initialized values for that object's final fields.

Does the current JLS allow such behavior?

Yes, it is allowed.

This is mainly laid out in the already-quoted sections of the JMM:

Assuming the object is constructed "correctly", once an object is constructed, the values assigned to the final fields in the constructor will be visible to all other threads without synchronization.

What does it mean for an object to be properly constructed? It simply means that no reference to the object being constructed is allowed to "escape" during construction.

In other words, do not place a reference to the object being constructed anywhere where another thread might be able to see it; do not assign it to a static field, do not register it as a listener with any other object, and so on. These tasks should be done after the constructor completes, not in the constructor.

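So yes, it is possible, as far as it is allowed. The last paragraph is full of advice on how not to do things; whenever someone says "avoid doing X", it is implied that X can be done.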
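The difference between letting the reference escape and constructing "properly" can be sketched as follows. All names here (`Registry`, `LeakyListener`, `SafeListener`, `EscapeDemo`) are hypothetical, and the `ArrayList` is only a placeholder for "somewhere another thread might look"; it is not thread-safe and the example runs on a single thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical registry standing in for "any place another thread might look".
// ArrayList is not thread-safe; this sketch only illustrates where the escape happens.
final class Registry {
    static final List<Object> LISTENERS = new ArrayList<>();
}

// Improper: 'this' escapes while the constructor is still running.
final class LeakyListener {
    final int id;

    LeakyListener(int id) {
        Registry.LISTENERS.add(this);   // escape before the final field is assigned and frozen
        this.id = id;
    }
}

// Proper: the constructor only initializes; registration happens afterwards.
final class SafeListener {
    final int id;

    private SafeListener(int id) {
        this.id = id;
    }

    static SafeListener create(int id) {
        SafeListener l = new SafeListener(id);   // constructor has finished, final fields are frozen
        Registry.LISTENERS.add(l);               // only now is the reference shared
        return l;
    }
}

class EscapeDemo {
    public static void main(String[] args) {
        SafeListener s = SafeListener.create(7);
        System.out.println(s.id);
    }
}
```

A static factory is one common way to keep registration out of the constructor; any arrangement in which the reference is shared only after the constructor returns satisfies the quoted advice.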


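What if... reflection

The other answers correctly point out the requirements for final fields to be seen correctly by other threads, such as the freeze at the end of the constructor, the chain, and so on. Those answers give a deeper understanding of the main issue and should be read first. This one focuses on a possible exception to those rules.

The most repeated rule/phrase is probably this one, copied from Eugene's answer (which, by the way, should not have any downvotes):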

An object is considered to be completely initialized when its constructor finishes. A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly [assigned/loaded/set] values for that object's final fields.

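Note that I changed the term "initialized" to the equivalent terms assigned, loaded, or set. This is deliberate, because the original terminology could be misleading for the point I am making here.

Another correct statement comes from chrylis -cautiouslyoptimistic-: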

The "final freeze" happens at the end of the constructor, and from that point on all reads are guaranteed to be accurate.


JLS 17.5, Final Field Semantics, states:

A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly initialized values for that object's final fields.

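But do you think reflection cares about any of this? No, of course not. It didn't even read that paragraph.

Subsequent modification of final fields

These statements are not only correct, they are also backed by the JLS. I don't intend to refute them, only to add some extra information about an exception to this law: reflection, a mechanism that, among other things, can change the value of a final field after it has been initialized.

It is entirely correct that the freeze of a final field occurs at the end of the constructor in which the final field is set. But there is another trigger for the freeze operation that has not been taken into account: a freeze of a final field also occurs when the field is initialized or modified via reflection (JLS 17.5.3):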

Freezes of a final field occur both at the end of the constructor in which the final field is set, and immediately after each modification of a final field via reflection.

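Reflective operations on final fields "break" the rule: after the constructor has finished correctly, reads of the final fields are still NOT all guaranteed to be accurate. I'll try to explain.

Let's imagine that all the proper flows have been honored, the constructor has finished, and all the final fields of the instance are seen correctly by a thread. Now it's time to make some changes to these fields via reflection (just imagine this is needed, even if it is unusual, I know...).

Following the previous rules, all threads wait until all the fields have been updated: just as in the usual constructor scenario, the fields are accessed only after they have been frozen and the reflective operation has finished correctly. This is where the law is broken: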

If a final field is initialized to a constant expression (§15.28) in the field declaration, changes to the final field may not be observed, since uses of that final field are replaced at compile time with the value of the constant expression.

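This means that even if all the rules were followed, your code will not correctly read the value assigned to the final field if that variable is a primitive or a String and you initialized it to a constant expression in the field declaration. Why? Because to the compiler that variable is just a hardcoded value; it will never check the field or its changes again, even if your code correctly updated the value at runtime.

So, let's test it: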

 import java.lang.reflect.Field;

 public class FinalGuarantee
 {
      private final int  i = 5;  // initialized as a constant expression
      private final long l;      // assigned in the constructor, not a constant

      public FinalGuarantee()
      {
         l = 1L;
      }

      // Overwrites both final fields through reflection.
      public static void touch(FinalGuarantee f) throws Exception
      {
         Class<FinalGuarantee> rfkClass = FinalGuarantee.class;
         Field field = rfkClass.getDeclaredField("i");
         field.setAccessible(true);
         field.set(f, 555);                     // set i to 555
         field = rfkClass.getDeclaredField("l");
         field.setAccessible(true);
         field.set(f, 111L);                    // set l to 111
      }

      public static void main(String[] args) throws Exception
      {
         FinalGuarantee f = new FinalGuarantee();
         System.out.println(f.i);
         System.out.println(f.l);
         touch(f);
         System.out.println("-");
         System.out.println(f.i);
         System.out.println(f.l);
      }
 }

Output:

 5
 1
 -
 5   
 111

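The final int i was correctly updated at runtime; to check it, you can debug and inspect the object's field values:

Both i and l were correctly updated. So what is happening with i, why does it still show 5? Because, as stated in the JLS, the field i is replaced directly at compile time with the value of the constant expression, which in this case is 5.

Every subsequent read of the final field i will then be INCORRECT, even if all the previous rules were followed. The compiler will never check that field again: when you write f.i, it does not access any variable of any instance. It just returns 5. The final field is simply hardcoded at compile time, and if it is updated at runtime, it will never again be seen correctly by any thread. This breaks the law.

As proof that the fields are correctly updated at runtime:

Both 555 and 111L are pushed onto the stack, and the fields get their newly assigned values. But what happens when they are used, for example when their values are printed?

  • l was not initialized to a constant expression, nor in the field declaration. As a result, it is not affected by the rule in 17.5.3. The field is correctly updated and correctly read from outer threads.

  • i, however, was initialized to a constant expression in the field declaration. After the initial freeze, there is no more f.i as far as the compiler is concerned; the field will never be accessed again. Even though the variable is correctly updated to 555 in the example, every attempt to read the field has been replaced by the hardcoded constant 5; regardless of any further change or update made to the variable, it will always return five.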

16: before the update
42: after the update

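There is no field access, just a "yes, that's 5 for sure, return it". This means that a final field is not ALWAYS guaranteed to be seen correctly from outer threads, even if all the protocols were followed.

This affects primitives and Strings. I know it is an unusual scenario, but it is still a possible one.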
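The gap between the stored value and the value the compiled code returns can also be shown without a debugger, by reading the field back through reflection. The sketch below uses hypothetical names (`InlinedConstant`, `direct`, `reflective`, `overwrite`) and assumes the JDK in use still permits reflective writes to final instance fields after `setAccessible(true)`.

```java
import java.lang.reflect.Field;

// Hypothetical class mirroring the field i of FinalGuarantee.
class InlinedConstant {
    private final int i = 5;   // constant variable: javac inlines reads of it

    // Ordinary read: compiled as the literal 5, with no field access.
    int direct() {
        return i;
    }

    // Reflective read: looks at the value actually stored in the object.
    int reflective() {
        try {
            Field f = InlinedConstant.class.getDeclaredField("i");
            f.setAccessible(true);
            return f.getInt(this);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reflective write: setAccessible(true) is required before a final instance field can be set.
    void overwrite(int value) {
        try {
            Field f = InlinedConstant.class.getDeclaredField("i");
            f.setAccessible(true);
            f.setInt(this, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        InlinedConstant o = new InlinedConstant();
        o.overwrite(555);
        System.out.println(o.direct() + " " + o.reflective());
    }
}
```

After the overwrite, `direct()` still yields 5 while `reflective()` yields 555, which is the same split that the FinalGuarantee output shows between i and l.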


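Some other problematic scenarios (some of them also related to the synchronization issue raised in the comments):

1- If the reflective operation is not correctly synchronized, a thread could fall into a race condition in the following scenario: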

    final boolean flag;  // false in constructor
    final int x;         // 1 in constructor 
  • Assume the reflective operation will, in this order:
  1- Set flag to true
  2- Set x to 100.

A simplified version of the reader thread's code:

    while (!instance.flag)  //flag changes to true
       Thread.sleep(1);
    System.out.println(instance.x); // 1 or 100 ?

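In one possible scenario, the reflective operation has not had enough time to update x, so the final int x field may or may not be read correctly.

2- A thread could fall into a deadlock in the following scenario: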

    final boolean flag;  // false in constructor
  • Assume the reflective operation will:
  1- Set flag to true

A simplified version of the reader thread's code:

    while (!instance.flag) { /*deadlocked here*/ } 

    /*flag changes to true, but the thread started to check too early.
     Compiler optimization could assume flag won't ever change
     so this thread won't ever see the updated value. */

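I'm aware this is not an issue specific to final fields; it is only added as a possible scenario of an incorrect read flow for this kind of variable. These last two scenarios are simply the consequence of incorrect implementations, but I wanted to point them out.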
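For comparison, here is a minimal sketch of a handoff that avoids both problems. It deliberately swaps the reflective writes to final fields for ordinary fields, so it illustrates the synchronization pattern rather than the reflection case itself; the names (`Handoff`, `writer`, `reader`) are hypothetical. The data is written first and the volatile flag last, and the reader spins on the volatile flag, which the compiler may not hoist out of the loop.

```java
// Hypothetical names; ordinary fields are used instead of reflection to keep the sketch small.
class Handoff {
    private int x = 1;                       // plain field, published through the volatile flag
    private volatile boolean flag = false;

    void writer() {
        x = 100;       // 1. update the data first
        flag = true;   // 2. the volatile write publishes everything written before it
    }

    int reader() {
        while (!flag) {            // volatile read: re-read on every iteration
            Thread.onSpinWait();
        }
        return x;                  // ordered after the write of 100 by the volatile handoff
    }

    public static void main(String[] args) throws InterruptedException {
        Handoff h = new Handoff();
        Thread w = new Thread(h::writer);
        w.start();
        System.out.println(h.reader());
        w.join();
    }
}
```

Compared with scenario 1, the order of the two writes is reversed (data before flag); compared with scenario 2, the flag is volatile, so the loop cannot be reduced to a check that runs only once.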

This behavior is permitted by 17.5:

compilers are allowed to keep the value of a final field cached in a register and not reload it from memory in situations where a non-final field would have to be reloaded

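The "final freeze" happens at the end of the constructor, and from that point on all reads are guaranteed to be accurate. However, if the object is published unsafely, another thread can (1) read the field o while it is still uninitialized and (2) also assume that, because o is final, it can never change, and therefore cache that value permanently without ever re-reading it.

Stop. Quoting. The. JMM.

The JMM is not for me and you; it is for people who really know what they are doing, like JVM compiler writers. Are you one of them? Am I one of them? I don't think so, so stay away from it. There, I said it.

It is funny that you answered the question yourself, with the correct quote from the JLS: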

An object is considered to be completely initialized when its constructor finishes. A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly initialized values for that object's final fields.

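That's it. It states clearly what is correct and what the expected result is. Everything else is undocumented, thus undefined, thus "welcome to the unknown land, have a good day". So yes, this is possible, simply by excluding what is impossible (or guaranteed by the JLS).

EDIT

Here we go, this is going to be long. We need to look at a certain rule from the JLS here: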

Given a write w, a freeze f, an action a (that is not a read of a final field), a read r1 of the final field frozen by f, and a read r2 such that hb(w, f), hb(f, a), mc(a, r1), and dereferences(r1, r2), then when determining which values can be seen by r2, we consider hb(w, r2)

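That is a lot, but we should take it slowly. I admit I have not done this exercise with final fields before.

I will start with Thread 1 and Thread 3. It is obvious that all of these actions in Thread 1 form a happens-before chain, because of plain "program order":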

o.f1 = 42;
p = o;
freeze o.f1;
q = o;

So we have:

   (hb)                   (hb)
w ------> freeze, freeze ------> q

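If you look at the quote above, we satisfy two of its conditions: hb(w, f) and hb(f, a). That is, we do a write (w) via o.f1 = 42, a freeze via freeze o.f1, and the second condition (hb(f, a)) is satisfied via q = o.

The next thing we need to establish is mc(a, r1). For that we need to involve Thread 3, which does: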

r3 = q;
j = r3.f1;

As such, we can say that "action a" (from the same quote) is the write, and r1 (from mc(a, r1)) is the read, via r3 = q;. The same chapter says this about the memory chain:

If r is a read that sees a write w, then it must be the case that mc(w, r).

This matches exactly what we described above. Thus, so far we have:

      (hb)                       (hb)
   w ------> freeze --> freeze ------> q --> mc(w, r1).

Now we need to look at dereferences(r1, r2). Again we turn to the same chapter:

Dereference Chain: If an action a is a read or write of a field or element of an object o by a thread t that did not initialize...

Did Thread 3 initialize q? No (which is good). If you read the second half of that sentence (at least as I understand it), we have fulfilled this rule too. As such:

      (hb)          (hb)     (mc)       (dereferences)
   w ------> freeze -----> a ------> r1 ----------------> r2

Thus (according to the same initial quote):

   hb(w, r2).

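This reads as "a data race is not possible". So the only thing Thread 3 can read is 42, because a read sees either the latest write that precedes it in happens-before order, or some other write it races with, and here there is no race.


If you extrapolate this to Thread 1 and Thread 2, you will immediately see that the freeze action is missing: you cannot even start building such a chain. As such: a data race, and so it can read any other value. In practice, though, it can read 0 or 42, because Java does not allow "out of thin air" values.

Yes, such behavior is allowed.

It turns out that a detailed explanation of the same case is available on the personal page of William Pugh (yet another JMM author): New presentation/description of the semantics of final fields.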

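Short version:

  • Section 17.5.1. Semantics of final Fields of the JLS defines special rules for final fields.
    These rules basically let us establish an additional happens-before relation between the initialization of a final field in a constructor and a read of the field in another thread, even if the object is published via a data race.
    This additional happens-before relation requires that every path from the field initialization to its read in another thread includes a special chain of actions:

    w --hb--> f --hb--> a --mc--> r1 --dc--> r2, where:
    • w is a write to the final field in a constructor
    • f is the "freeze action", which happens when the constructor exits
    • a is a publication of the object (e.g. saving it to a shared variable)
    • r1 is a read of the object's address in a different thread
    • r2 is a read of the final field in the same thread as r1
  • The code in the question has a path from o.f1 = 42 to k = r2.f1; that does not include the required freeze o.f action:

    o.f1 = 42 --hb--> { freeze o.f is missing } --hb--> p = o --mc--> r1 = p --dc--> k = r2.f1


    As a result, o.f1 = 42 and k = r2.f1 are not ordered by happens-before ⇒ we have a data race, and k = r2.f1 can read 0 or 42.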
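The "every path must contain the freeze" rule can be mimicked with a toy check. This is not a JMM implementation, only a sketch under a strong simplification: Thread 1 is modelled as a list of action strings in program order, a publication counts as safe only if it comes after the freeze, and a reader is guaranteed to see 42 only if every reference it has read was published safely. The names (`FreezeChain`, `publishedAfterFreeze`, `guaranteed`) are hypothetical.

```java
import java.util.List;
import java.util.Set;

// Toy model, not a JMM implementation: Thread 1 is a list of actions in program order.
class FreezeChain {
    static final List<String> THREAD_1 =
            List.of("o.f1 = 42", "p = o", "freeze o.f1", "q = o");

    // True if the write to 'variable' comes after the freeze in program order,
    // so that the prefix w --hb--> f --hb--> a can be built for this publication.
    static boolean publishedAfterFreeze(String variable) {
        int freeze = THREAD_1.indexOf("freeze o.f1");
        int publish = THREAD_1.indexOf(variable + " = o");
        return publish > freeze;
    }

    // A reader is guaranteed to see 42 only if every reference it read
    // was published after the freeze; one early read "taints" the thread.
    static boolean guaranteed(Set<String> variablesRead) {
        return variablesRead.stream().allMatch(FreezeChain::publishedAfterFreeze);
    }

    public static void main(String[] args) {
        System.out.println("Thread 2 (reads p and q): " + guaranteed(Set.of("p", "q")));
        System.out.println("Thread 3 (reads q only):  " + guaranteed(Set.of("q")));
    }
}
```

In this model Thread 2 is not guaranteed because one of its two paths goes through p, while Thread 3, which only ever reads q, is guaranteed.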

A quote from New presentation/description of the semantics of final fields:

In order to determine if a read of a final field is guaranteed to see the initialized value of that field, you must determine that there is no way to construct the partial orders --mc--> and --dc--> without providing the chain w --hb--> f --hb--> a --mc--> r1 --dc--> r2 from the write of the field to the read of that field.

...

The write in Thread 1 and read in Thread 2 of p are involved in a memory chain. The write in Thread 1 and read in Thread 2 of q are also involved in a memory chain. Both reads of f see the same variable. There can be a dereference chain from the reads of f to either the read of p or the read of q, because those reads see the same address. If the dereference chain is from the read of p, then there is no guarantee that r5 will see the value 42.

Notice that for Thread 2, the dereference chain orders r2 = p --dc--> r5 = r4.f, but does not order r4 = q --dc--> r5 = r4.f. This reflects the fact that the compiler is allowed to move any read of a final field of an object o to immediately after the very first read of the address of o within that thread.