Is it OK to have a reference with an incorrect (larger than correct) lifetime in scope?

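Is it immediate UB (undefined behavior) to have a reference &'a T if 'a is larger than the lifetime of the referenced value? Or is it fine to have such a reference as long as it does not outlive the referenced value of type T?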

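As a comparison: mem::transmute::<u8, bool>(2) is immediate UB, even if you never access the returned value. The same is true for a reference with the value 0, because references must always be valid, even if you never access them. On the other hand, having a ptr::null() around is not a problem until you try to dereference the null pointer.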
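To make the comparison concrete, here is a minimal sketch of the two harmless sides of it. The function names byte_to_bool and null_pointer_is_harmless are made up for this illustration. The first avoids ever producing an invalid bool; the second holds a null raw pointer without dereferencing it.

```rust
use std::ptr;

/// Converts a byte to a `bool` without ever producing an invalid `bool`.
/// (Hypothetical helper, not part of the standard library.)
fn byte_to_bool(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None, // 2..=255 are not valid `bool` bit patterns
    }
}

/// Holding a null raw pointer is fine as long as it is never dereferenced.
fn null_pointer_is_harmless() -> bool {
    let p: *const u8 = ptr::null();
    p.is_null() // inspecting the pointer value does not dereference it
}
```

Neither function needs unsafe, which is the point: the problematic cases only arise once an invalid bool or an invalid reference is actually produced.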

Consider this code:

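use std::mem;

fn main() {
    let x = 'x'; // any char value will do
    let r_correct: &char = &x;

    {
        // Claims 'static although x only lives until the end of main.
        let r_incorrect: &'static char = unsafe { mem::transmute(r_correct) };
        println!("{}", r_incorrect);
    }
}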

In this code, there are two references to x. Neither of them outlives x. But the type of r_incorrect is clearly a lie, since x does not live forever.

Does this code exhibit well-defined behavior? I see three options:

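No. Undefined behavior would only occur if you accessed r_incorrect after x has gone out of scope, which you are not doing here.

Lifetime annotations in Rust are checked by the compiler to make sure you are not doing anything that would cause memory unsafety, but, assuming the borrow checker is happy, they have no effect on the generated binary or on how long a variable actually lives.

In your example, you are telling the compiler that r_incorrect lives much longer than it really does, but that is not a problem because you only access it within its valid lifetime.

The danger in doing this is that a future change to the code might try to use r_incorrect outside its true lifetime. The compiler cannot stop that from happening, because you have insisted that it is fine.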
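As a small sketch of the claim that lifetimes do not change what the reference is at run time, the following compares the original and the transmuted reference. The function name same_pointer_after_transmute is invented for this example, and the SAFETY comment states an assumption that the caller of such code would have to uphold.

```rust
use std::mem;
use std::ptr;

/// Returns true if the transmuted reference is the same pointer as the
/// original and reads the same value.
fn same_pointer_after_transmute() -> bool {
    let x = 'x';
    let r_correct: &char = &x;
    // SAFETY (assumption of this sketch): r_incorrect is only used
    // while x is still alive, i.e. before this function returns.
    let r_incorrect: &'static char = unsafe { mem::transmute(r_correct) };
    ptr::eq(r_correct, r_incorrect) && *r_incorrect == x
}
```

Only the type the compiler sees differs between the two references; nothing here stops later code from returning r_incorrect out of the function, which is exactly the hazard described above.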

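Having a reference with the wrong lifetime is sound as long as it is not dangling (that is, as long as the pointed-to value has not been deallocated).

You must make sure that no reference to the value exists any more by the time it is deallocated. It is UB for a reference to a deallocated value to exist, even if you never read or write through it. Its mere existence is immediate UB.

From the Reference, "Behavior considered undefined":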

Producing an invalid value, even in private fields and locals [..]:

  • A reference or Box that is dangling, unaligned, or points to an invalid value.

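It is a common misconception that Rust references are "just lifetime-checked pointers". In reality they are much stricter: they must be valid (non-dangling, aligned, pointing to a valid value) for the entire time they exist, even if you never read or write through them. Compare this with raw pointers, which only need to be valid when you read or write through them.

As long as you make sure it is not dangling, a reference with a "wrong" lifetime is not UB in itself; nothing in "Behavior considered undefined" says so. Lifetimes are just the borrow checker's tool for enforcing that references are valid. Bypassing them with transmute is not immediate UB; it only means that it is now up to you to make sure that all references are valid.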
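The contrast with raw pointers can be sketched as follows. The function name raw_pointer_may_dangle is made up for illustration. The raw pointer is allowed to keep existing after its allocation is gone, because it is never dereferenced again; a reference in the same position would not be allowed to exist.

```rust
/// Reads through a raw pointer while the allocation is alive, then keeps
/// the (now dangling) raw pointer around without dereferencing it.
fn raw_pointer_may_dangle() -> i32 {
    let boxed = Box::new(41);
    let raw: *const i32 = &*boxed;
    // SAFETY: `boxed` is still alive, so `raw` points to a valid i32.
    let value = unsafe { *raw };
    drop(boxed);
    // `raw` is dangling from here on. Copying or storing it is fine;
    // dereferencing it would be undefined behavior.
    let _still_exists: *const i32 = raw;
    value
}
```

If raw were a &i32 instead, the borrow checker would reject the drop, and forcing it through with unsafe code would leave a dangling reference in existence.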

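As far as I know, there is no official resource that explicitly states whether having and/or dereferencing a reference with a larger-than-correct lifetime causes undefined behavior. However, there are several resources that discuss undefined behavior in Rust and dereferencing references with unbounded lifetimes, and these hint at it being well-defined.

Things that cause undefined behavior according to the Rustonomicon

Quote from the Rustonomicon, chapter "What Unsafe Can Do" (bold highlighting and [italic text in square brackets] in this and all following quotes are mine):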

Unlike C, Undefined Behavior is pretty limited in scope in Rust. All the core language cares about is preventing the following things:

  • Dereferencing (using the * operator on) dangling or unaligned pointers (see below)
  • Breaking the pointer aliasing rules
  • Calling a function with the wrong call ABI or unwinding from a function with the wrong unwind ABI.
  • Causing a data race
  • Executing code compiled with target features that the current thread of execution does not support
  • Producing invalid values (either alone or as a field of a compound type such as enum/struct/array/tuple):
    • [lots of subitems that are irrelevant to reference lifetimes]

"Producing" a value happens any time a value is assigned, passed to a function/primitive operation or returned from a function/primitive operation.

A reference/pointer is "dangling" if it is null or not all of the bytes it points to are part of the same allocation (so in particular they all have to be part of some allocation). The span of bytes it points to is determined by the pointer value and the size of the pointee type. As a consequence, if the span is empty, "dangling" is the same as "non-null". Note that slices and strings point to their entire range, so it's important that the length metadata is never too large (in particular, allocations and therefore slices and strings cannot be bigger than isize::MAX bytes). If for some reason this is too cumbersome, consider using raw pointers.

That's it. That's all the causes of Undefined Behavior baked into Rust. Of course, unsafe functions and traits are free to declare arbitrary other constraints that a program must maintain to avoid Undefined Behavior.

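Only the first two bullet points are relevant to pointers and/or references.

  • The first point is about dereferencing dangling and unaligned pointers.

    • Unaligned pointers: Changing the lifetime of a reference cannot change its alignment, so this is not an issue.
    • Dangling pointers: Within its correct lifetime, a reference cannot be dangling according to the definition above. Hence, within the correct lifetime, the transmuted reference is not dangling either and can be dereferenced without causing undefined behavior.
  • The second point is about Rust's pointer aliasing rules. Quote from the Rustonomicon, chapter "References" (which is also linked from the quote above where it mentions the pointer aliasing rules):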

    There are two kinds of reference:

    • Shared reference: &
    • Mutable reference: &mut

    Which obey the following rules:

    • A reference cannot outlive its referent
    • A mutable reference cannot be aliased

    That's it. That's the whole model references follow.

    Of course, we should probably define what aliased means.

    error[E0425]: cannot find value `aliased` in this scope
     --> <rust.rs>:2:20
      |
    2 |     println!("{}", aliased);
      |                    ^^^^^^^ not found in this scope
    
    error: aborting due to previous error
    

    Unfortunately, Rust hasn't actually defined its aliasing model.

    While we wait for the Rust devs to specify the semantics of their language, let's use the next section to discuss what aliasing is in general, and why it matters.

    • The first point does not sound great for our case: "A reference cannot outlive its referent". However, quote from the Rustonomicon, chapter "Lifetimes", section "The area covered by a lifetime":

      The lifetime (sometimes called a borrow) is alive from the place it is created to its last use. The borrowed thing needs to outlive only borrows that are alive.

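      So, since we do not use the reference after the correct lifetime has ended, it is no longer alive from that point on. Hence, the referent does not need to stay alive past the end of the correct lifetime either.

    • The second point only concerns mutable references; your example uses a shared reference. Either way, as long as the original value is not used while the transmuted reference is alive, no pointer aliasing occurs by any definition (although, as the Rustonomicon says, no aliasing model is defined for Rust, so language-lawyering this is hard...)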
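The "alive from the place it is created to its last use" rule quoted above can be observed in entirely safe code. This is only a sketch of that rule, with an invented function name; it does not involve any transmuted lifetime.

```rust
/// The shared borrow `first` ends at its last use, so the later mutation
/// of `data` is accepted by the borrow checker.
fn borrow_ends_at_last_use() -> Vec<i32> {
    let mut data = vec![1, 2, 3];
    let first: &i32 = &data[0];
    let copy = *first; // last use of `first`; the borrow is dead from here on
    data.push(copy + 3); // accepted: no live shared borrow of `data` remains
    data
}
```

Moving the read of first below the push would make the borrow live across the mutation, and the compiler would reject the program.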

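Conclusion: things that cause undefined behavior according to the Rustonomicon

The Rustonomicon's enumeration of things that trigger undefined behavior does not include having and dereferencing a reference with a larger-than-correct lifetime, as long as this reference is not accessed after the correct lifetime has ended.

Things that cause undefined behavior according to the Rust Reference

The Rustonomicon is not the only document that discusses undefined behavior. Quote from the Rust Reference, chapter "Behavior considered undefined":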

Rust code is incorrect if it exhibits any of the behaviors in the following list. This includes code within unsafe blocks and unsafe functions. unsafe only means that avoiding undefined behavior is on the programmer; it does not change anything about the fact that Rust programs must never cause undefined behavior.

It is the programmer's responsibility when writing unsafe code to ensure that any safe code interacting with the unsafe code cannot trigger these behaviors. unsafe code that satisfies this property for any safe client is called sound; if unsafe code can be misused by safe code to exhibit undefined behavior, it is unsound.


⚠️ Warning: The following list is not exhaustive. There is no formal model of Rust's semantics for what is and is not allowed in unsafe code, so there may be more behavior considered unsafe. The following list is just what we know for sure is undefined behavior. Please read the Rustonomicon before writing unsafe code.


  • Data races.
  • Evaluating a dereference expression (*expr) on a raw pointer that is dangling or unaligned, even in place expression context (e.g. addr_of!(&*expr)).
  • Breaking the pointer aliasing rules. &mut T and &T follow LLVM’s scoped noalias model, except if the &T contains an UnsafeCell<U>.
  • Mutating immutable data. All data inside a const item is immutable. Moreover, all data reached through a shared reference or data owned by an immutable binding is immutable, unless that data is contained within an UnsafeCell<U>.
  • Invoking undefined behavior via compiler intrinsics.
  • Executing code compiled with platform features that the current platform does not support (see target_feature).
  • Calling a function with the wrong call ABI or unwinding from a function with the wrong unwind ABI.
  • Producing an invalid value, even in private fields and locals. "Producing" a value happens any time a value is assigned to or read from a place, passed to a function/primitive operation or returned from a function/primitive operation. The following values are invalid (at their respective type):
    • [lots of subitems that are irrelevant to reference lifetimes]

Note: Uninitialized memory is also implicitly invalid for any type that has a restricted set of valid values. In other words, the only cases in which reading uninitialized memory is permitted are inside unions and in "padding" (the gaps between the fields/elements of a type).


Note: Undefined behavior affects the entire program. For example, calling a function in C that exhibits undefined behavior of C means your entire program contains undefined behaviour that can also affect the Rust code. And vice versa, undefined behavior in Rust can cause adverse affects on code executed by any FFI calls to other languages.

Dangling pointers

A reference/pointer is "dangling" if it is null or not all of the bytes it points to are part of the same allocation (so in particular they all have to be part of some allocation). The span of bytes it points to is determined by the pointer value and the size of the pointee type (using size_of_val). As a consequence, if the span is empty, "dangling" is the same as "non-null". Note that slices and strings point to their entire range, so it is important that the length metadata is never too large. In particular, allocations and therefore slices and strings cannot be bigger than isize::MAX bytes.

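This list, including the two points in bold, is more or less equivalent to the Rustonomicon's list (although less strict, since the first bold bullet point forbids dereferencing dangling raw pointers rather than dereferencing dangling references; I guess this is an oversight). The LLVM documentation it links to is interesting, but the end result is the same: according to this list, having a reference with a larger-than-correct lifetime does not cause undefined behavior, as long as the reference is not dereferenced after the correct lifetime has ended. However, there is an additional note here: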

⚠️ Warning: The following list is not exhaustive. There is no formal model of Rust's semantics for what is and is not allowed in unsafe code, so there may be more behavior considered unsafe. The following list is just what we know for sure is undefined behavior. Please read the [Rustonomicon] before writing unsafe code.

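Conclusion: things that cause undefined behavior according to the Rust Reference

The Rust Reference does not exhaustively enumerate everything that may trigger undefined behavior. While the Rust Reference does not explicitly state that references with larger-than-correct lifetimes do trigger undefined behavior, it does not explicitly state that they do not, either.

The Rustonomicon on unbounded lifetimes

Quote from the Rustonomicon, chapter "Unbounded Lifetimes":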

Unsafe code can often end up producing references or lifetimes out of thin air. Such lifetimes come into the world as unbounded. The most common source of this is dereferencing a raw pointer, which produces a reference with an unbounded lifetime. Such a lifetime becomes as big as context demands. This is in fact more powerful than simply becoming 'static, because for instance &'static &'a T will fail to typecheck, but the unbound lifetime will perfectly mold into &'a &'a T as needed. However for most intents and purposes, such an unbounded lifetime can be regarded as 'static.

Almost no reference is 'static, so this is probably wrong. transmute and transmute_copy are the two other primary offenders. One should endeavor to bound an unbounded lifetime as quickly as possible, especially across function boundaries.

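The Rustonomicon says that one should "endeavor to bound an unbounded lifetime as quickly as possible, especially across function boundaries". It does not say that dereferencing a reference with an unbounded lifetime, assuming the referent is still alive, causes undefined behavior. Since dereferencing a reference is such a common operation, I cannot imagine that the Rustonomicon would fail to point out such an obvious problem. Hence, I conclude that dereferencing a reference with an unbounded lifetime does not cause undefined behavior as long as the referent is still alive.

However, the question is not about references with unbounded lifetimes, but about references with larger-than-correct lifetimes such as &'static T. The Rustonomicon states that "for most intents and purposes, [...] an unbounded lifetime can be regarded as 'static". This does not strictly imply that dereferencing a reference with a larger-than-correct lifetime is as well-defined as dereferencing a reference with an unbounded lifetime. However, I do not see why rustc should treat unbounded lifetimes differently in this regard. If it did, I would expect the Rustonomicon to include a note saying so, and saying that unbounded lifetimes are still safer than incorrectly bounded ones.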
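One way to follow the Rustonomicon's advice to bound an unbounded lifetime quickly is to tie it to a borrow in a function signature. The helper deref_bounded and its _owner parameter are hypothetical names for this sketch, not an API from the standard library.

```rust
/// Hypothetical helper: dereferences `ptr`, but ties the lifetime of the
/// resulting reference to the borrow `_owner` instead of leaving it unbounded.
///
/// # Safety
/// `ptr` must be aligned and point to a valid `T` that lives at least as
/// long as the `'a` borrow (assumption of this sketch).
unsafe fn deref_bounded<'a, T>(ptr: *const T, _owner: &'a T) -> &'a T {
    // `&*ptr` on its own has an unbounded lifetime; the return type
    // immediately constrains it to 'a.
    unsafe { &*ptr }
}

fn read_bounded() -> i32 {
    let value = 5;
    let ptr: *const i32 = &value;
    // SAFETY: `ptr` points to `value`, which is alive for this whole function.
    let r: &i32 = unsafe { deref_bounded(ptr, &value) };
    *r
}
```

With this signature the borrow checker again prevents the returned reference from outliving value, which is what a bare &*ptr would not guarantee.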

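Conclusion: the Rustonomicon on unbounded lifetimes

Dereferencing a reference with an unbounded lifetime is probably not undefined behavior according to the Rustonomicon. This may or may not extend to references whose lifetime is bounded but larger than correct; in my opinion, it does.

The example in the transmute() documentation

Quote from the standard library documentation on std::mem::transmute(), second example: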

Extending a lifetime [...]. This is advanced, very unsafe Rust!

struct R<'a>(&'a i32);
unsafe fn extend_lifetime<'b>(r: R<'b>) -> R<'static> {
    std::mem::transmute::<R<'b>, R<'static>>(r)
}

[...]

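This is about as strong a piece of evidence as you are going to get that at least having a reference with a larger-than-correct lifetime does not cause undefined behavior. Otherwise, anyone calling this function with any non-'static reference would immediately invoke undefined behavior, and the function would be very, very useless. Moreover, to me this implies that you may also dereference the reference returned by extend_lifetime; otherwise, what good would the function be?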
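For illustration, this is how the documented extend_lifetime function might be called and its result dereferenced while the referent is still alive. The caller read_extended is made up for this sketch; whether the dereference is well-defined is the very question under discussion, so treat this as an example of the pattern, not as a guarantee.

```rust
struct R<'a>(&'a i32);

/// Same function as in the transmute() documentation example.
unsafe fn extend_lifetime<'b>(r: R<'b>) -> R<'static> {
    unsafe { std::mem::transmute::<R<'b>, R<'static>>(r) }
}

/// Hypothetical caller: extends the lifetime, then reads through the
/// reference before `value` goes out of scope.
fn read_extended() -> i32 {
    let value = 7;
    // SAFETY (assumption of this sketch): `extended` is only used
    // while `value` is alive and is not returned from this function.
    let extended: R<'static> = unsafe { extend_lifetime(R(&value)) };
    *extended.0
}
```

The i32 is copied out before the function returns, so no reference with the extended lifetime escapes.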

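Conclusion: the example in the transmute() documentation

The example in the transmute() documentation appears to imply that dereferencing a reference with a larger-than-correct lifetime is well-defined.

Final conclusion

Sadly, the documentation on the details of unsafe Rust is still incomplete, and there are many questions about edge cases like this one that the documentation simply cannot answer definitively. However, all documentation that is even remotely related to the question seems to imply that dereferencing the reference in question is in fact well-defined behavior. Whether that is enough for you to do something like this is up to you; it would probably be enough for me.

However, you really should not do this

To clarify: while this code is probably well-defined, it is definitely still a footgun. Passing such a reference across function boundaries is a bad idea, especially if the function is pub and can therefore be called from other modules or crates. Even without passing it across function boundaries, it is still very easy to misuse such a reference in a way that makes your code exhibit undefined behavior. If you think you might need to do this, I urge you to reconsider whether you can restructure your code to avoid transmuting lifetimes. For example, it may be safer (or at least more obvious that it is thoroughly and utterly unsafe) to use raw pointers directly instead of references with incorrect lifetimes.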
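A minimal sketch of the raw-pointer alternative mentioned above, with an invented function name: instead of a &'static char that lies about its lifetime, keep a *const char and dereference it only where the referent is known to be alive.

```rust
/// Keeps a raw pointer instead of a reference with an incorrect lifetime.
fn read_through_raw_pointer() -> char {
    let x = 'x';
    let p: *const char = &x; // no lifetime claim is made at all
    // SAFETY: `x` is still alive here, so `p` is valid for reads.
    unsafe { *p }
}
```

Every use site now needs its own unsafe block, which makes it visible to readers and reviewers where the validity of the pointer has to be argued.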