从逻辑上拆分借用以解决启用 NLL 的借用检查器的限制是否安全?

Is it safe to logically split a borrow to work around a limitation of the NLL-enabled borrow-checker?

以下代码涉及非常微妙的借用检查器回避。代码本身描述了推理。问题:

/// Insert a new data element at a given key.
pub fn insert<'a, K: Eq, V>(this: &'a mut Vec<(K, V)>, key: K, val: V) -> &'a mut V {
    // Safety: As indicated below, we would like to return val1 directly in the loop,
    // but rust will reject this, claiming a double borrow, and we instead use some
    // unsafe hacks to circumvent the borrow checker. To show this is safe, consider
    // two cases.
    // - If the return is exercised (we found an element and early out):
    //   - let 'b be 'a (the borrow of self),
    //   - and let 'c be empty
    // - Otherwise (we did not find an element, exit the loop and terminate normally):
    //   - let 'b be the duration of the loop,
    //   - and let 'c be from the end of the loop until the end of 'a
    // In either case, 'b and 'c are disjoint, so the "double borrow" is safe.
    // The borrow checker reasons that 'b has to be at least 'a because it is returned,
    // and therefore it overlaps with 'c, but these happen in mutually exclusive
    // situations.
    for (key1, val1) in & /* 'b */ mut *this {
        if key == *key1 {
            // return val1; // we would like to write this
            return unsafe { // safety, see above. We know we are in the first case, so 'b = 'a
                std::mem::transmute::<&/* 'b */ mut V, &/* 'a */ mut V>(val1)
            }
        }
    }
    let this = & /* 'c */ mut *this;
    this.push((key, val));
    &mut this.last_mut().unwrap().1
}

我更愿意这样写:

/// Insert a new data element at a given key.
pub fn insert<'a, K: Eq, V>(this: &'a mut Vec<(K, V)>, key: K, val: V) -> &'a mut V {
    for (key1, val1) in &mut *this {
        if key == *key1 {
            return val1;
        }
    }
    let this = &mut *this;
    this.push((key, val));
    &mut this.last_mut().unwrap().1
}

但失败了:

error[E0499]: cannot borrow `*this` as mutable more than once at a time
 --> src/lib.rs:8:16
  |
2 | pub fn insert<'a, K: Eq, V>(this: &'a mut Vec<(K, V)>, key: K, val: V) -> &'a mut V {
  |               -- lifetime `'a` defined here
3 |     for (key1, val1) in &mut *this {
  |                         ---------- first mutable borrow occurs here
4 |         if key == *key1 {
5 |             return val1;
  |                    ---- returning this value requires that `*this` is borrowed for `'a`
...
8 |     let this = &mut *this;
  |                ^^^^^^^^^^ second mutable borrow occurs here

安全的替代品

首先,这是我的建议。您可以迭代 Vec 一次以通过 position(|x| x == y) 获取目标值的索引。然后您可以匹配现在拥有的值并像以前一样继续。这应该与您以前的版本具有非常相似的性能(事实上,LLVM 甚至可能在使用发布模式构建时使其完全相同)。

/// Insert a new data element at a given key.
pub fn insert<K: Eq, V>(this: &mut Vec<(K, V)>, key: K, val: V) -> &mut V {
    match this.iter().position(|(key1, _)| &key == key1) {
        Some(idx) => &mut this[idx].1,
        None => {
            this.push((key, val));
            &mut this.last_mut().unwrap().1
        }
    }
}

Playground Link

错误说明

这里简要解释了为什么编译器会变得混乱。如果我先重写它以分离迭代器的创建,则更容易查看。我还在函数签名中添加了第二个生命周期,以减少限制并更容易显示错误。老实说,这感觉像是借用检查员的错误,但我知道它是怎么弄到那里的。

use std::slice::IterMut;

// Returns a reference of borrowed value 'a of lifetime 'b. Since this reference // may exist up to the end of 'a, we know that 'b <= 'a. 
pub fn insert<'a: 'b, 'b, K: Eq, V>(this: &'a mut Vec<(K, V)>, key: K, val: V) -> &'b mut V {
    
    // The problem comes from trying to identify an appropriate lifetime for 'c.
    // While iterating, each item also more or less shares the lifetime 'c.
    let iterator: IterMut<'c, (K, V)> = this.into_iter();
    
    for (ref mut key1, ref mut val1) in iterator {
        if key == *key1 {
            // Since this is the returned value, it must have lifetime 'b to match
            // the function signature. But at the same time it must also live for 'c.
            // Therefore 'b <= 'c.
            return val1
        }
    }

    // So at this point the constraints we have so far are as follows:
    // 'b <= 'a
    // 'c <= 'a
    // 'b <= 'c
    // Therefore 'b <= 'c <= 'a

    // Due to the next line, 'c mandates the iterator is still alive making this the
    // second mutable borrow.
    this.push((key, val));

    // This lives for 'b, but since 'b <= 'c then 'c still exists
    &mut this.last_mut().unwrap().1
}

外卖

  • “这真的安全吗?”它使用unsafe吗?如果它使用 unsafe 那么它是不安全的。 Safe/unsafe 与它是否应该工作无关。仅仅因为 C 代码有效,并不意味着它是安全的。这是关于我们的代码是否存在人为错误的可能性,导致程序以编译器无法解释的方式运行。只有当我们在多种条件下尝试过它并且它毫无例外地按预期可靠地工作时,我们才会认为不安全的东西是安全的。那么“这真的安全吗?”更多的问题是您对这段代码有多信任。
  • “这是表示执行的不安全操作的推荐方式吗?我应该改用指针吗?”在不安全代码方面,我个人的偏好是你有权利现在就改变一生。使用指针只是通过使它隐含在指针取消引用中来隐藏 transmute。此外,它在方程式中添加了指针,这只会增加另一层复杂性。
  • “新的 Polonius 借用检查器能推理出这样的模式吗?” 不知道。也许对此问题有更多了解的人会发表评论回答这个问题。
  • 旁注: 尽量避免编写具有 fn foo<'a>(&'a A) -> &'a B 生命周期的函数。这可能更具限制性,因为它强制返回的生命周期与输入相同。隐式版本看起来更像 fn foo<'a: 'b, 'b>(&'a A) -> &'b B,只要求输入生命周期长于返回生命周期。

如相关问题所述,最简单的做法是使用索引代替,因为它不需要不安全的代码。我可能会这样写:

pub fn insert<'a, K: Eq, V>(this: &'a mut Vec<(K, V)>, key: K, val: V) -> &'a mut V {
    let idx = this
        .iter()
        .enumerate()
        .find_map(|(i, (k, _))| if key == *k { Some(i) } else { None });

    let idx = idx.unwrap_or_else(|| {
        this.push((key, val));
        this.len() - 1
    });

    &mut this[idx].1
}

您应该执行基准测试 以了解这是否由于某种原因不够快。只有在这种情况下,您才应该选择使用 unsafe 代码来获得最后一点速度。然后,您应该再次 进行基准测试,看看代码是否明显更快。

例如,您可以通过使用 slice::get_unchecked_mut 而不是 &mut this[idx].1 来获得加速,这是一种更容易合理化的不安全代码。

在我们的安全代码中使用索引的好处是它们直接转换为指针偏移逻辑。我们可以使用这个安全示例并对其进行最小修改以获得使用 unsafe 代码的版本:

pub fn insert<'a, K: Eq, V>(this: &'a mut Vec<(K, V)>, key: K, val: V) -> &'a mut V {
    // I copied this code from Stack Overflow without reading the surrounding
    // text which explained why this code is or is not safe.
    unsafe {
        let found = this
            .iter_mut()
            .find_map(|(k, v)| if key == *k { Some(v as *mut V) } else { None });

        match found {
            Some(v) => &mut *v,
            None => {
                this.push((key, val));
                &mut this.last_mut().unwrap().1
            }
        }
    }
}

安全的要点围绕着指向found中值的指针。它开始时是一个可变引用,所以我们知道它在迭代时是有效的。我们知道 find_map 在第一个 Some 上停止迭代,并且我们知道使用 iter_mut() 进行迭代不应该改变我们的值。由于this不能在found的绑定和match中的使用之间改变,我相信这段代码是安全的。

通过 Miri 练习代码总是很有价值的。您实际上必须 执行 代码,因为 Miri 只会标记导致未定义行为的代码,而忽略任何休眠代码路径。此代码是 Miri-clean:

fn main() {
    let mut things = vec![(1, 2), (3, 4)];

    let v = insert(&mut things, 1, 2);
    println!("{} ({:p})", v, v);

    let v = insert(&mut things, 1, 2);
    println!("{} ({:p})", v, v);

    let v = insert(&mut things, 5, 6);
    println!("{} ({:p})", v, v);

    let v = insert(&mut things, 5, 6);
    println!("{} ({:p})", v, v);
}
2 (0x2829c)
2 (0x2829c)
6 (0x41054)
6 (0x41054)

Is [the original implementation] actually safe?

Miri 报告我在上面使用的相同测试代码没有问题,我也没有发现任何明显的错误。

Is this the recommended way to express the unsafe operations performed? Should I use pointers instead?

如果可以避免 mem::transmute通常 应该避免。 transmute 是 The Big Hammer,可以做很多您可能不打算做的事情(更改 types 是关键)。在这种情况下使用指针感觉简单多了。

我同意使用注释来说明为什么不安全代码是安全的。即使是错误的,它仍然表明了原作者的心态。未来的审阅者可能会说“啊,他们没有考虑清单项目 #42,让我测试一下!”。

特别是您评论中的推理,对我来说 过于密集/学术化。我不明白为什么要谈论多次生命周期或双重借用。

Will the new Polonius borrow checker be able to reason about patterns like this?

是:

% cargo +nightly rustc --
   Compiling example v0.1.0 (/private/tmp/example)
error[E0499]: cannot borrow `*this` as mutable more than once at a time
 --> src/main.rs:8:16
  |
2 | pub fn insert<'a, K: Eq, V>(this: &'a mut Vec<(K, V)>, key: K, val: V) -> &'a mut V {
  |               -- lifetime `'a` defined here
3 |     for (key1, val1) in &mut *this {
  |                         ---------- first mutable borrow occurs here
4 |         if key == *key1 {
5 |             return val1;
  |                    ---- returning this value requires that `*this` is borrowed for `'a`
...
8 |     let this = &mut *this;
  |                ^^^^^^^^^^ second mutable borrow occurs here

% cargo +nightly rustc -- -Zpolonius
   Compiling example v0.1.0 (/private/tmp/example)
    Finished dev [unoptimized + debuginfo] target(s) in 0.86s

% ./target/debug/example
2 (0x7f97ea405b24)
2 (0x7f97ea405b24)
6 (0x7f97ea405ba4)
6 (0x7f97ea405ba4)

另请参阅: