How does interior mutability work for caching behavior?

I'm trying to create a struct that takes a Path and, on demand, loads the image from the path specified. Here's what I have so far:

extern crate image;

use std::cell::{RefCell};
use std::path::{Path};
use image::{DynamicImage};

pub struct ImageCell<'a> {
    image: RefCell<Option<DynamicImage>>,
    image_path: &'a Path, 
}

impl<'a> ImageCell<'a> {
    pub fn new<P: AsRef<Path>>(image_path: &'a P) -> ImageCell<'a>{
        ImageCell { image: RefCell::new(None), image_path: image_path.as_ref() }
    }

    //copied from https://doc.rust-lang.org/nightly/std/cell/index.html#implementation-details-of-logically-immutable-methods
    pub fn get_image(&self) -> &DynamicImage {
        {
            let mut cache = self.image.borrow_mut();
            if cache.is_some() {
                return cache.as_ref().unwrap(); //Error here
            }

            let image = image::open(self.image_path).unwrap();
            *cache = Some(image);
        }

        self.get_image()
    } 
}

It fails to compile:

src/image_generation.rs:34:24: 34:29 error: `cache` does not live long enough
src/image_generation.rs:34                 return cache.as_ref().unwrap();
                                                  ^~~~~
src/image_generation.rs:30:46: 42:6 note: reference must be valid for the anonymous lifetime #1 defined on the block at 30:45...
src/image_generation.rs:30     pub fn get_image(&self) -> &DynamicImage {
src/image_generation.rs:31         {
src/image_generation.rs:32             let mut cache = self.image.borrow_mut();
src/image_generation.rs:33             if cache.is_some() {
src/image_generation.rs:34                 return cache.as_ref().unwrap();
src/image_generation.rs:35             }
                           ...
src/image_generation.rs:32:53: 39:10 note: ...but borrowed value is only valid for the block suffix following statement 0 at 32:52
src/image_generation.rs:32             let mut cache = self.image.borrow_mut();
src/image_generation.rs:33             if cache.is_some() {
src/image_generation.rs:34                 return cache.as_ref().unwrap();
src/image_generation.rs:35             }
src/image_generation.rs:36 
src/image_generation.rs:37             let image = image::open(self.image_path).unwrap();
                           ...

I think I understand why this happens: the lifetime of cache is tied to borrow_mut().

Is there any way to structure the code so that this works?

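I'm not totally convinced you need interior mutability here. However, I do think the solution you've proposed is generally useful, so I'll elaborate on one way to achieve it.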

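The problem with your current code is that RefCell provides dynamic borrowing semantics. In other words, borrowing the contents of a RefCell is opaque to Rust's borrow checker. When you try to return a &DynamicImage while the image still lives inside the RefCell, the RefCell has no way to keep tracking that borrow. If RefCell allowed this, other code could overwrite the contents of the RefCell while the &DynamicImage was still on loan. Whoops! Memory safety violation.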

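For this reason, borrowing a value out of a RefCell is tied to the lifetime of the guard you get back when you call borrow_mut(). In this case, the guard lives only as long as the stack frame of get_image, which no longer exists after the function returns. Therefore, you cannot borrow the contents of a RefCell the way you're doing it.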
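To see the dynamic tracking in isolation, here is a minimal sketch using only std. The helper name borrow_state_demo is made up for illustration. While a guard returned by borrow() is alive, try_borrow_mut() is refused; once the guard is dropped, it succeeds again.

```rust
use std::cell::RefCell;

// Hypothetical helper: reports whether a mutable borrow was refused while a
// shared guard was alive, and whether it was allowed after the guard dropped.
fn borrow_state_demo() -> (bool, bool) {
    let cell = RefCell::new(Some(String::from("cached")));

    let refused_while_borrowed = {
        // `guard` is a `Ref`; the shared borrow lasts exactly as long as it.
        let guard = cell.borrow();
        // The cell records the outstanding borrow at run time, so a mutable
        // borrow is refused instead of being allowed to alias.
        let refused = cell.try_borrow_mut().is_err();
        assert_eq!(guard.as_deref(), Some("cached"));
        refused
    }; // `guard` is dropped here and the borrow ends.

    // With no guard alive, the cell can be mutably borrowed again.
    let allowed_after_drop = cell.try_borrow_mut().is_ok();
    (refused_while_borrowed, allowed_after_drop)
}

fn main() {
    let (refused, allowed) = borrow_state_demo();
    println!("refused while borrowed: {}, allowed after drop: {}", refused, allowed);
}
```

A plain reference returned from get_image would outlive the guard, so the cell could no longer know that the borrow is still in use.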

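An alternative approach (while keeping the requirement of interior mutability) is to move values in and out of the RefCell. This lets you retain cache semantics.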

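The basic idea is to return a guard that contains the dynamic image along with a pointer back to the cell it came from. Once you're done with the dynamic image, the guard is dropped and we can add the image back to the cell's cache.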

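To maintain ergonomics, we impl Deref on the guard so that you can mostly pretend it is a DynamicImage. Here's the code with some comments and a few other things cleaned up: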

use std::cell::RefCell;
use std::io;
use std::mem;
use std::ops::Deref;
use std::path::{Path, PathBuf};

struct ImageCell {
    image: RefCell<Option<DynamicImage>>,
    // Suffer the one time allocation into a `PathBuf` to avoid dealing
    // with the lifetime.
    image_path: PathBuf,
}

impl ImageCell {
    fn new<P: Into<PathBuf>>(image_path: P) -> ImageCell {
        ImageCell {
            image: RefCell::new(None),
            image_path: image_path.into(),
        }
    }

    fn get_image(&self) -> io::Result<DynamicImageGuard<'_>> {
        // `take` transfers ownership out from the `Option` inside the
        // `RefCell`. If there was no value there, then generate an image
        // and return it. Otherwise, move the value out of the `RefCell`
        // and return it.
        let image = match self.image.borrow_mut().take() {
            None => {
                println!("Opening new image: {:?}", self.image_path);
                DynamicImage::open(&self.image_path)?
            }
            Some(img) => {
                println!("Retrieving image from cache: {:?}", self.image_path);
                img
            }
        };
        // The guard provides the `DynamicImage` and a pointer back to
        // `ImageCell`. When it's dropped, the `DynamicImage` is added
        // back to the cache automatically.
        Ok(DynamicImageGuard { image_cell: self, image: image })
    }
}

struct DynamicImageGuard<'a> {
    image_cell: &'a ImageCell,
    image: DynamicImage,
}

impl<'a> Drop for DynamicImageGuard<'a> {
    fn drop(&mut self) {
        // When a `DynamicImageGuard` goes out of scope, this method is
        // called. We move the `DynamicImage` out of its current location
        // and put it back into the `RefCell` cache.
        println!("Adding image to cache: {:?}", self.image_cell.image_path);
        let image = mem::replace(&mut self.image, DynamicImage::empty());
        *self.image_cell.image.borrow_mut() = Some(image);
    }
}

impl<'a> Deref for DynamicImageGuard<'a> {
    type Target = DynamicImage;

    fn deref(&self) -> &DynamicImage {
        // This improves the ergonomics of a `DynamicImageGuard`. Because
        // of this impl, most uses of `DynamicImageGuard` can be as if
        // it were just a `&DynamicImage`.
        &self.image
    }
}

// A dummy image type.
struct DynamicImage {
    data: Vec<u8>,
}

// Dummy image methods.
impl DynamicImage {
    fn open<P: AsRef<Path>>(_p: P) -> io::Result<DynamicImage> {
        // Open image on file system here.
        Ok(DynamicImage { data: vec![] })
    }

    fn empty() -> DynamicImage {
        DynamicImage { data: vec![] }
    }
}

fn main() {
    let cell = ImageCell::new("foo");
    {
        let img = cell.get_image().unwrap(); // opens new image
        println!("image data: {:?}", img.data);
    } // adds image to cache (on drop of `img`)
    let img = cell.get_image().unwrap(); // retrieves image from cache
    println!("image data: {:?}", img.data);
} // adds image back to cache (on drop of `img`)

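There is a really important caveat to note here: this has only one cache location, which means that if you call get_image a second time before the first guard has been dropped, a new image will be generated from scratch, since the cell will be empty. This semantic is hard to change (in safe code) because you've committed to a solution that uses interior mutability. Generally speaking, the whole point of interior mutability is to mutate something without the caller being able to observe it. Indeed, that should be the case here, assuming that opening an image always returns precisely the same data.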
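The caveat can be made visible with a small sketch that counts loads. Everything here is hypothetical and only mirrors the shape of the code above: Image, CountingCell, Guard, and count_loads are illustrative names, the "load" just builds a Vec, and the guard stores an Option so that drop can move the image out without a placeholder value (a variation on the mem::replace used above).

```rust
use std::cell::{Cell, RefCell};
use std::ops::Deref;

// Hypothetical stand-in for an image that is expensive to load.
struct Image {
    data: Vec<u8>,
}

struct CountingCell {
    slot: RefCell<Option<Image>>,
    // Counts how many times the simulated load ran.
    loads: Cell<usize>,
}

struct Guard<'a> {
    cell: &'a CountingCell,
    // `Option` so that `drop` can move the image out of the guard.
    image: Option<Image>,
}

impl CountingCell {
    fn new() -> Self {
        CountingCell { slot: RefCell::new(None), loads: Cell::new(0) }
    }

    fn get(&self) -> Guard<'_> {
        // Bind first so the `RefMut` temporary is released before the match.
        let cached = self.slot.borrow_mut().take();
        let image = match cached {
            Some(img) => img,
            None => {
                self.loads.set(self.loads.get() + 1);
                Image { data: vec![0; 4] }
            }
        };
        Guard { cell: self, image: Some(image) }
    }
}

impl<'a> Drop for Guard<'a> {
    fn drop(&mut self) {
        // Return the image to the single slot, replacing whatever is there.
        *self.cell.slot.borrow_mut() = self.image.take();
    }
}

impl<'a> Deref for Guard<'a> {
    type Target = Image;

    fn deref(&self) -> &Image {
        self.image.as_ref().expect("image is present until drop")
    }
}

// Returns the load counts for sequential use and for overlapping use.
fn count_loads() -> (usize, usize) {
    let sequential = CountingCell::new();
    {
        let _a = sequential.get(); // first load
    }
    {
        let _b = sequential.get(); // served from the cache
    }

    let overlapping = CountingCell::new();
    {
        let a = overlapping.get(); // first load
        let b = overlapping.get(); // slot is empty while `a` is alive: second load
        assert_eq!(a.data.len(), b.data.len());
    }
    (sequential.loads.get(), overlapping.loads.get())
}

fn main() {
    let (sequential, overlapping) = count_loads();
    println!("sequential loads: {}, overlapping loads: {}", sequential, overlapping);
}
```

Sequential use loads once, while two overlapping guards load twice, and only one of the two images ends up in the slot afterwards.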

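This approach can be generalized to be thread safe (by using Mutex for interior mutability instead of RefCell), and it may be made more performant by choosing a different caching strategy depending on your use case. For example, the regex crate uses a simple memory pool to cache compiled regex state. Since this caching should be opaque to callers, it is implemented with interior mutability using precisely the same mechanism outlined here.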
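As a rough sketch of the thread-safe variant, the RefCell can be swapped for a std::sync::Mutex while keeping the same guard pattern. The names SharedImageCell, SharedGuard, and sum_across_threads are hypothetical, the image is a dummy Vec, and this is one possible arrangement rather than a definitive implementation. The single-slot caveat still applies: threads that overlap may each build their own image.

```rust
use std::ops::Deref;
use std::sync::Mutex;
use std::thread;

// Hypothetical stand-in for a loaded image.
struct Image {
    data: Vec<u8>,
}

struct SharedImageCell {
    slot: Mutex<Option<Image>>,
}

struct SharedGuard<'a> {
    cell: &'a SharedImageCell,
    image: Option<Image>,
}

impl SharedImageCell {
    fn new() -> Self {
        SharedImageCell { slot: Mutex::new(None) }
    }

    fn get(&self) -> SharedGuard<'_> {
        // Hold the lock only long enough to move the cached value out.
        let cached = self.slot.lock().unwrap().take();
        // Simulated load when the slot is empty.
        let image = cached.unwrap_or_else(|| Image { data: vec![7; 3] });
        SharedGuard { cell: self, image: Some(image) }
    }
}

impl<'a> Drop for SharedGuard<'a> {
    fn drop(&mut self) {
        // Skip the write-back rather than panic in `drop` if the lock is poisoned.
        if let Ok(mut slot) = self.cell.slot.lock() {
            *slot = self.image.take();
        }
    }
}

impl<'a> Deref for SharedGuard<'a> {
    type Target = Image;

    fn deref(&self) -> &Image {
        self.image.as_ref().expect("image is present until drop")
    }
}

// Reads the image length from several threads sharing one cell.
fn sum_across_threads(threads: usize) -> usize {
    let cell = SharedImageCell::new();
    thread::scope(|s| {
        let mut handles = Vec::new();
        for _ in 0..threads {
            handles.push(s.spawn(|| cell.get().data.len()));
        }
        handles.into_iter().map(|h| h.join().unwrap()).sum::<usize>()
    })
}

fn main() {
    println!("total bytes seen: {}", sum_across_threads(4));
}
```

The lock is released before the image is handed out, so a slow caller never blocks other threads; the cost is that concurrent callers can miss the cache.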