如何解决循环依赖和迭代器生命周期问题?

how to solve cyclic-dependency & Iterator lifetime problem?

Rustaceans。当我开始用 Rust 编写 BloomFilter 示例时。我发现我有几个问题需要解决。我努力解决它们,但一天没有进展。我需要帮助,任何建议都会对我有很大帮助,谢谢。

问题

  1. 迭代器传给另一个函数时如何解决生命周期问题?
// let bits = self.hash(value); // how to solve such lifetime error without use 'static storage?
 
// Below is a workaround code but need to computed in advanced.
let bits = Box::new(self.hash(value).collect::<Vec<u64>>().into_iter());
self.0.set(bits); 
  1. 如何在不修改低层代码的情况下解决struts之间的循环依赖,例如:bloom_filter ?
// cyclic-dependency:
// RedisCache -> BloomFilter -> Storage
//    |                            ^
//    ------------<impl>------------
//
//                                     v--- cache ownership has moved here
let filter = BloomFilter::by(Box::new(cache));
cache.1.replace(filter);
  1. 由于rust没有null值,如何在没有任何存根的情况下解决循环依赖初始化?
let mut cache = RedisCache(
    Client::open("redis://localhost").unwrap(),
    // I found can use Weak::new() to solve it,but need to downgrade a Rc reference.
    //                    v-- need a BloomFilter stub to create RedisCache
    RefCell::new(BloomFilter::new()),
);

代码

#![allow(unused)]

mod bloom_filter {
    use std::{hash::Hash, marker::PhantomData};

    pub type BitsIter = Box<dyn Iterator<Item = u64>>;

    pub trait Storage {
        fn set(&mut self, bits: BitsIter);

        fn contains_all(&self, bits: BitsIter) -> bool;
    }

    pub struct BloomFilter<T: Hash>(Box<dyn Storage>, PhantomData<T>);

    impl<T: Hash> BloomFilter<T> {
        pub fn new() -> BloomFilter<T> {
            return Self::by(Box::new(ArrayStorage([0; 5000])));

            struct ArrayStorage<const N: usize>([u8; N]);

            impl<const N: usize> Storage for ArrayStorage<N> {
                fn set(&mut self, bits: BitsIter) {
                    let size = self.0.len() as u64;
                    bits.map(|bit| (bit % size) as usize)
                        .for_each(|index| self.0[index] = 1);
                }

                fn contains_all(&self, bits: BitsIter) -> bool {
                    let size = self.0.len() as u64;
                    bits.map(|bit| (bit % size) as usize)
                        .all(|index| self.0[index] == 1)
                }
            }
        }

        pub fn by(storage: Box<dyn Storage>) -> BloomFilter<T> {
            BloomFilter(storage, PhantomData)
        }

        pub fn add(&mut self, value: T) {
            // let bits = self.hash(value); // how to solve such lifetime error?
            let bits = Box::new(self.hash(value).collect::<Vec<u64>>().into_iter());
            self.0.set(bits);
        }

        pub fn contains(&self, value: T) -> bool {
            // lifetime problem same as Self::add(T)
            let bits = Box::new(self.hash(value).collect::<Vec<u64>>().into_iter());
            self.0.contains_all(bits)
        }

        fn hash<'a, H: Hash + 'a>(&self, _value: H) -> Box<dyn Iterator<Item = u64> + 'a> {
            todo!()
        }
    }
}

mod spi {
    use super::bloom_filter::*;
    use redis::{Client, Commands, RedisResult};
    use std::{
        cell::RefCell,
        rc::{Rc, Weak},
    };

    pub struct RedisCache<'a>(Client, RefCell<BloomFilter<&'a str>>);

    impl<'a> RedisCache<'a> {
        pub fn new() -> RedisCache<'a> {
            let mut cache = RedisCache(
                Client::open("redis://localhost").unwrap(),
                //                    v-- need a BloomFilter stub to create RedisCache
                RefCell::new(BloomFilter::new()),
            );
            //                                      v--- cache ownership has moved here
            let filter = BloomFilter::by(Box::new(cache));
            cache.1.replace(filter);
            return cache;
        }

        pub fn get(&mut self, key: &str, load_value: fn() -> Option<String>) -> Option<String> {
            let filter = self.1.borrow();
            if filter.contains(key) {
                if let Ok(value) = self.0.get::<&str, String>(key) {
                    return Some(value);
                }
                if let Some(actual_value) = load_value() {
                    let _: () = self.0.set(key, &actual_value).unwrap();
                    return Some(actual_value);
                }
            }
            return None;
        }
    }

    impl<'a> Storage for RedisCache<'a> {
        fn set(&mut self, bits: BitsIter) {
            todo!()
        }

        fn contains_all(&self, bits: BitsIter) -> bool {
            todo!()
        }
    }
}


已更新


首先,感谢@Colonel Thirty Two给我提供了很多我没有掌握的信息,帮助我解决了迭代器生命周期的问题。 我通过在不修改 bloom_filter 模块的情况下将 Storage 的责任分解为另一个结构 RedisStorage 来解决循环依赖,但使示例变得臃肿。以下是他们的关系:

RedisCache ->  BloomFilter  -> Storage  <---------------
    |                                                  |
    |-------> redis::Client <- RedisStorage ---<impl>---          

我意识到所有权和生命周期系统不仅被借用检查器使用,而且 Rustaceans 需要比 GC 语言更大的前端设计来遵守规则,例如:java。我说得对吗?

最终代码

mod bloom_filter {
    use std::{
        hash::{Hash, Hasher},
        marker::PhantomData,
    };

    pub type BitsIter<'a> = Box<dyn Iterator<Item = u64> + 'a>;

    pub trait Storage {
        fn set(&mut self, bits: BitsIter);

        fn contains_all(&self, bits: BitsIter) -> bool;
    }

    pub struct BloomFilter<T: Hash>(Box<dyn Storage>, PhantomData<T>);

    impl<T: Hash> BloomFilter<T> {
        #[allow(unused)]
        pub fn new() -> BloomFilter<T> {
            return Self::by(Box::new(ArrayStorage([0; 5000])));

            struct ArrayStorage<const N: usize>([u8; N]);

            impl<const N: usize> Storage for ArrayStorage<N> {
                fn set(&mut self, bits: BitsIter) {
                    let size = self.0.len() as u64;
                    bits.map(|bit| (bit % size) as usize)
                        .for_each(|index| self.0[index] = 1);
                }

                fn contains_all(&self, bits: BitsIter) -> bool {
                    let size = self.0.len() as u64;
                    bits.map(|bit| (bit % size) as usize)
                        .all(|index| self.0[index] == 1)
                }
            }
        }

        pub fn by(storage: Box<dyn Storage>) -> BloomFilter<T> {
            BloomFilter(storage, PhantomData)
        }

        pub fn add(&mut self, value: T) {
            self.0.set(self.hash(value));
        }

        pub fn contains(&self, value: T) -> bool {
            self.0.contains_all(self.hash(value))
        }

        fn hash<'a, H: Hash + 'a>(&self, value: H) -> BitsIter<'a> {
            Box::new(
                [3, 11, 31, 71, 131]
                    .into_iter()
                    .map(|salt| SimpleHasher(0, salt))
                    .map(move |mut hasher| hasher.hash(&value)),
            )
        }
    }
    struct SimpleHasher(u64, u64);

    impl SimpleHasher {
        fn hash<H: Hash>(&mut self, value: &H) -> u64 {
            value.hash(self);
            self.finish()
        }
    }

    impl Hasher for SimpleHasher {
        fn finish(&self) -> u64 {
            self.0
        }

        fn write(&mut self, bytes: &[u8]) {
            self.0 += bytes.iter().fold(0u64, |acc, k| acc * self.1 + *k as u64)
        }
    }
}

mod spi {
    use super::bloom_filter::*;
    use redis::{Client, Commands};
    use std::{cell::RefCell, rc::Rc};

    pub struct RedisCache<'a>(Rc<RefCell<Client>>, BloomFilter<&'a str>);

    impl<'a> RedisCache<'a> {
        pub fn new(client: Rc<RefCell<Client>>, filter: BloomFilter<&'a str>) -> RedisCache<'a> {
            RedisCache(client, filter)
        }

        pub fn get<'f>(
            &mut self,
            key: &str,
            load_value: fn() -> Option<&'f str>,
        ) -> Option<String> {
            if self.1.contains(key) {
                let mut redis = self.0.as_ref().borrow_mut();
                if let Ok(value) = redis.get::<&str, String>(key) {
                    return Some(value);
                }
                if let Some(actual_value) = load_value() {
                    let _: () = redis.set(key, &actual_value).unwrap();
                    return Some(actual_value.into());
                }
            }
            return None;
        }
    }

    struct RedisStorage(Rc<RefCell<Client>>);

    const BLOOM_FILTER_KEY: &str = "bloom_filter";
    impl Storage for RedisStorage {
        fn set(&mut self, bits: BitsIter) {
            bits.for_each(|slot| {
                let _: bool = self
                    .0
                    .as_ref()
                    .borrow_mut()
                    .setbit(BLOOM_FILTER_KEY, slot as usize, true)
                    .unwrap();
            })
        }

        fn contains_all(&self, mut bits: BitsIter) -> bool {
            bits.all(|slot| {
                self.0
                    .as_ref()
                    .borrow_mut()
                    .getbit(BLOOM_FILTER_KEY, slot as usize)
                    .unwrap()
            })
        }
    }

    #[test]
    fn prevent_cache_penetration_by_bloom_filter() {
        let client = Rc::new(RefCell::new(Client::open("redis://localhost").unwrap()));
        redis::cmd("FLUSHDB").execute(&mut *client.as_ref().borrow_mut());
        let mut filter: BloomFilter<&str> = BloomFilter::by(Box::new(RedisStorage(client.clone())));

        assert!(!filter.contains("Rust"));

        filter.add("Rust");
        assert!(filter.contains("Rust"));

        let mut cache = RedisCache::new(client, filter);

        assert_eq!(
            cache.get("Rust", || Some("System Language")),
            Some("System Language".to_string())
        );
        assert_eq!(
            cache.get("Rust", || panic!("must never be called after cached")),
            Some("System Language".to_string())
        );
        assert_eq!(
            cache.get("Go", || panic!("reject to loading `Go` from external storage")),
            None
        );
    }
}

pub type BitsIter = Box<dyn Iterator<Item = u64>>;

在这种情况下,框中的对象必须在 'static 生命周期内有效。 hash 返回的迭代器不是这种情况 - 它仅限于 self.

的生命周期

尝试替换为:

pub type BitsIter<'a> = Box<dyn Iterator<Item = u64> + 'a>;

或者使用泛型代替盒装特征对象。


所以你的 RedisClient 需要一个 BloomFilter,但是 BloomFilter 也需要 RedisClient?

你的 BloomFilter 不应该使用本身使用 BloomFilterRedisCache - 这是无限递归调用的秘诀(你怎么知道什么调用 RedisCache::add应该更新布隆过滤器以及哪些调用来自布隆过滤器?)。

如果确实需要,则需要某种形式的共享所有权,例如 RcArc。您的 BloomFilter 也需要使用弱引用,否则这两个对象将相互引用并且永远不会释放。