Rust lifetime scoping in structs

Question

I'm porting a string tokenizer I wrote in Python to Rust, and I've run into a problem with lifetimes and structs that I can't seem to solve.

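The flow is basically:

Get an array of files
Convert each file to a Vec<String> of tokens
Use a Counter and UniCase to count the words in each vec
Save that count in a struct along with some other data
(Future) process the set of structs to accumulate totals across the per-file data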

struct Corpus<'a> {
    words: Counter<UniCase<&'a String>>,
    parts: Vec<CorpusPart<'a>>
}

pub struct CorpusPart<'a> {
    percent_of_total: f32,
    word_count: usize,
    words: Counter<UniCase<&'a String>>
}

fn process_file(entry: &DirEntry) -> CorpusPart {
    let mut contents = read_to_string(entry.path())
        .expect("Could not load contents.");

    let tokens = tokenize(&mut contents);
    let counted_words = collect(&tokens);

    CorpusPart {
        percent_of_total: 0.0,
        word_count: tokens.len(),
        words: counted_words
    }
}

pub fn tokenize(normalized: &mut String) -> Vec<String> {
    // snip ...
}

pub fn collect(results: &Vec<String>) -> Counter<UniCase<&'_ String>> {
    results.iter()
        .map(|w| UniCase::new(w))
        .collect::<Counter<_>>()
}

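However, when I try to return the CorpusPart, the compiler complains that it references the local variable tokens. How can (or should) I deal with this? I tried adding lifetime annotations but couldn't figure it out...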

Essentially, I no longer need the Vec<String>, but I do need some of the Strings that were in it for the counter.

Any help is appreciated, thanks!

Answer 1

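The problem here is that you are dropping the Vec<String> while still holding references to its elements. If you no longer need the Vec<String> but still need some of its contents, you have to transfer ownership of those contents to something else.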
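To make the difference concrete, here is a minimal sketch contrasting borrowing with taking ownership. The function names `borrow_words` and `take_words` are made up for illustration and are not part of your code.

```rust
/// Borrowing version: the returned references are tied to `tokens`,
/// so they cannot outlive the Vec they point into.
fn borrow_words(tokens: &[String]) -> Vec<&String> {
    tokens.iter().collect()
}

/// Owning version: `into_iter()` consumes the Vec and moves each String out,
/// so the result no longer depends on the original Vec.
fn take_words(tokens: Vec<String>) -> Vec<String> {
    tokens.into_iter().collect()
}

fn main() {
    let tokens = vec!["Hello".to_string(), "world".to_string()];

    // Fine as long as `borrowed` is not used after `tokens` goes away.
    let borrowed = borrow_words(&tokens);
    assert_eq!(borrowed.len(), 2);

    // `tokens` is moved here; the Strings now belong to `owned`.
    let owned = take_words(tokens);
    assert_eq!(owned, ["Hello", "world"]);
}
```

Returning the result of `borrow_words` from a function that created `tokens` locally is what triggers the error in the question; returning the result of `take_words` is fine.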

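I assume you want Corpus and CorpusPart to both point to the same strings so that you don't copy them unnecessarily. If so, either Corpus or CorpusPart must own the strings, and the one that doesn't own them references the strings owned by the other. (This sounds more complicated than it is.)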
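As a rough sketch of that owner/borrower split: `Part`, `Totals`, and `total_counts` below are hypothetical stand-ins, and a plain `HashMap` replaces the real Counter, which I don't have. The lifetime `'a` says the totals may not outlive the parts they borrow from.

```rust
use std::collections::HashMap;

/// Owns its strings (hypothetical stand-in for the owning side).
struct Part {
    words: Vec<String>,
}

/// Only borrows strings from parts that live elsewhere
/// (hypothetical stand-in for the borrowing side).
struct Totals<'a> {
    counts: HashMap<&'a str, usize>,
}

/// Every key in the result borrows from `parts`, so the compiler
/// will not let `Totals` outlive the slice it was built from.
fn total_counts<'a>(parts: &'a [Part]) -> Totals<'a> {
    let mut counts: HashMap<&'a str, usize> = HashMap::new();
    for part in parts {
        for word in &part.words {
            *counts.entry(word.as_str()).or_insert(0) += 1;
        }
    }
    Totals { counts }
}

fn main() {
    let parts = vec![
        Part { words: vec!["a".to_string(), "b".to_string()] },
        Part { words: vec!["a".to_string()] },
    ];
    let totals = total_counts(&parts);
    assert_eq!(totals.counts["a"], 2);
    assert_eq!(totals.counts["b"], 1);
}
```

In this sketch the parts are kept outside the borrowing struct on purpose. Storing the owned parts and the references into them inside the same struct makes it self-referential, which is awkward to build in safe Rust, so you may find it easier to keep the two separate.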

I'll assume CorpusPart owns the strings and Corpus just points to them:

use std::fs::DirEntry;
use std::fs::read_to_string;

pub struct UniCase<a> {
    test: a
}

impl<a> UniCase<a> {
    fn new(item: a) -> UniCase<a> {
        UniCase {
            test: item
        }
    }
}

type Counter<a> = Vec<a>;

struct Corpus<'a> {
    words: Counter<UniCase<&'a String>>, // Will reference the strings in CorpusPart (I assume you implemented this elsewhere)
    parts: Vec<CorpusPart>
}

pub struct CorpusPart {
    percent_of_total: f32,
    word_count: usize,
    words: Counter<UniCase<String>> // Has ownership of the strings
}

fn process_file(entry: &DirEntry) -> CorpusPart {
    let mut contents = read_to_string(entry.path())
        .expect("Could not load contents.");

    let tokens = tokenize(&mut contents);
    let length = tokens.len(); // Cache the length, as tokens will no longer be valid once passed to collect
    let counted_words = collect(tokens);

    CorpusPart {
        percent_of_total: 0.0,
        word_count: length,
        words: counted_words
    }
}

pub fn tokenize(normalized: &mut String) -> Vec<String> {
    Vec::new()
}

pub fn collect(results: Vec<String>) -> Counter<UniCase<String>> {
    results.into_iter() // Use into_iter() to consume the Vec that is passed in, and take ownership of the internal items
        .map(|w| UniCase::new(w))
        .collect::<Counter<_>>()
}

I aliased Counter<a> to Vec<a> because I don't know which Counter you are using.
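If your Counter behaves like a map from item to frequency (an assumption on my part), the owning version might look roughly like this. `WordCounts` and `collect_owned` are hypothetical names, `HashMap` stands in for the real Counter, and `to_lowercase()` is only a rough substitute for wrapping keys in UniCase.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the real counter: maps each owned word to its frequency.
type WordCounts = HashMap<String, usize>;

/// Consumes the tokens and counts them. Case is folded with `to_lowercase()`
/// as a rough substitute for case-insensitive `UniCase` keys.
fn collect_owned(tokens: Vec<String>) -> WordCounts {
    let mut counts = WordCounts::new();
    for token in tokens {
        *counts.entry(token.to_lowercase()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let tokens = vec![
        "Rust".to_string(),
        "rust".to_string(),
        "lifetime".to_string(),
    ];
    // Read the length before the Vec is moved into `collect_owned`.
    let word_count = tokens.len();
    let counts = collect_owned(tokens);

    assert_eq!(word_count, 3);
    assert_eq!(counts["rust"], 2);
    assert_eq!(counts["lifetime"], 1);
}
```

The map owns its keys, so it can be returned from the function that read the file without any lifetime parameter on the struct that holds it.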

