Splitting a Rust string into a Vec of its substrings with contiguous identical characters

I have a String or &str of ASCII characters, and I want to split it into a Vec of all its substrings of contiguous identical characters (for example, "aabbca" would become ["aa","bb","c","a"]).

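I could write a function that iterates over the individual characters and gradually builds up a Vec of strings, but I feel like I would be reinventing the wheel. Is there a more idiomatic way to do this?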

Here is my naive (and current) solution, implemented for &str:

fn split_cont_chars(source:&str) -> Vec<String> {
    let mut answer: Vec<String> = Vec::new();

    let mut head_char = source.chars().next().unwrap();
    let mut counter: usize = 1;

    for c in source.chars().skip(1) {
        if c == head_char {
            counter += 1;
        }
        else {
            answer.push(head_char.to_string().repeat(counter));
            head_char = c;
            counter = 1;
        }
    }
    answer.push(head_char.to_string().repeat(counter));

    answer
}

This works as expected, but it is much more verbose than typical Rust code for this kind of iteration problem.

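You could use Itertools::group_by from the itertools crate over s.chars() (with the identity function as the key), but it is inherently inefficient: when the time comes to collect the characters into strings, you have to allocate a String for each result. I think the only way to get slices into the original string is to find the slice boundaries manually. (If itertools provided a group_by_indices function, you could use it to construct the slices, but as far as I know it does not.)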

fn char_runs(s: &str) -> Vec<&str> {
    let mut slices = Vec::new();
    let mut it = s.char_indices();
    let (mut slice_start, mut prev_char) = match it.next() {
        Some(pair) => pair,
        None => return slices,
    };

    for (i, c) in it {
        if c != prev_char {
            slices.push(&s[slice_start..i]);
            slice_start = i;
            prev_char = c;
        }
    }

    slices.push(&s[slice_start..]);

    slices
}

fn main() {
    let strings = ["", "a", "aa", "aab", "aabb", "aabbca"];
    for s in strings {
        println!("{:?} => {:?}", s, char_runs(s));
    }

    // "" => []
    // "a" => ["a"]
    // "aa" => ["aa"]
    // "aab" => ["aa", "b"]
    // "aabb" => ["aa", "bb"]
    // "aabbca" => ["aa", "bb", "c", "a"]
}
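
Since the question restricts the input to ASCII, the same borrowed slices can also be sketched with the standard library's slice::chunk_by, working on the bytes rather than the chars. This is only a sketch under stated assumptions: the name ascii_runs is made up for illustration, it assumes Rust 1.77 or later (where chunk_by is stable), and it deliberately panics on non-ASCII input, because grouping equal bytes could otherwise cut a multi-byte character in half.

```rust
// Hypothetical helper: split an ASCII-only string into runs of identical
// characters, borrowing from the input instead of allocating new Strings.
fn ascii_runs(s: &str) -> Vec<&str> {
    // Assumption: the input is ASCII, so one byte is exactly one character.
    assert!(s.is_ascii(), "ascii_runs assumes ASCII-only input");

    s.as_bytes()
        // Adjacent bytes stay in the same chunk while they compare equal.
        .chunk_by(|a, b| a == b)
        // Every chunk is pure ASCII, hence valid UTF-8.
        .map(|run| std::str::from_utf8(run).expect("ASCII bytes are valid UTF-8"))
        .collect()
}
```

For arbitrary UTF-8 input, the char_indices loop in char_runs remains the safer choice.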

There doesn't seem to be a more functional translation of the original solution, but there is a more idiomatic one:

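// A run of `len` consecutive copies of `char_type`.
struct LetterSequence {
    char_type: char,
    len: usize,
}

impl LetterSequence {
    fn new(a: char, b: usize) -> Self {
        LetterSequence { char_type: a, len: b }
    }

    // Materializes the run as an owned String only when it is needed.
    fn to_string(&self) -> String {
        self.char_type.to_string().repeat(self.len)
    }
}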

fn split_char_struct(source: &str) -> Vec<LetterSequence> {
    let mut answer: Vec<LetterSequence> = Vec::new();

    let mut chars = source.chars();
    // An empty input has no runs; return early instead of panicking.
    let mut head_char: char = match chars.next() {
        Some(c) => c,
        None => return answer,
    };
    let mut seq_count: usize = 1;

    for c in chars {
        if c == head_char {
            seq_count += 1;
        } else {
            // The run ended: record it and start counting the next one.
            answer.push(LetterSequence::new(head_char, seq_count));
            head_char = c;
            seq_count = 1;
        }
    }
    // Record the final run.
    answer.push(LetterSequence::new(head_char, seq_count));

    answer
}

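With the help of the LetterSequence struct, we avoid having to maintain mutable Strings, of which there could be as many as the total length of the starting &str.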
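
The same idea can be sketched more compactly with plain (char, usize) tuples standing in for LetterSequence. The names run_lengths and expand below are hypothetical and not part of the answer above; the point is only that the runs are stored as counts, and owned Strings are built on demand.

```rust
// Hypothetical helper: run-length encode `source` as (character, count) pairs.
fn run_lengths(source: &str) -> Vec<(char, usize)> {
    let mut runs: Vec<(char, usize)> = Vec::new();
    for c in source.chars() {
        // Extend the current run if `c` matches its character.
        if let Some((head, len)) = runs.last_mut() {
            if *head == c {
                *len += 1;
                continue;
            }
        }
        // Otherwise (first character, or a different one) start a new run.
        runs.push((c, 1));
    }
    runs
}

// Hypothetical helper: build the owned substrings only when they are wanted.
fn expand(runs: &[(char, usize)]) -> Vec<String> {
    runs.iter()
        .map(|&(c, n)| c.to_string().repeat(n))
        .collect()
}
```

Because the loop starts from an empty Vec and inspects the last run, an empty input simply yields an empty result without any special case.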