Splitting a Rust string into a Vec of its substrings with contiguous identical characters
I have a String or &str of ASCII characters, and I want to split it into a Vec of all its substrings of contiguous identical characters (for example, "aabbca" would become ["aa","bb","c","a"]).
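I could write a function that iterates over the individual characters and builds up a Vec of strings as it goes, but I feel like I would be reinventing the wheel. Is there a more idiomatic way to do this?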
Here is the intuitive (and current) solution I implemented for &str:
fn split_cont_chars(source: &str) -> Vec<String> {
    let mut answer: Vec<String> = Vec::new();
    let mut chars = source.chars();
    // Return an empty Vec for "" instead of panicking on unwrap().
    let mut head_char = match chars.next() {
        Some(c) => c,
        None => return answer,
    };
    let mut counter: usize = 1;
    for c in chars {
        if c == head_char {
            counter += 1;
        } else {
            // The run of head_char has ended: materialize it as a String.
            answer.push(head_char.to_string().repeat(counter));
            head_char = c;
            counter = 1;
        }
    }
    // Push the final run.
    answer.push(head_char.to_string().repeat(counter));
    answer
}
This works as expected, but it is much more verbose than typical Rust code for this kind of iteration problem.
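For comparison, the same one-String-per-run approach can be written without the unwrap() and without the duplicated final push by using Peekable::next_if_eq from the standard library (stable since Rust 1.51). This is a sketch, not the code from the question; the name split_cont_chars_peekable is hypothetical.

```rust
/// Splits `source` into runs of identical chars, allocating one String per run.
/// Empty input yields an empty Vec (no unwrap, so no panic).
fn split_cont_chars_peekable(source: &str) -> Vec<String> {
    let mut chars = source.chars().peekable();
    let mut runs = Vec::new();
    // Each outer iteration starts a new run with the next unconsumed char.
    while let Some(c) = chars.next() {
        let mut run = String::from(c);
        // next_if_eq consumes the next char only if it equals `c`.
        while let Some(same) = chars.next_if_eq(&c) {
            run.push(same);
        }
        runs.push(run);
    }
    runs
}
```

For example, split_cont_chars_peekable("aabbca") returns ["aa", "bb", "c", "a"]. It still allocates a String for every run.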
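You could use itertools::Itertools::group_by on s.chars() (with the identity function as the key), but that is inherently inefficient: collecting each group's characters into a string means allocating a new String for every result. I think the only way to get slices of the original string is to find the slice boundaries manually yourself. (If itertools provided a group_by_indices function, you could use it to construct the slices, but as far as I know it does not.)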
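Since the question states that the input is ASCII, there is a std-only shortcut that yields slices directly: group the bytes with the slice method chunk_by (stabilized in Rust 1.77, so this assumes a recent toolchain) and convert each chunk back to &str. This is a sketch; ascii_runs is a hypothetical name, and it returns None for non-ASCII input because byte-level grouping could otherwise cut a multi-byte character in half.

```rust
/// Splits an ASCII-only `s` into runs of identical characters, borrowing from `s`.
/// Returns None for non-ASCII input, where grouping bytes could split a UTF-8
/// multi-byte character and produce invalid chunks.
fn ascii_runs(s: &str) -> Option<Vec<&str>> {
    if !s.is_ascii() {
        return None;
    }
    let runs: Vec<&str> = s
        .as_bytes()
        // Group consecutive equal bytes (slice::chunk_by, Rust 1.77+).
        .chunk_by(|a, b| a == b)
        // Every byte is ASCII, so every chunk is valid UTF-8.
        .map(|chunk| std::str::from_utf8(chunk).expect("ASCII bytes are valid UTF-8"))
        .collect();
    Some(runs)
}
```

For example, ascii_runs("aabbca") returns Some(["aa", "bb", "c", "a"]), with every element borrowed from the input.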
fn char_runs(s: &str) -> Vec<&str> {
    let mut slices = Vec::new();
    let mut it = s.char_indices();
    // Byte offset and char of the first run; an empty input has no runs.
    let (mut slice_start, mut prev_char) = match it.next() {
        Some(pair) => pair,
        None => return slices,
    };
    for (i, c) in it {
        if c != prev_char {
            // A new run starts at byte offset i: close the previous one.
            slices.push(&s[slice_start..i]);
            slice_start = i;
            prev_char = c;
        }
    }
    // The last run extends to the end of the string.
    slices.push(&s[slice_start..]);
    slices
}

fn main() {
    let strings = ["", "a", "aa", "aab", "aabb", "aabbca", "aabb"];
    for s in strings {
        println!("{:?} => {:?}", s, char_runs(s));
    }
    // "" => []
    // "a" => ["a"]
    // "aa" => ["aa"]
    // "aab" => ["aa", "b"]
    // "aabb" => ["aa", "bb"]
    // "aabbca" => ["aa", "bb", "c", "a"]
    // "aabb" => ["aa", "bb"]
}
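If the caller only needs to iterate over the runs, the same manual slicing can be made lazy with std::iter::from_fn, so no Vec is allocated at all. This is a sketch under that assumption, and char_runs_iter is a hypothetical name. Like char_runs, it cuts only at char boundaries, so it also works for non-ASCII input.

```rust
/// Lazily yields runs of identical chars as slices of `s` (any UTF-8 input).
fn char_runs_iter(s: &str) -> impl Iterator<Item = &str> + '_ {
    let mut start = 0; // byte offset where the next run begins
    std::iter::from_fn(move || {
        let rest = &s[start..];
        // `?` ends the iterator once the input is exhausted.
        let first = rest.chars().next()?;
        // Byte length of the run: offset of the first char that differs,
        // or the rest of the string if every remaining char matches.
        let len = rest
            .char_indices()
            .find(|&(_, c)| c != first)
            .map_or(rest.len(), |(i, _)| i);
        start += len;
        Some(&rest[..len])
    })
}
```

Collecting it with char_runs_iter(s).collect::<Vec<_>>() gives the same Vec<&str> as char_runs(s); for example, "ééa" yields ["éé", "a"].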
There doesn't seem to be a more functional translation of the original solution, but there is a more idiomatic one:
struct LetterSequence {
    char_type: char,
    len: usize,
}

impl LetterSequence {
    fn new(char_type: char, len: usize) -> Self {
        LetterSequence { char_type, len }
    }

    // Materializes the run as an owned String only when it is needed.
    fn to_string(&self) -> String {
        self.char_type.to_string().repeat(self.len)
    }
}

fn split_char_struct(source: &str) -> Vec<LetterSequence> {
    let mut answer: Vec<LetterSequence> = Vec::new();
    let mut chars = source.chars();
    // Return an empty Vec for "" instead of panicking on unwrap().
    let mut head_char = match chars.next() {
        Some(c) => c,
        None => return answer,
    };
    let mut seq_count: usize = 1;
    for c in chars {
        if c == head_char {
            seq_count += 1;
        } else {
            answer.push(LetterSequence::new(head_char, seq_count));
            head_char = c;
            seq_count = 1;
        }
    }
    answer.push(LetterSequence::new(head_char, seq_count));
    answer
}
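With the help of the LetterSequence struct, each run is stored as a character plus a count, so we avoid having to maintain mutable strings, of which there could be as many as the total length of the starting &str. A String is only built when to_string is called.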
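A variant of the struct approach, sketched under the assumption that implementing std::fmt::Display is preferable to an inherent to_string method (Clippy's inherent_to_string lint flags the latter): with Display, to_string() comes from the standard ToString blanket impl. The builder function run_lengths is a hypothetical name; it extends the last run through Vec::last_mut instead of tracking a separate head character, which also handles empty input without unwrap().

```rust
use std::fmt;

/// Run-length record: `len` repetitions of `char_type`.
#[derive(Debug, PartialEq)]
struct LetterSequence {
    char_type: char,
    len: usize,
}

// Display lets a run be written into any formatter without building a String
// first, and provides to_string() via the ToString blanket impl.
impl fmt::Display for LetterSequence {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for _ in 0..self.len {
            write!(f, "{}", self.char_type)?;
        }
        Ok(())
    }
}

/// Builds the run-length records; empty input yields an empty Vec.
fn run_lengths(source: &str) -> Vec<LetterSequence> {
    let mut runs: Vec<LetterSequence> = Vec::new();
    for c in source.chars() {
        if let Some(last) = runs.last_mut() {
            // Same char as the current run: extend it in place.
            if last.char_type == c {
                last.len += 1;
                continue;
            }
        }
        // First char, or a different char: start a new run of length 1.
        runs.push(LetterSequence { char_type: c, len: 1 });
    }
    runs
}
```

For example, run_lengths("aabbca") yields four records, and mapping to_string over them gives ["aa", "bb", "c", "a"].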