How to use Rust nom to get the last occurrence of a string?
I am learning Rust and the nom crate. I have a string that can be one of the following:
* abc
* abc, test.txt
* abc, def, test.txt
* abc, test.txt, test.txt
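I want to write a parser that returns the trailing file name and everything before it as a tuple. For the sample inputs above, the expected output is: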
* abc -> ("abc", "")
* abc, test.txt -> ("abc", "test.txt")
* abc, def, test.txt -> ("abc, def", "test.txt")
* abc, test.txt, test.txt -> ("abc, test.txt", "test.txt")
Here is my current code.
    use nom::{
        bytes::complete::{is_a, tag},
        character::complete::space0,
        combinator::{all_consuming, recognize},
        multi::many_till,
        sequence::{preceded, tuple},
        IResult,
    };

    fn test(input: &str) -> (String, String) {
        let result: IResult<&str, _, nom::error::Error<&str>> =
            all_consuming(
                many_till(
                    is_a("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._,"),
                    preceded(
                        tuple((space0, tag(","), space0)),
                        recognize(is_a("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._")),
                    )))(input);
        result.map(|(_, x)| {
            println!("{:?}", x);
            return (x.0.join(""), x.1.to_string());
        }).unwrap_or_default()
    }

    #[test]
    fn test1() {
        assert_eq!(test("test\\test2"), ("test\\test2".to_string(), "".to_string()));
        assert_eq!(test("test2\\a.txt, file1"), ("test2\\a.txt".to_string(), "file1".to_string()));
        assert_eq!(test("abc"), ("abc".to_string(), "".to_string()));
        assert_eq!(test("abc, test.txt"), ("abc".to_string(), "test.txt".to_string()));
        assert_eq!(test("abc, def, test.txt"), ("abc, def".to_string(), "test.txt".to_string()));
        assert_eq!(test("abc, test.txt, test.txt"), ("abc, test.txt".to_string(), "test.txt".to_string()));
    }
When I run cargo test, I get the following error:
    running 1 test
    test test1 ... FAILED

    failures:

    ---- test1 stdout ----
    thread 'test1' panicked at 'assertion failed: `(left == right)`
      left: `("", "")`,
     right: `("test\\test2", "")`', src\main.rs:138:5
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

    failures:
        test1

    test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
What is the problem? Thanks.
You don't really need nom for this. You can simply split the string at the last separator:
    fn split_std(input: &str) -> (String, String) {
        match input.rfind(", ") {
            None => (input.to_owned(), "".to_owned()),
            Some(idx) => (input[..idx].to_owned(), input[idx + 2..].to_owned()),
        }
    }
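
As a side note, the same split can be sketched with `str::rsplit_once` from the standard library, which avoids the manual index arithmetic and returns borrowed slices instead of owned strings. The name `split_borrowed` is made up for this sketch and is not part of the answer above.

```rust
/// Splits `input` at the last ", " separator.
/// Returns (everything before the separator, trailing part).
/// The second element is "" when the input contains no separator.
/// `split_borrowed` is a hypothetical name used only for this sketch.
fn split_borrowed(input: &str) -> (&str, &str) {
    input.rsplit_once(", ").unwrap_or((input, ""))
}
```

Calling `split_borrowed("abc, def, test.txt")` yields `("abc, def", "test.txt")`; call `.to_owned()` on each half if owned `String` values are needed, as in `split_std`.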
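Doing this with nom is more complicated and much slower. You would need to parse a separated_list0, then join the list elements and handle the various edge cases and errors: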
    use nom::{bytes::complete::tag, IResult};

    fn split_nom(input: &str) -> (String, String) {
        // Parse the whole input as a ", "-separated list of elements.
        let result: IResult<_, _, nom::error::Error<_>> =
            nom::combinator::all_consuming(nom::multi::separated_list0(
                tag(", "),
                nom::bytes::complete::take_while(|ch| {
                    nom::character::is_alphabetic(ch as u8)
                        || nom::character::is_digit(ch as u8)
                        || ch == '_'
                        || ch == '\\'
                        || ch == '.'
                }),
            ))(input);
        match result {
            // On a parse error, return the input unsplit.
            Err(_) => (input.to_owned(), "".to_owned()),
            Ok((_, parts)) => {
                // A single element means there is no trailing file name.
                if parts.len() == 1 {
                    return (parts[0].to_owned(), "".to_owned());
                }
                (
                    parts[..parts.len() - 1].join(", "),
                    parts[parts.len() - 1].to_owned(),
                )
            }
        }
    }
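
To make the "parse a list, then join" steps easier to follow, here is a rough std-only sketch of the same logic: split on the separator, validate every element, then re-join everything except the last element. It is not a nom parser and is not taken from the answer; `split_list` and `is_allowed` are hypothetical names, and the accepted characters are assumed to match the predicate in `split_nom`.

```rust
/// Hypothetical std-only counterpart of the list-based approach.
fn split_list(input: &str) -> (String, String) {
    // Assumed to mirror the character predicate used in `split_nom`.
    fn is_allowed(ch: char) -> bool {
        ch.is_ascii_alphanumeric() || ch == '_' || ch == '\\' || ch == '.'
    }

    // Step 1: split the input into elements on the ", " separator.
    let parts: Vec<&str> = input.split(", ").collect();

    // Step 2: mirror the parser's error branch. If any element contains
    // an unexpected character, return the whole input unsplit.
    if !parts.iter().all(|part| part.chars().all(is_allowed)) {
        return (input.to_owned(), String::new());
    }

    // Step 3: re-join all elements but the last. A single element means
    // there is no trailing file name.
    match parts.split_last() {
        Some((last, rest)) if !rest.is_empty() => (rest.join(", "), last.to_string()),
        _ => (input.to_owned(), String::new()),
    }
}
```

Compared with `split_std`, this needs an intermediate `Vec` and a second pass to join, which illustrates why the list-based approach does more work for the same result.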
Tests:
    #[cfg(test)]
    mod tests {
        use crate::{split_nom, split_std};

        #[test]
        fn test_split_std() {
            verify(split_std);
        }

        #[test]
        fn test_split_nom() {
            verify(split_nom);
        }

        // Runs the same assertions against either implementation.
        fn verify(f: fn(&str) -> (String, String)) {
            assert_eq!(
                f("test\\test2"),
                ("test\\test2".to_string(), "".to_string())
            );
            assert_eq!(
                f("test2\\a.txt, file1"),
                ("test2\\a.txt".to_string(), "file1".to_string())
            );
            assert_eq!(f("abc"), ("abc".to_string(), "".to_string()));
            assert_eq!(
                f("abc, test.txt"),
                ("abc".to_string(), "test.txt".to_string())
            );
            assert_eq!(
                f("abc, def, test.txt"),
                ("abc, def".to_string(), "test.txt".to_string())
            );
            assert_eq!(
                f("abc, test.txt, test.txt"),
                ("abc, test.txt".to_string(), "test.txt".to_string())
            );
        }
    }
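PS: Your test cases do not match your code. For example, your test cases contain the \ character, but it is not in the list of characters your parser accepts.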
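
One quick way to spot such a mismatch is to check each character of a test input against the character set passed to `is_a` in the question. The sketch below uses only the standard library; `ACCEPTED` is copied from the question's element parser, and the helper name `first_rejected` is hypothetical. Note that the space character is also outside this set, since the question's parser handles spaces separately through `space0`.

```rust
/// Character set copied from the first `is_a` call in the question.
const ACCEPTED: &str =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._,";

/// Returns the first character of `input` that is not in `ACCEPTED`,
/// or `None` if every character is accepted.
/// `first_rejected` is a hypothetical helper for this sketch.
fn first_rejected(input: &str) -> Option<char> {
    input.chars().find(|ch| !ACCEPTED.contains(*ch))
}
```

For the first test input, `first_rejected("test\\test2")` returns `Some('\\')`, confirming that the backslash is the character the parser does not accept.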