How to use Rust nom to get the last occurrence of a string?

Question

I am learning Rust and the nom crate. I have a string that can be one of the following:

* abc
* abc, test.txt
* abc, def, test.txt
* abc, test.txt, test.txt

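I want to write a parser that returns the trailing file name and everything before it as a tuple. For the example inputs above, the expected output is: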

* abc                      -> ("abc",           "")
* abc, test.txt            -> ("abc",           "test.txt")
* abc, def, test.txt       -> ("abc, def",      "test.txt")
* abc, test.txt, test.txt  -> ("abc, test.txt", "test.txt")

Here is my current code:


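// Imports assume nom 7.x, where combinators are called as plain functions.
use nom::{
    bytes::complete::{is_a, tag},
    character::complete::space0,
    combinator::{all_consuming, recognize},
    multi::many_till,
    sequence::{preceded, tuple},
    IResult,
};

fn test(input: &str) -> (String, String) {
    let result: IResult<&str, _, nom::error::Error<&str>> =
        all_consuming(
            many_till(
                is_a("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._,"),
                preceded(
                    tuple((space0, tag(","), space0)),
                    recognize(is_a("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._")),
                )))(input);

    // Any parse error collapses to the default value ("", "").
    result.map(|(_, x)| {
        println!("{:?}", x);
        (x.0.join(""), x.1.to_string())
    }).unwrap_or_default()
}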


#[test]
fn test1() {
    assert_eq!(test("test\\test2"), ("test\\test2".to_string(), "".to_string()));
    assert_eq!(test("test2\\a.txt, file1"), ("test2\\a.txt".to_string(), "file1".to_string()));
    assert_eq!(test("abc"), ("abc".to_string(), "".to_string()));
    assert_eq!(test("abc, test.txt"), ("abc".to_string(), "test.txt".to_string()));
    assert_eq!(test("abc, def, test.txt"), ("abc, def".to_string(), "test.txt".to_string()));
    assert_eq!(test("abc, test.txt, test.txt"), ("abc, test.txt".to_string(), "test.txt".to_string()));
}

When I run cargo test, I get the following error:

running 1 test
test test1 ... FAILED

failures:

---- test1 stdout ----
thread 'test1' panicked at 'assertion failed: `(left == right)`
  left: `("", "")`,
 right: `("test\test2", "")`', src\main.rs:138:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    test1

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

What is the problem? Thanks.

Answer 1

You don't really need nom for this. You can simply split the string at the last separator:

fn split_std(input: &str) -> (String, String) {
    match input.rfind(", ") {
        None => (input.to_owned(), "".to_owned()),
        Some(idx) => (input[..idx].to_owned(), input[idx + 2..].to_owned()),
    }
}
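
As a variation on the same idea, the standard library's `str::rsplit_once` does the "split at the last separator" step in one call. The sketch below is not part of the original answer: the name `split_last_sep` is made up for illustration, and it returns borrowed slices rather than owned `String`s, which may or may not suit the caller.

```rust
/// Hypothetical helper (name invented for this sketch): splits `input` at the
/// last occurrence of ", " and returns two borrowed slices.
fn split_last_sep(input: &str) -> (&str, &str) {
    // `rsplit_once` searches from the right and yields `None` when the
    // separator does not occur, in which case the whole input is the first part.
    input.rsplit_once(", ").unwrap_or((input, ""))
}
```

Calling `split_last_sep("abc, def, test.txt")` gives `("abc, def", "test.txt")`, and an input without any `", "` comes back unchanged with an empty second element.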

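Doing it with nom is considerably more complicated and slower. You have to parse a separated_list0(), then join the list elements again and handle the various edge cases and errors: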


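// Written against nom 7.x (an assumption; no version is stated here).
use nom::{bytes::complete::tag, IResult};

fn split_nom(input: &str) -> (String, String) {
    // Parse the whole input as a ", "-separated list of path-like tokens.
    let result: IResult<_, _, nom::error::Error<_>> =
        nom::combinator::all_consuming(nom::multi::separated_list0(
            tag(", "),
            nom::bytes::complete::take_while(|ch: char| {
                nom::character::is_alphabetic(ch as u8)
                    || nom::character::is_digit(ch as u8)
                    || ch == '_'
                    || ch == '\\'
                    || ch == '.'
            }),
        ))(input);

    match result {
        // On a parse error, fall back to returning the input unchanged.
        Err(_) => (input.to_owned(), "".to_owned()),
        Ok((_, parts)) => {
            // A single element means there is no trailing file name.
            if parts.len() == 1 {
                return (parts[0].to_owned(), "".to_owned());
            }

            (
                parts[..parts.len() - 1].join(", "),
                parts[parts.len() - 1].to_owned(),
            )
        }
    }
}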
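
The post-processing step of the list-based approach (rejoin everything but the last element, return the last element separately) can also be expressed with `slice::split_last`. This is only a std-only sketch of that step, assuming the elements have already been parsed; the helper name `rejoin_parts` is invented here and is not part of nom or of the answer.

```rust
/// Hypothetical helper: turns already-parsed list elements into the
/// (everything else, last element) pair.
fn rejoin_parts(parts: &[&str]) -> (String, String) {
    match parts.split_last() {
        // No elements at all: both halves are empty.
        None => (String::new(), String::new()),
        // Exactly one element: it is the "rest", and there is no file name.
        Some((only, [])) => (only.to_string(), String::new()),
        // Two or more: the last one is the file name, the others are rejoined.
        Some((last, rest)) => (rest.join(", "), last.to_string()),
    }
}
```

The three match arms make the edge cases explicit without any index arithmetic.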

Tests:

#[cfg(test)]
mod tests {
    use crate::{split_nom, split_std};

    #[test]
    fn test_split_std() {
        verify(split_std);
    }

    #[test]
    fn test_split_nom() {
        verify(split_nom);
    }

    fn verify(f: fn(&str) -> (String, String)) {
        assert_eq!(
            f("test\\test2"),
            ("test\\test2".to_string(), "".to_string())
        );
        assert_eq!(
            f("test2\\a.txt, file1"),
            ("test2\\a.txt".to_string(), "file1".to_string())
        );
        assert_eq!(f("abc"), ("abc".to_string(), "".to_string()));
        assert_eq!(
            f("abc, test.txt"),
            ("abc".to_string(), "test.txt".to_string())
        );
        assert_eq!(
            f("abc, def, test.txt"),
            ("abc, def".to_string(), "test.txt".to_string())
        );
        assert_eq!(
            f("abc, test.txt, test.txt"),
            ("abc, test.txt".to_string(), "test.txt".to_string())
        );
    }
}

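PS: Your test cases do not match your code. For example, your test cases contain the \ character, but it is not in the list of characters your parser accepts.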
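
One way to see this mismatch without running the parser is to check an input against the accepted character list directly. The sketch below uses only the standard library and models nothing but the character set from the question's first `is_a` call; it does not reproduce how nom actually consumes input, and the names `ACCEPTED` and `first_rejected` are invented for illustration.

```rust
/// Character set copied from the first `is_a` call in the question's parser.
const ACCEPTED: &str = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._,";

/// Hypothetical helper: returns the byte offset and value of the first
/// character of `input` that is not in `ACCEPTED`, if there is one.
fn first_rejected(input: &str) -> Option<(usize, char)> {
    input.char_indices().find(|&(_, ch)| !ACCEPTED.contains(ch))
}
```

For the input `test\test2` (written `"test\\test2"` in Rust source) this reports the backslash at offset 4. Spaces are also outside this set; in the question's parser they are only handled by `space0` around the comma.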
