Parsing single-quoted string with escaped quotes with Nom 5

Question

I'm new to Rust and Nom, and I'm trying to parse (single-)quoted strings that may contain escaped quotes, e.g. 'foo\' bar', 'λx → x', '' or ' '.

I found the escaped! macro, whose documentation says:

The first argument matches the normal characters (it must not accept the control character), the second argument is the control character (like \ in most languages), the third argument matches the escaped characters

Since I want the "normal characters" matcher to accept anything other than a backslash, I tried using take_till!:

    named!(till_backslash<&str, &str>, take_till!(|ch| ch == '\\'));
    named!(esc<&str, &str>, escaped!(call!(till_backslash), '\\', one_of!("'n\\")));

    let (input, _) = nom::character::complete::char('\'')(input)?;
    let (input, value) = esc(input)?;
    let (input, _) = nom::character::complete::char('\'')(input)?;

    // … use `value`

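However, when trying to parse 'x', this returns Err(Incomplete(Size(1))). When searching for this, people generally recommend using CompleteStr, but that no longer exists in Nom 5. What is the correct way to approach this problem?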

Answer 1

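When operating in so-called streaming mode, nom may return Incomplete to indicate that it cannot decide yet and needs more data. nom 4 introduced CompleteStr. Together with CompleteByteSlice, it is the complete-input counterpart of &str and &[u8]: parsers that take these types as input work in complete mode.

They are gone in nom 5. In nom 5, as you observed, macro-based parsers always work in streaming mode. Parser combinators that behave differently in streaming and complete mode come in separate versions, each in its own submodule, e.g. nom::bytes::streaming and nom::bytes::complete.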
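
To make the distinction concrete, here is a minimal, dependency-free sketch of the two modes. It is not nom's implementation: `Outcome`, `closing_quote` and `quoted` are hypothetical names invented for this illustration, and the only thing modelled is how an input that ends before the closing quote is reported.

```rust
// A toy model of streaming vs. complete parsing; NOT nom's actual code.
// `Outcome`, `closing_quote` and `quoted` are hypothetical names.
#[derive(Debug, PartialEq)]
enum Outcome<'a> {
    Done { rest: &'a str, value: &'a str },
    Incomplete, // streaming only: more input could still change the answer
    Error,      // the input definitely does not match
}

/// Byte index of the first unescaped single quote in `body`, if any.
fn closing_quote(body: &str) -> Option<usize> {
    let mut escaped = false;
    for (i, ch) in body.char_indices() {
        match ch {
            _ if escaped => escaped = false, // character after a backslash
            '\\' => escaped = true,
            '\'' => return Some(i),
            _ => {}
        }
    }
    None
}

/// Parse a single-quoted string in either streaming or complete mode.
fn quoted(input: &str, streaming: bool) -> Outcome<'_> {
    let body = match input.strip_prefix('\'') {
        Some(body) => body,
        // Nothing received yet: a streaming parser cannot rule anything out.
        None if streaming && input.is_empty() => return Outcome::Incomplete,
        None => return Outcome::Error,
    };
    match closing_quote(body) {
        Some(i) => Outcome::Done { value: &body[..i], rest: &body[i + 1..] },
        // Ran out of input before the closing quote.
        None if streaming => Outcome::Incomplete,
        None => Outcome::Error,
    }
}
```

With these definitions, `quoted("'x", true)` yields `Outcome::Incomplete`, whereas `quoted("'x", false)` yields `Outcome::Error`. For terminated input such as `"'x' tail"` both modes agree on `Done { rest: " tail", value: "x" }`.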

For all the gory details, you may want to check out this blog post, especially the section Streaming VS complete parsers.

Also, function combinators are preferred over macro combinators in nom 5. Here is one way to do it:

    //# nom = "5.0.1"
    use nom::{
        branch::alt,
        bytes::complete::{escaped, tag},
        character::complete::none_of,
        sequence::delimited,
        IResult,
    };

    fn main() {
        let (_, res) = parse_quoted(r#"'foo\'  bar'"#).unwrap();
        assert_eq!(res, r#"foo\'  bar"#);
        let (_, res) = parse_quoted("'λx → x'").unwrap();
        assert_eq!(res, "λx → x");
        let (_, res) = parse_quoted("'  '").unwrap();
        assert_eq!(res, "  ");
        let (_, res) = parse_quoted("''").unwrap();
        assert_eq!(res, "");
    }

    fn parse_quoted(input: &str) -> IResult<&str, &str> {
        // Normal characters: anything except a backslash or a single quote.
        // Control character: backslash. Escapable character: single quote.
        let esc = escaped(none_of("\\'"), '\\', tag("'"));
        // Fall back to the empty match so that '' parses as well.
        let esc_or_empty = alt((esc, tag("")));
        let res = delimited(tag("'"), esc_or_empty, tag("'"))(input)?;

        Ok(res)
    }
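
Note that the value returned above still contains the backslashes (the first assertion compares against foo\'  bar). If the unescaped text is needed, one option is a small post-processing step. The following is a sketch using only the standard library; `unescape` is a hypothetical helper, not part of nom. The escape set (\', \n, \\) is taken from the one_of! call in the question, and dropping the backslash in front of any other character is an assumption.

```rust
/// Hypothetical helper (not part of nom): resolve backslash escapes in the
/// slice returned by the parser. Assumes the escapes \', \n and \\.
fn unescape(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(ch) = chars.next() {
        if ch != '\\' {
            out.push(ch);
            continue;
        }
        // `ch` is a backslash: decide based on the character that follows.
        match chars.next() {
            Some('n') => out.push('\n'),
            // \' and \\ map to themselves; for any other character the
            // backslash is simply dropped (an assumption of this sketch).
            Some(other) => out.push(other),
            // A lone trailing backslash is kept as-is.
            None => out.push('\\'),
        }
    }
    out
}
```

For example, `unescape(r"foo\'  bar")` produces `foo'  bar`, and input without backslashes is returned unchanged.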
