How to parse a symmetric quoted string using nom in rust?
How should I use nom to parse a quoted string similar to Rust's raw strings?
I want to parse the following:
"A standard string"
#"A string containing ["] a quote"#
##"A string containing ["#] a quote and hash "##
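How would I do this, requiring the same number of '#' symbols at the beginning and at the end, while allowing #'ed strings to contain unescaped quotes and hashes?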
This would be my approach (using nom-5.1.1):
extern crate nom;
use nom::{
    IResult,
    multi::{count, fold_many0, many_till},
    bytes::complete::{tag, take},
    sequence::pair
};

fn quoted_str(input: &str) -> IResult<&str, &str> {
    // Count number of leading #
    let (remaining, hash_count) = fold_many0(tag("#"), 0, |acc, _| acc + 1)(input)?;
    // Match "
    let (remaining, _) = tag("\"")(remaining)?;
    // Take until closing " plus # (repeated hash_count times)
    let closing = pair(tag("\""), count(tag("#"), hash_count));
    let (remaining, (inner, _)) = many_till(take(1u32), closing)(remaining)?;
    // Extract inner range; `inner` holds one &str per character, so sum the
    // byte lengths (not the element count) to stay correct for non-ASCII text
    let offset = hash_count + 1;
    let length: usize = inner.iter().map(|s| s.len()).sum();
    Ok((remaining, &input[offset .. offset + length]))
}
#[test]
fn run_test() {
    assert_eq!(quoted_str("\"ABC\""), Ok(("", "ABC")));
    assert_eq!(quoted_str("#\"ABC\"#"), Ok(("", "ABC")));
    assert_eq!(quoted_str("##\"ABC\"##"), Ok(("", "ABC")));
    assert_eq!(quoted_str("###\"ABC\"###"), Ok(("", "ABC")));
    assert_eq!(quoted_str("#\"ABC\"XYZ\"#"), Ok(("", "ABC\"XYZ")));
    assert_eq!(quoted_str("#\"ABC\"#XYZ\"#"), Ok(("XYZ\"#", "ABC")));
    assert_eq!(quoted_str("#\"ABC\"##XYZ\"#"), Ok(("#XYZ\"#", "ABC")));
    assert_eq!(quoted_str("##\"ABC\"XYZ\"##"), Ok(("", "ABC\"XYZ")));
    assert_eq!(quoted_str("##\"ABC\"#XYZ\"##"), Ok(("", "ABC\"#XYZ")));
    assert_eq!(quoted_str("##\"ABC\"##XYZ\"##"), Ok(("XYZ\"##", "ABC")));
    assert_eq!(quoted_str("##\"ABC\"###XYZ\"##"), Ok(("#XYZ\"##", "ABC")));
    assert_eq!(quoted_str("\"ABC\"XYZ"), Ok(("XYZ", "ABC")));
    assert_eq!(quoted_str("#\"ABC\"#XYZ"), Ok(("XYZ", "ABC")));
    assert_eq!(quoted_str("##\"ABC\"##XYZ"), Ok(("XYZ", "ABC")));
}
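If performance is important to you, the implicit vector allocation in many_till could be avoided by writing a fold_many_till function based on the code of fold_many0 and many_till. It seems that nom does not currently provide such a function.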
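To illustrate the allocation-free idea without depending on nom, here is a minimal sketch that scans for the closing delimiter by hand with str::find, rather than implementing a fold_many_till combinator. The name quoted_str_no_alloc and the Option return type are assumptions made for this example and are not part of nom. To use it inside a nom parser, it would still need to be adapted to return IResult.

```rust
/// Hypothetical helper (not part of nom): parses `#*"..."#*` without
/// allocating, returning `(remaining, inner)` in the same order as the
/// nom-based version.
fn quoted_str_no_alloc(input: &str) -> Option<(&str, &str)> {
    // Count the leading '#' characters ('#' is a single byte in UTF-8).
    let hash_count = input.bytes().take_while(|&b| b == b'#').count();
    // The opening quote must directly follow the hashes.
    let body = input[hash_count..].strip_prefix('"')?;
    // Look for a '"' that is immediately followed by `hash_count` hashes.
    let mut search_from = 0;
    while let Some(pos) = body[search_from..].find('"') {
        let quote = search_from + pos;
        let after = &body[quote + 1..];
        let closes = after.len() >= hash_count
            && after.as_bytes()[..hash_count].iter().all(|&b| b == b'#');
        if closes {
            // Skip the closing hashes; everything before the quote is the content.
            return Some((&after[hash_count..], &body[..quote]));
        }
        // Not a closing delimiter: keep scanning after this quote.
        search_from = quote + 1;
    }
    // No matching closing delimiter was found.
    None
}
```

Like the many_till version, this stops at the first quote followed by enough hashes, so quoted_str_no_alloc("#\"ABC\"##XYZ\"#") yields Some(("#XYZ\"#", "ABC")). Because it slices the input directly, it never builds a Vec of single-character slices.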