Parse string with escaped single quotes
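I want to parse a string that contains ASCII characters between single quotes and may contain escaped single quotes, each written as two consecutive quotes ('').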
'string value contained between single quotes -> '' and so on...'
The result should be:
string value contained between single quotes -> ' and so on...
use nom::{
    bytes::complete::{tag, take_while},
    error::{ErrorKind, ParseError},
    sequence::delimited,
    IResult,
};

fn main() {
    let res = string_value::<(&str, ErrorKind)>("'abc''def'");
    assert_eq!(res, Ok(("", "abc'def")));
}

pub fn is_ascii_char(chr: char) -> bool {
    chr.is_ascii()
}

fn string_value<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, &'a str, E> {
    delimited(tag("'"), take_while(is_ascii_char), tag("'"))(i)
}
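Part of the difficulty is that the predicate in this attempt can never stop at the closing quote: ' is itself an ASCII character, so take_while(is_ascii_char) consumes the escaped quote and the closing quote alike, and the final tag("'") has nothing left to match. A small std-only sketch reproduces the same predicate with Iterator::take_while in place of nom's take_while (the helper name ascii_prefix_len is hypothetical):

```rust
/// Byte length of the prefix that the `is_ascii_char` predicate accepts.
/// Std-only stand-in for nom's `take_while(is_ascii_char)`.
fn ascii_prefix_len(s: &str) -> usize {
    s.chars()
        .take_while(|c| c.is_ascii())
        .map(char::len_utf8)
        .sum()
}

fn main() {
    // What is left after the opening tag("'") has matched "'abc''def'".
    let rest = "abc''def'";
    let n = ascii_prefix_len(rest);
    // The predicate swallows the escaped quote AND the closing quote,
    // so nothing remains for the final tag("'") to match.
    assert_eq!(n, rest.len());
    println!("consumed {:?}, left {:?}", &rest[..n], &rest[n..]);
}
```

So the parser needs some way to look at the character after a quote before deciding whether the string has ended.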
How can I detect an escaped quote rather than the end of the string?
This is tricky, but the following works:
//# nom = "5.0.1"
use nom::{
    bytes::complete::{escaped_transform, tag},
    character::complete::none_of,
    combinator::{map_parser, recognize},
    multi::{many0, separated_list},
    sequence::delimited,
    IResult,
};

fn main() {
    let (_, res) = parse_quoted("'abc''def'").unwrap();
    assert_eq!(res, "abc'def");
    let (_, res) = parse_quoted("'xy@$%!z'").unwrap();
    assert_eq!(res, "xy@$%!z");
    let (_, res) = parse_quoted("'single quotes -> '' and so on...'").unwrap();
    assert_eq!(res, "single quotes -> ' and so on...");
}

fn parse_quoted(input: &str) -> IResult<&str, String> {
    // Raw body: runs of non-quote characters separated by doubled quotes ('').
    let seq = recognize(separated_list(tag("''"), many0(none_of("'"))));
    // Copy ordinary characters; for the escape char ' followed by ', emit '.
    let unquote = escaped_transform(none_of("'"), '\'', tag("'"));
    // Strip the outer quotes, then run `unquote` on the slice `seq` recognized.
    let res = delimited(tag("'"), map_parser(seq, unquote), tag("'"))(input)?;
    Ok(res)
}
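The key trick for telling an escaped quote from the end of the string is one character of lookahead: a quote followed by another quote is an escape, and a lone quote ends the literal. A minimal std-only sketch of that idea, written without nom and under the hypothetical name parse_quoted_std, returns the remaining input and the unescaped value, or None if the literal is malformed:

```rust
/// Hypothetical std-only counterpart of `parse_quoted`: returns the
/// unconsumed input and the unescaped contents, or `None` on malformed input.
fn parse_quoted_std(input: &str) -> Option<(&str, String)> {
    // The literal must open with a single quote.
    let body = input.strip_prefix('\'')?;
    let mut out = String::new();
    let mut chars = body.char_indices().peekable();
    while let Some((i, c)) = chars.next() {
        if c != '\'' {
            out.push(c);
            continue;
        }
        // A quote followed by another quote is an escaped `'`.
        if let Some(&(_, '\'')) = chars.peek() {
            chars.next();
            out.push('\'');
        } else {
            // A lone quote closes the literal; the rest is remaining input.
            return Some((&body[i + 1..], out));
        }
    }
    // Ran out of input without finding the closing quote.
    None
}

fn main() {
    let (rest, value) = parse_quoted_std("'abc''def' tail").unwrap();
    println!("value = {:?}, rest = {:?}", value, rest);
}
```

Unlike the nom version, this does a single pass and builds the String as it goes; it is meant only to illustrate the lookahead rule, not to replace the combinator-based parser.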
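Some explanations:
- the parser seq recognizes any sequence that alternates between doubled quotes ('') and runs of any other characters;
- unquote converts every doubled quote into a single quote;
- map_parser then combines the two to produce the desired result.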
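The same split into "recognize the raw body first, then transform it" can be sketched without nom. In this std-only sketch (all function names are hypothetical), recognize_quoted plays the role of delimited(tag("'"), seq, tag("'")) and returns the untouched body, unquote plays the role of the escaped_transform step, and parse_two_stage chains them the way map_parser does:

```rust
/// Stage 1: find the raw body between the outer quotes without changing it.
/// Returns (remaining input, raw body), or `None` if there is no closing quote.
fn recognize_quoted(input: &str) -> Option<(&str, &str)> {
    let body = input.strip_prefix('\'')?;
    let bytes = body.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'\'' {
            if bytes.get(i + 1) == Some(&b'\'') {
                // Doubled quote: part of the body, skip both bytes.
                i += 2;
                continue;
            }
            // Lone quote: end of the body. `'` is ASCII, so these slice
            // indices always fall on UTF-8 character boundaries.
            return Some((&body[i + 1..], &body[..i]));
        }
        i += 1;
    }
    None
}

/// Stage 2: turn every doubled quote in a recognized body into one quote.
fn unquote(raw: &str) -> String {
    raw.replace("''", "'")
}

/// Chain the two stages, analogous to `map_parser(seq, unquote)`.
fn parse_two_stage(input: &str) -> Option<(&str, String)> {
    let (rest, raw) = recognize_quoted(input)?;
    Some((rest, unquote(raw)))
}

fn main() {
    let input = "'single quotes -> '' and so on...'";
    println!("{:?}", recognize_quoted(input));
    println!("{:?}", parse_two_stage(input));
}
```

Because stage 1 only ever accepts quotes in pairs, the simple left-to-right replace in stage 2 pairs them up the same way.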
Note that because of the escaped_transform combinator, the parse result is a String rather than a &str, i.e. there is an extra allocation.
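If that allocation matters, one option outside nom (not something the answer above does) is to return std::borrow::Cow<str> and allocate only when the body actually contains a doubled quote. A minimal sketch, assuming the input is a raw body already stripped of its outer quotes; the helper name unescape is hypothetical:

```rust
use std::borrow::Cow;

/// Hypothetical helper: un-escape a raw body (outer quotes already removed),
/// allocating only when it actually contains a doubled quote.
fn unescape(raw: &str) -> Cow<'_, str> {
    if raw.contains("''") {
        Cow::Owned(raw.replace("''", "'"))
    } else {
        // No escapes: hand back a slice of the input, no allocation.
        Cow::Borrowed(raw)
    }
}

fn main() {
    for raw in ["xy@$%!z", "abc''def"] {
        let value = unescape(raw);
        let kind = if matches!(value, Cow::Borrowed(_)) { "borrowed" } else { "owned" };
        println!("{:?} -> {:?} ({})", raw, value, kind);
    }
}
```

Callers can treat the result like a &str through Deref, and only strings with escapes pay for a new buffer.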
I'm learning nom; here is my attempt.
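use nom::{
    bytes::complete::{tag, take},
    sequence::delimited,
    IResult,
};

// Assumes the whole input is one quoted string: drop the outer quotes by
// length, then un-escape '' afterwards.
fn parser(input: &str) -> IResult<&str, &str> {
    // saturating_sub avoids an underflow panic on inputs shorter than 2 chars.
    let len = input.chars().count().saturating_sub(2);
    delimited(tag("'"), take(len), tag("'"))(input)
}

fn main() {
    let a = r###"'string value contained between single quotes -> '' and so on...'"###;
    let (remaining, mut matched) = parser(a).unwrap_or_default();
    let sss = matched.replace("''", "'");
    matched = &sss;
    println!("remaining: {:#?}", remaining);
    println!("matched: {:#?}", matched);
}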
It prints this result:
remaining: ""
matched: "string value contained between single quotes -> ' and so on..."
My tests are based on nom 6.2.1.
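The length-based attempt works only when the entire input is exactly one quoted literal. The same idea can be sketched with the standard library alone (the name unquote_whole is hypothetical), which makes that assumption explicit: any text after the closing quote makes it return None, and, like the nom version, it does not check that quotes inside the body are actually doubled:

```rust
/// Hypothetical std-only version of the length-based attempt: it assumes the
/// ENTIRE input is one quoted literal, just as `take(len)` does.
fn unquote_whole(input: &str) -> Option<String> {
    // Strip exactly one quote from each end, or give up.
    let inner = input.strip_prefix('\'')?.strip_suffix('\'')?;
    // Un-escape doubled quotes; lone inner quotes are passed through unchecked.
    Some(inner.replace("''", "'"))
}

fn main() {
    let a = "'string value contained between single quotes -> '' and so on...'";
    println!("{:?}", unquote_whole(a));
    // Trailing input after the literal is not supported by this approach.
    println!("{:?}", unquote_whole("'abc' tail"));
}
```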