How can I match an exact tag using the nom library in Rust?
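I'm working on a small duration-parsing library written in Rust, using the nom library. In this library I have defined a second parser combinator function. Its job is to parse the various acceptable formats for representing seconds as text.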
pub fn duration(input: &str) -> IResult<&str, std::time::Duration> {
    // Some code combining the various time format combinators
    // to match the format "10 days, 8 hours, 7 minutes and 6 seconds"
}

pub fn seconds(input: &str) -> IResult<&str, u64> {
    terminated(unsigned_integer_64, preceded(multispace0, second))(input)
}

fn second(input: &str) -> IResult<&str, &str> {
    alt((
        tag("seconds"),
        tag("second"),
        tag("secs"),
        tag("sec"),
        tag("s"),
    ))(input)
}
So far, the tag combinators have behaved as I expected. However, I recently discovered that the following assertion fails, and fails by definition:
assert!(second("se").is_err())
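Indeed, the documentation states that "The input data will be compared to the tag combinator's argument and will return the part of the input that matches the argument".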
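However, as my example hopefully illustrates, what I want is some kind of tag that fails if the whole input cannot be parsed. I looked into explicitly checking whether there is a rest left over after parsing the input, and found that this would work. I also explored some flavours of the complete and take combinators to achieve this, without success.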
What would be an idiomatic way to parse an "exact match" of a word and fail on a partial result (one that would return a rest)?
You can use the all_consuming combinator, which succeeds only if the whole input has been consumed by its child parser:
// nom 6.1.2
use nom::branch::alt;
use nom::bytes::complete::tag;
use nom::combinator::all_consuming;
use nom::IResult;

fn main() {
    assert!(second("se").is_err());
}

fn second(input: &str) -> IResult<&str, &str> {
    all_consuming(alt((
        tag("seconds"),
        tag("second"),
        tag("secs"),
        tag("sec"),
        tag("s"),
    )))(input)
}
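To make it clearer why the original assertion fails and what all_consuming changes, here is a rough std-only model of the behaviour. It is not how nom is implemented, and the names tag_model, second_model and second_exact_model are made up for illustration. It assumes a complete tag behaves like a prefix match that returns the rest together with the matched part, and that all_consuming rejects any non-empty rest.

```rust
/// Rough model of a complete `tag`: on a prefix match, return
/// (rest, matched); otherwise `None`. Not nom's real implementation.
fn tag_model<'a>(tag: &str, input: &'a str) -> Option<(&'a str, &'a str)> {
    input
        .strip_prefix(tag)
        .map(|rest| (rest, &input[..tag.len()]))
}

/// Model of the `alt((tag("seconds"), ..., tag("s")))` chain:
/// the alternatives are tried in order and the first match wins.
fn second_model(input: &str) -> Option<(&str, &str)> {
    ["seconds", "second", "secs", "sec", "s"]
        .iter()
        .find_map(|t| tag_model(t, input))
}

/// Model of wrapping the chain in `all_consuming`:
/// a match that leaves a non-empty rest is treated as a failure.
fn second_exact_model(input: &str) -> Option<&str> {
    match second_model(input) {
        Some(("", matched)) => Some(matched),
        _ => None,
    }
}
```

In this model, second_model("se") matches "s" and leaves "e" as the rest, which is why the plain alt of tags does not return an error. second_exact_model("se") is None, but so is second_exact_model("s and more"), so a check like this only fits where nothing else is supposed to follow the word.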
Update
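I think I misunderstood your original question. Maybe this is closer to what you need. The point is that you should write smaller parsers and then combine them: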
use nom::branch::alt;
use nom::bytes::complete::tag;
use nom::character::complete::digit1;
use nom::combinator::all_consuming;
use nom::sequence::{terminated, tuple};
use nom::IResult;

#[derive(Debug)]
struct Time {
    min: u32,
    sec: u32,
}

fn main() {
    // OK
    let parsed = time("10 minutes, 5 seconds");
    println!("{:?}", parsed);

    // OK
    let parsed = time("10 mins, 5 s");
    println!("{:?}", parsed);

    // Error -> although `min` is a valid tag, it would expect `, ` afterwards, instead of `ts`
    let parsed = time("10 mints, 5 s");
    println!("{:?}", parsed);

    // Error -> there must not be anything left after "5 s"
    let parsed = time("10 mins, 5 s, ");
    println!("{:?}", parsed);

    // Error -> although it starts with `sec` which is a valid tag, it will fail, because it would expect EOF
    let parsed = time("10 min, 5 sections");
    println!("{:?}", parsed);
}

fn time(input: &str) -> IResult<&str, Time> {
    // parse the minutes section and **expect** a delimiter, because there **must** be another section afterwards
    let (rem, min) = terminated(minutes_section, delimiter)(input)?;

    // parse the seconds section and **expect** EOF - i.e. there should not be any input left to parse
    let (rem, sec) = all_consuming(seconds_section)(rem)?;

    // rem should be an empty slice
    IResult::Ok((rem, Time { min, sec }))
}

// This function combines several parsers to parse the minutes section:
// NUMBER[sep]TAG-MINUTES
fn minutes_section(input: &str) -> IResult<&str, u32> {
    let (rem, (min, _sep, _tag)) = tuple((number, separator, minutes))(input)?;
    IResult::Ok((rem, min))
}

// This function combines several parsers to parse the seconds section:
// NUMBER[sep]TAG-SECONDS
fn seconds_section(input: &str) -> IResult<&str, u32> {
    let (rem, (sec, _sep, _tag)) = tuple((number, separator, seconds))(input)?;
    IResult::Ok((rem, sec))
}

fn number(input: &str) -> IResult<&str, u32> {
    digit1(input).map(|(remaining, number)| {
        // this can panic if the string represents a number
        // that does not fit into u32
        let n = number.parse().unwrap();
        (remaining, n)
    })
}

fn minutes(input: &str) -> IResult<&str, &str> {
    alt((
        tag("minutes"),
        tag("minute"),
        tag("mins"),
        tag("min"),
        tag("m"),
    ))(input)
}

fn seconds(input: &str) -> IResult<&str, &str> {
    alt((
        tag("seconds"),
        tag("second"),
        tag("secs"),
        tag("sec"),
        tag("s"),
    ))(input)
}

// This function parses the separator between the number and the tag:
// N<separator>tag -> 5[sep]minutes
fn separator(input: &str) -> IResult<&str, &str> {
    tag(" ")(input)
}

// This function parses the delimiter between the sections:
// X minutes<delimiter>Y seconds -> 1 min[delimiter]2 sec
fn delimiter(input: &str) -> IResult<&str, &str> {
    tag(", ")(input)
}
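Here I created a set of basic parsers for the building blocks, such as "number", "separator", "delimiter", and the various tags (minutes, seconds, etc.). None of those expects to be a "whole word". Instead, you should use combinators such as terminated, tuple and all_consuming to mark where the "exact word" ends.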
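One loose end in the code above is that number unwraps the result of parse, so a digit run that does not fit into u32 panics instead of producing a parse error. As a minimal std-only sketch of the non-panicking behaviour (number_checked is a made-up name, and this is a stand-in rather than a nom parser), the conversion failure can be reported as "no match". In nom itself, I believe the map_res combinator is the usual tool for turning a failed conversion into a parser error.

```rust
/// Hypothetical std-only stand-in for the `number` parser: splits off the
/// leading ASCII digits and returns `None` instead of panicking on overflow.
fn number_checked(input: &str) -> Option<(&str, u32)> {
    // Byte index of the first non-digit, or the full length if all digits.
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        // No leading digits at all: treat it as a failed match.
        return None;
    }
    let (digits, rest) = input.split_at(end);
    // `parse` returns Err on overflow; `.ok()?` turns that into `None`.
    let n = digits.parse::<u32>().ok()?;
    Some((rest, n))
}
```

With this sketch, number_checked("10 minutes") yields the rest " minutes" and the value 10, while an input such as "99999999999 s" is rejected rather than causing a panic.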