Make boost::spirit::symbol parser non-greedy

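I'd like to make a keyword parser that matches e.g. int, but does not match the int in integer and leave eger over. I use x3::symbols to automatically get the parsed keyword represented as an enum value.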

Minimal example:

#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/utility/error_reporting.hpp>
#include <stdexcept>
#include <string>
#include <string_view>

namespace x3 = boost::spirit::x3;

enum class TypeKeyword { Int, Float, Bool };

struct TypeKeywordSymbolTable : x3::symbols<TypeKeyword> {
    TypeKeywordSymbolTable()
    {
        add("float", TypeKeyword::Float)
           ("int",   TypeKeyword::Int)
           ("bool",  TypeKeyword::Bool);
    }
};
const TypeKeywordSymbolTable type_keyword_symbol_table;

struct TypeKeywordRID {};
using TypeKeywordRule = x3::rule<TypeKeywordRID, TypeKeyword>;

const TypeKeywordRule type_keyword = "type_keyword";
const auto type_keyword_def        = type_keyword_symbol_table;
BOOST_SPIRIT_DEFINE(type_keyword);

using Iterator = std::string_view::const_iterator;

/* Thrown when the parser has failed to parse the whole input stream. Contains
 * the part of the input stream that has not been parsed. */
class LeftoverError : public std::runtime_error {
  public:
    LeftoverError(Iterator begin, Iterator end)
            : std::runtime_error(std::string(begin, end))
    {}

    std::string_view get_leftover_data() const noexcept { return what(); }
};

template<typename Rule>
typename Rule::attribute_type parse(std::string_view input, const Rule& rule)
{
    Iterator begin = input.begin();
    Iterator end   = input.end();

    using ExpectationFailure = boost::spirit::x3::expectation_failure<Iterator>;
    typename Rule::attribute_type result;
    try {
        bool r = x3::phrase_parse(begin, end, rule, x3::space, result);
        if (r && begin == end) {
            return result;
        } else { // Occurs when the whole input stream has not been consumed.
            throw LeftoverError(begin, end);
        }
    } catch (const ExpectationFailure& exc) {
        throw LeftoverError(exc.where(), end);
    }
}

int main()
{
    // TypeKeyword::Bool is parsed and "ean" is leftover, but failed parse with
    // "boolean" leftover is desired.
    parse("boolean", type_keyword);

    // TypeKeyword::Int is parsed and "eger" is leftover, but failed parse with
    // "integer" leftover is desired.
    parse("integer", type_keyword);

    // TypeKeyword::Int is parsed successfully and this is the desired behavior.
    parse("int", type_keyword);
}

Basically, I want integer not to be recognized as a keyword with eger left over to parse.

I turned the test cases into self-describing expectations:

Live On Compiler Explorer

Prints:

FAIL  "boolean"    -> TypeKeyword::Bool    (expected Leftover:"boolean")
FAIL  "integer"    -> TypeKeyword::Int     (expected Leftover:"integer")
OK    "int"        -> TypeKeyword::Int    
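
The two failures follow from how a symbol table matches: it accepts the longest entry that is a prefix of the remaining input and never looks at the character after it. Here is a minimal Spirit-free sketch of that behaviour, assuming C++17. The names Match, match_symbol_prefix and demo_prefix_matching are made up for illustration and are not part of Boost, and the linear scan only models the observable result, not how x3::symbols stores its entries.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <utility>

enum class TypeKeyword { Int, Float, Bool };

// Hypothetical result type: the keyword found plus the unconsumed input.
struct Match {
    TypeKeyword      keyword;
    std::string_view leftover;
};

// Returns the longest table entry that is a prefix of `input`.
// Nothing here checks what comes after the matched prefix.
inline std::optional<Match> match_symbol_prefix(std::string_view input) {
    static constexpr std::pair<std::string_view, TypeKeyword> table[] = {
        {"float", TypeKeyword::Float},
        {"int", TypeKeyword::Int},
        {"bool", TypeKeyword::Bool},
    };
    std::optional<Match> best;
    std::size_t          best_len = 0;
    for (auto const& [text, kw] : table) {
        if (input.substr(0, text.size()) == text && text.size() > best_len) {
            best     = Match{kw, input.substr(text.size())};
            best_len = text.size();
        }
    }
    return best;
}

// Usage: the same inputs as the test cases above.
inline void demo_prefix_matching() {
    auto integer = match_symbol_prefix("integer");
    assert(integer && integer->keyword == TypeKeyword::Int);
    assert(integer->leftover == "eger"); // matched "int", left "eger" over

    auto exact = match_symbol_prefix("int");
    assert(exact && exact->leftover.empty());
}
```

So the symbol table is doing its job; what is missing is a check on the character that follows the match.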

Now, the simplest, naive approach is to make sure you parse up to eoi, by simply changing

auto actual = parse(input, Parser::type_keyword);

to

auto actual = parse(input, Parser::type_keyword >> x3::eoi);

And indeed the tests pass: Live

OK    "boolean"    -> Leftover:"boolean"  
OK    "integer"    -> Leftover:"integer"  
OK    "int"        -> TypeKeyword::Int    

However, this fits the tests, but not the goal. Let's imagine a more involved grammar, where "type identifier;" is to be parsed:

auto identifier 
    = x3::rule<struct id_, Ast::Identifier> {"identifier"}
    = x3::lexeme[x3::char_("a-zA-Z_") >> *x3::char_("a-zA-Z_0-9")];

auto type_keyword
    = x3::rule<struct tk_, Ast::TypeKeyword> {"type_keyword"}
    = type_;

auto declaration 
    = x3::rule<struct decl_, Ast::Declaration>{"declaration"}
    = type_keyword >> identifier >> ';';

I'll leave the details for Compiler Explorer:

OK    "boolean b;" -> Leftover:"boolean b;"
OK    "integer i;" -> Leftover:"integer i;"
OK    "int j;"     -> (TypeKeyword::Int j)

Looks good. But what if we add some interesting tests:

    {"flo at f;", LeftoverError("flo at f;")},
    {"boolean;", LeftoverError("boolean;")},

It prints (Live)

OK    "boolean b;" -> Leftover:"boolean b;"
OK    "integer i;" -> Leftover:"integer i;"
OK    "int j;"     -> (TypeKeyword::Int j)
FAIL  "boolean;"   -> (TypeKeyword::Bool ean) (expected Leftover:"boolean;")

So, the test cases were lacking. Your prose description was actually closer:

I'd like to make a keyword parser that matches i.e. int, but does not match int in integer with eger left over

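This correctly implies that you want to check the lexeme inside the type_keyword rule. A naive try might be to check that no identifier character follows the type keyword: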

auto type_keyword
    = x3::rule<struct tk_, Ast::TypeKeyword> {"type_keyword"}
    = type_ >> !identchar;

Where identchar is factored out of identifier like so:

auto identchar = x3::char_("a-zA-Z_0-9");

auto identifier 
    = x3::rule<struct id_, Ast::Identifier> {"identifier"}
    = x3::lexeme[x3::char_("a-zA-Z_") >> *identchar];

However, this doesn't work. Can you see why (peeking allowed: https://godbolt.org/z/jb4zfhfWb)?

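Our latest devious test case now passes (yay), but int j; is now rejected! If you think about it, that makes sense: the skipper is active, so the space after int is skipped before !identchar is tested, and the look-ahead then sees the j.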

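The essential word I used a moment ago was lexeme: you want to treat some units as lexemes (and whitespace stops the lexeme; or rather, whitespace isn't automatically skipped inside the lexeme¹). So, a fix would be: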

auto type_keyword
    // ...
    = x3::lexeme[type_ >> !identchar];

(Note how I sneakily already included that on the identifier rule earlier)

Lo and behold (Live):

OK    "boolean b;" -> Leftover:"boolean b;"
OK    "integer i;" -> Leftover:"integer i;"
OK    "int j;"     -> (TypeKeyword::Int j)
OK    "boolean;"   -> Leftover:"boolean;" 
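
To see why the lexeme makes the difference, the two variants can be modelled without Spirit. This is only a sketch, assuming C++17: keyword_matches, is_identchar and the pre_skip flag are hypothetical names made up for illustration. pre_skip = true imitates type_ >> !identchar under an active skipper (whitespace is skipped before the look-ahead), while pre_skip = false imitates lexeme[type_ >> !identchar].

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Same character set as identchar in the grammar: a-zA-Z_0-9.
inline bool is_identchar(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_';
}

inline bool is_space(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// Hypothetical model, not Spirit code: does `keyword` match at the start of
// `input` without an identifier character following it?
inline bool keyword_matches(std::string_view input, std::string_view keyword,
                            bool pre_skip) {
    if (input.substr(0, keyword.size()) != keyword)
        return false;
    std::size_t pos = keyword.size();
    if (pre_skip) {
        // Non-lexeme variant: the skipper eats whitespace before the look-ahead.
        while (pos < input.size() && is_space(input[pos]))
            ++pos;
    }
    // Negative look-ahead: succeed only if no identifier character comes next.
    return pos == input.size() || !is_identchar(input[pos]);
}

// Usage: the inputs from the test table.
inline void demo_keyword_boundary() {
    assert(!keyword_matches("int j;", "int", true));      // skipped to 'j': rejected
    assert(keyword_matches("int j;", "int", false));      // sees ' ': accepted
    assert(!keyword_matches("integer i;", "int", false)); // sees 'e': rejected
    assert(!keyword_matches("boolean;", "bool", false));  // sees 'e': rejected
}
```

The only difference between the failing and the working grammar is whether whitespace gets consumed between the keyword and the look-ahead.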

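Summary

This topic is a frequently recurring one, and it first requires a solid understanding of skippers and lexemes. Here are some other posts for inspiration:

  • Here I present a more generic helper you might find useful: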

     auto kw = [](auto p) {
          return x3::lexeme [ x3::as_parser(p) >> !x3::char_("a-zA-Z0-9_") ];
     };
    

Good luck!

Full Listing

Anti-bitrot, the final listing:

#include <boost/fusion/adapted.hpp>
#include <boost/fusion/include/as_vector.hpp>
#include <boost/fusion/include/io.hpp>
#include <boost/lexical_cast.hpp>
#include <boost/spirit/home/x3.hpp>
#include <boost/variant.hpp>
#include <iomanip>
#include <iostream>
#include <stdexcept>
#include <string>
#include <string_view>

namespace x3 = boost::spirit::x3;

namespace Ast {
    enum class TypeKeyword { Int, Float, Bool };

    static std::ostream& operator<<(std::ostream& os, TypeKeyword tk) {
        switch (tk) {
            case TypeKeyword::Int:   return os << "TypeKeyword::Int";
            case TypeKeyword::Float: return os << "TypeKeyword::Float";
            case TypeKeyword::Bool:  return os << "TypeKeyword::Bool";
        }
        return os << "?";
    }

    using Identifier = std::string;

    struct Declaration {
        TypeKeyword type;
        Identifier id;

        bool operator==(Declaration const&) const = default;
    };

} // namespace Ast

BOOST_FUSION_ADAPT_STRUCT(Ast::Declaration, type, id)

namespace Ast{
    static std::ostream& operator<<(std::ostream& os, Ast::Declaration const& d) {
        return os << boost::lexical_cast<std::string>(boost::fusion::as_vector(d));
    }
} // namespace Ast

namespace Parser {
    struct Type : x3::symbols<Ast::TypeKeyword> {
        Type() {
            add("float", Ast::TypeKeyword::Float) //
                ("int", Ast::TypeKeyword::Int)    //
                ("bool", Ast::TypeKeyword::Bool); //
        }
    } const static type_;

    auto identchar = x3::char_("a-zA-Z_0-9");

    auto identifier 
        = x3::rule<struct id_, Ast::Identifier> {"identifier"}
        = x3::lexeme[x3::char_("a-zA-Z_") >> *identchar];

    auto type_keyword
        = x3::rule<struct tk_, Ast::TypeKeyword> {"type_keyword"}
        = x3::lexeme[type_ >> !identchar];

    auto declaration 
        = x3::rule<struct decl_, Ast::Declaration>{"declaration"}
        = type_keyword >> identifier >> ';';
} // namespace Parser

struct LeftoverError : std::runtime_error {
    using std::runtime_error::runtime_error;

    friend std::ostream& operator<<(std::ostream& os, LeftoverError const& e) {
        return os << (std::string("Leftover:\"") + e.what() + "\"");
    }
    bool operator==(LeftoverError const& other) const {
        return std::string_view(what()) == other.what();
    }
};

template<typename T>
using Maybe = boost::variant<T, LeftoverError>;

template <typename Rule,
          typename Attr = typename x3::traits::attribute_of<Rule, x3::unused_type>::type,
          typename R = Maybe<Attr>>
R parse(std::string_view input, Rule const& rule) {
    Attr result;
    auto f = input.begin(), l = input.end();
    return x3::phrase_parse(f, l, rule, x3::space, result)
        ? R(std::move(result))
        : LeftoverError({f, l});
}

int main()
{
    using namespace Ast;

    struct {
        std::string_view        input;
        Maybe<Declaration>      expected;
    } cases[] = {
        {"boolean b;", LeftoverError("boolean b;")},
        {"integer i;", LeftoverError("integer i;")},
        {"int j;", Declaration{TypeKeyword::Int, "j"}},
        {"boolean;", LeftoverError("boolean;")},
    };
    for (auto [input, expected] : cases) {
        auto actual = parse(input, Parser::declaration >> x3::eoi);
        bool ok     = expected == actual;

        std::cout << std::left << std::setw(6) << (ok ? "OK" : "FAIL")
                  << std::setw(12) << std::quoted(input) << " -> "
                  << std::setw(20) << actual;
        if (not ok)
            std::cout << " (expected " << expected << ")";
        std::cout << "\n";
    }
}

¹ See Boost spirit skipper issues