How can I resolve mismatched alternative parsers and their attributes

I'm trying to parse into something of this form:

enum class shape { ellipse, circle };
enum class other_shape { square, rectangle };
enum class position { top, left, right, bottom, center };
struct result
{
    std::variant<shape, std::string> bla;
    position pos;
    std::vector<double> bloe;
};

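I know this doesn't make much sense (why not merge shape and other_shape, right?), but I tried to simplify the result type into something that resembles the real one. The input form is somewhat flexible, though, so I seem to need extra alternatives that do not map cleanly onto the struct definition above, which leads to "unexpected attribute size" static assertions.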

The real problem is the comma in the input between the bla+pos part and the bloe part, because either of them may be omitted. Example inputs:

circle at center, 1, 2, 3
at top, 1, 2, 3
at bottom
circle, 1, 2 3
1, 2, 3
my_fancy_shape at right, 1

Whenever some part is omitted, it gets a default value (say, the first enumerator of the enum and the first type in the variant).
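To make those defaults concrete: in standard C++, value-initializing the struct gives "the first enumerator and the first variant alternative" for free. This is only a minimal sketch, assuming the simplified types from above with five distinct positions; default_result is a hypothetical helper name used for illustration.

```cpp
#include <string>
#include <variant>
#include <vector>

enum class shape { ellipse, circle };
enum class position { top, left, right, bottom, center };

struct result {
    std::variant<shape, std::string> bla;
    position pos;
    std::vector<double> bloe;
};

// Hypothetical helper: the value a line would get if every part were omitted.
// Value-initialization picks the first variant alternative (shape) and
// zero-initializes the enums, which selects their first enumerators.
inline result default_result() {
    return result{};
}
```

So an omitted shape would come out as shape::ellipse, an omitted position as position::top, and omitted data as an empty vector.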

My grammar looks something like this:

( circle
| ellipse
| square
| rectangle
| x3::attr(shape::circle)
) >> ( "at" >> position
     | x3::attr(css::center)
     ) >> -x3::lit(',')
  >> x3::double_ % ','

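As you can see, the first set of alternatives maps directly onto the variant (with a default value if it is omitted entirely), and the second set supplies a default if the "at" part is missing. After that comes the vector of comma-separated values.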

My problem here is that the grammar above also matches these two invalid inputs:

, 1, 2, 3
circle 1, 2, 3

So the result is somewhat elegant, but sloppy.

How can I write a grammar that requires the comma only when the first part is non-empty, without changing the form of the result?

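I can think of grammars that do this by joining the two sets of alternatives into one set listing all the mixed possibilities, with the comma placed where it should actually appear, but then Spirit.X3 cannot map that alternative parser onto the two members (the variant and the value). For example, a very inefficient baseline with "all the possibilities listed":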

( circle >> x3::attr(position::center) >> ','
| ellipse >> x3::attr(position::center) >> ','
| square >> x3::attr(position::center) >> ','
| rectangle >> x3::attr(position::center) >> ','
| circle >> "at" >> position >> ','
| ellipse >> "at" >> position >> ','
| square >> "at" >> position >> ','
| rectangle >> "at" >> position >> ','
| x3::attr(shape::circle) >> "at" >> position >> ','
| x3::attr(shape::circle) >> x3::attr(position::center)
) >> x3::double_ % ','

The last alternative omits the comma, but apart from being excessive, X3 refuses to map this onto the result struct.

I'd model the grammar more simply, top-down and matching the AST.

Simplified AST types:

namespace AST {
    enum class shape       { ellipse, circle                  } ;
    enum class other_shape { square, rectangle                } ;
    enum class position    { top, left, right, bottom, center } ;

    using any_shape = std::variant<shape, other_shape, std::string>;
    using data = std::vector<double>;

    struct result {
        any_shape bla;
        position  pos;
        data      bloe;
    };
}

BOOST_FUSION_ADAPT_STRUCT(AST::result, bla, pos, bloe)

I would write the parser as:

auto const data = as<AST::data>(double_ % ',');
auto const position = kw("at") >> position_sym;

auto const custom_shape =
        !(position|data) >> kw(as<std::string>(+identchar));
auto const any_shape = as<AST::any_shape>(
        ikw(shape_sym) | ikw(other_shape_sym) | custom_shape);

auto const shape_line = as<AST::result>(
        -any_shape >> -position >> (','|&EOL) >> -data);
auto const shapes     = skip(blank) [ shape_line % eol ];

This uses a few helper shorthand functions, as you know I often do:

////////////////
// helpers - attribute coercion
template <typename T>
auto as  = [](auto p) {
    return rule<struct _, T> {typeid(T).name()} = p;
};

// keyword boundary detection
auto identchar = alnum | char_("-_.");
auto kw  = [](auto p) { return lexeme[p >> !identchar]; };
auto ikw = [](auto p) { return no_case[kw(p)]; };

auto const EOL = eol|eoi;
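For readers unfamiliar with the keyword-boundary idea, here is a rough plain C++ sketch of what kw checks, without Spirit. It ignores skipping and case folding, and is_identchar and starts_with_keyword are hypothetical names used only for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Same character class as identchar above: alphanumerics plus '-', '_' and '.'.
inline bool is_identchar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0
        || c == '-' || c == '_' || c == '.';
}

// Hypothetical stand-in for kw(word): `word` must appear at the start of
// `input` and must not be followed by another identifier character.
inline bool starts_with_keyword(std::string_view input, std::string_view word) {
    if (input.substr(0, word.size()) != word)
        return false; // the word itself is not there
    return input.size() == word.size()         // end of input after the word
        || !is_identchar(input[word.size()]);  // or a non-identifier character
}
```

With this, "at top" starts with the keyword at, whereas "attic" does not, so a custom shape name that merely begins with "at" is not mistaken for a position.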

This already puts you in a much better position than the situation you reported:

Live On Coliru

 ==== "circle at center, 1, 2, 3"
Parsed 1 shapes
shape:circle at center, 1, 2, 3
 ==== "at top, 1, 2, 3"
Parsed 1 shapes
shape:ellipse at top, 1, 2, 3
 ==== "at bottom"
Parsed 1 shapes
shape:ellipse at bottom
 ==== "1, 2, 3"
Parse failed
Remaining unparsed input: "1, 2, 3"
 ==== "my_fancy_shape at right, 1"
Parsed 1 shapes
custom:"my_fancy_shape" at right, 1
 ==== "circle at center, 1, 2, 3
               at top, 1, 2, 3
               at bottom
               circle, 1, 2, 3
               1, 2, 3
               my_fancy_shape at right, 1"
Parsed 4 shapes
shape:circle at center, 1, 2, 3
shape:ellipse at top, 1, 2, 3
shape:ellipse at bottom
shape:circle at top, 1, 2, 3
Remaining unparsed input: "
               1, 2, 3
               my_fancy_shape at right, 1"
 ==== "circle, 1, 2 3"
Parsed 1 shapes
shape:circle at top, 1, 2
Remaining unparsed input: " 3"
 ==== ", 1, 2, 3"
Parsed 1 shapes
shape:ellipse at top, 1, 2, 3
 ==== "circle 1, 2, 3"
Parse failed
Remaining unparsed input: "circle 1, 2, 3"

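As you can see, the last three fail to parse the full input, as expected. However, there is one that you wanted to succeed, but which doesn't: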

 ==== "1, 2, 3"
Parse failed
Remaining unparsed input: "1, 2, 3"

The Hack

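This is hard to get rid of without writing a combinatorial explosion of parsers. Note that the trick that makes the ',' between the shape/position part and data parse correctly is ','|&EOL.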

What we would actually need to be able to write is &BOL|','|&EOL. But there is no such thing as BOL. So let's emulate it!

// hack for BOL state
struct state_t {
    bool at_bol = true;

    struct set_bol {
        template <typename Ctx> void operator()(Ctx& ctx) const {
            auto& s = get<state_t>(ctx);
            //std::clog << std::boolalpha << "set_bol (from " << s.at_bol << ")" << std::endl;
            s.at_bol = true;
        }
    };

    struct reset_bol {
        template <typename Ctx> void operator()(Ctx& ctx) const {
            auto& s = get<state_t>(ctx);
            //std::clog << std::boolalpha << "reset_bol (from " << s.at_bol << ")" << std::endl;
            s.at_bol = false;
        }
    };

    struct is_at_bol {
        template <typename Ctx> void operator()(Ctx& ctx) const {
            auto& s = get<state_t>(ctx);
            //std::clog << std::boolalpha << "is_at_bol (" << s.at_bol << ")" << std::endl;
            _pass(ctx) = s.at_bol;
        }
    };
};
auto const SET_BOL   = eps[ state_t::set_bol{} ];
auto const RESET_BOL = eps[ state_t::reset_bol{} ];
auto const AT_BOL    = eps[ state_t::is_at_bol{} ];
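As a rough illustration of what the three branches AT_BOL | ',' | &EOL are meant to do, here is a plain C++ sketch without Spirit. The function name separator and its signature are made up for this illustration, and it assumes blanks have already been skipped.

```cpp
#include <optional>
#include <string_view>

// Hypothetical stand-in for (AT_BOL | ',' | &EOL).
// `at_bol` is true while neither a shape nor a position has been consumed.
// Returns the input left over for the data list, or nullopt if no branch matches.
inline std::optional<std::string_view> separator(bool at_bol, std::string_view rest) {
    if (at_bol)
        return rest;            // nothing came before the data: consume nothing
    if (!rest.empty() && rest.front() == ',')
        return rest.substr(1);  // the comma is required and consumed
    if (rest.empty() || rest.front() == '\n' || rest.front() == '\r')
        return rest;            // end of line or input: lookahead only
    return std::nullopt;        // e.g. "circle 1, 2, 3": data without a comma
}
```

Because the first branch consumes nothing, a leading comma as in ", 1, 2, 3" is left in the input rather than accepted as a separator.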

Now we can mix in the appropriate epsilons here and there:

template <typename T>
auto opt = [](auto p, T defval = {}) {
    return as<T>(p >> RESET_BOL | attr(defval));
};

auto const shape_line = as<AST::result>(
        with<state_t>(state_t{}) [
            SET_BOL >>
            opt<AST::any_shape>(any_shape) >>
            opt<AST::position>(position) >>
            (AT_BOL|','|&EOL) >> -data
        ]);

It's ugly, but it works:

 ==== "circle at center, 1, 2, 3"
Parsed 1 shapes
shape:circle at center, 1, 2, 3
 ==== "at top, 1, 2, 3"
Parsed 1 shapes
shape:ellipse at top, 1, 2, 3
 ==== "at bottom"
Parsed 1 shapes
shape:ellipse at bottom
 ==== "1, 2, 3"
Parsed 1 shapes
shape:ellipse at top, 1, 2, 3
 ==== "my_fancy_shape at right, 1"
Parsed 1 shapes
custom:"my_fancy_shape" at right, 1
 ==== "circle at center, 1, 2, 3
               at top, 1, 2, 3
               at bottom
               circle, 1, 2, 3
               1, 2, 3
               my_fancy_shape at right, 1"
Parsed 6 shapes
shape:circle at center, 1, 2, 3
shape:ellipse at top, 1, 2, 3
shape:ellipse at bottom
shape:circle at top, 1, 2, 3
shape:ellipse at top, 1, 2, 3
custom:"my_fancy_shape" at right, 1
 ==== "circle, 1, 2 3"
Parsed 1 shapes
shape:circle at top, 1, 2
Remaining unparsed input: " 3"
 ==== ", 1, 2, 3"
Parsed 1 shapes
shape:ellipse at top
Remaining unparsed input: ", 1, 2, 3"
 ==== "circle 1, 2, 3"
Parse failed
Remaining unparsed input: "circle 1, 2, 3"

Oh, you might add eoi to the shapes parser rule so we get slightly less confusing output when only part of the input is parsed, but that's up to you to decide.

Full Demo

Live On Wandbox¹

//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted.hpp>

#include <iostream>
#include <iomanip>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

namespace AST {
    enum class shape       { ellipse, circle                  } ;
    enum class other_shape { square, rectangle                } ;
    enum class position    { top, left, right, bottom, center } ;

    using any_shape = std::variant<shape, other_shape, std::string>;
    using data = std::vector<double>;

    struct result {
        any_shape bla;
        position  pos;
        data      bloe;
    };

    static inline std::ostream& operator<<(std::ostream& os, shape const& v) {
        switch(v) {
            case shape::circle:  return os << "circle";
            case shape::ellipse: return os << "ellipse";
        }
        throw std::domain_error("shape");
    }
    static inline std::ostream& operator<<(std::ostream& os, other_shape const& v) {
        switch(v) {
            case other_shape::rectangle: return os << "rectangle";
            case other_shape::square:    return os << "square";

        }
        throw std::domain_error("other_shape");
    }
    static inline std::ostream& operator<<(std::ostream& os, position const& v) {
        switch(v) {
            case position::top:    return os << "top";
            case position::left:   return os << "left";
            case position::right:  return os << "right";
            case position::bottom: return os << "bottom";
            case position::center: return os << "center";

        }
        throw std::domain_error("position");
    }

    template <typename... F> struct overloads : F... {
        overloads(F... f) : F(f)... {}
        using F::operator()...;
    };

    static inline std::ostream& operator<<(std::ostream& os, any_shape const& v) {
        std::visit(overloads{
            [&os](shape v)       { os << "shape:" << v;               },
            [&os](other_shape v) { os << "other_shape:" << v;         },
            [&os](auto const& v) { os << "custom:" << std::quoted(v); },
        }, v);
        return os;
    }
}

BOOST_FUSION_ADAPT_STRUCT(AST::result, bla, pos, bloe)

namespace parser {
    using namespace boost::spirit::x3;

    struct shape_t : symbols<AST::shape> {
        shape_t() { add
            ("ellipse", AST::shape::ellipse)
            ("circle", AST::shape::circle)
            ;
        }
    } shape_sym;

    struct other_shape_t : symbols<AST::other_shape> {
        other_shape_t() { add
            ("square", AST::other_shape::square)
            ("rectangle", AST::other_shape::rectangle)
            ;
        }
    } other_shape_sym;

    struct position_t : symbols<AST::position> {
        position_t() { add
            ("top", AST::position::top)
            ("left", AST::position::left)
            ("right", AST::position::right)
            ("bottom", AST::position::bottom)
            ("center", AST::position::center)
            ;
        }
    } position_sym;

    // hack for BOL state
    struct state_t {
        bool at_bol = true;

        struct set_bol {
            template <typename Ctx> void operator()(Ctx& ctx) const {
                auto& s = get<state_t>(ctx);
                //std::clog << std::boolalpha << "set_bol (from " << s.at_bol << ")" << std::endl;
                s.at_bol = true;
            }
        };

        struct reset_bol {
            template <typename Ctx> void operator()(Ctx& ctx) const {
                auto& s = get<state_t>(ctx);
                //std::clog << std::boolalpha << "reset_bol (from " << s.at_bol << ")" << std::endl;
                s.at_bol = false;
            }
        };

        struct is_at_bol {
            template <typename Ctx> void operator()(Ctx& ctx) const {
                auto& s = get<state_t>(ctx);
                //std::clog << std::boolalpha << "is_at_bol (" << s.at_bol << ")" << std::endl;
                _pass(ctx) = s.at_bol;
            }
        };
    };
    auto const SET_BOL   = eps[ state_t::set_bol{} ];
    auto const RESET_BOL = eps[ state_t::reset_bol{} ];
    auto const AT_BOL    = eps[ state_t::is_at_bol{} ];

    ////////////////
    // helpers - attribute coercion
    template <typename T>
    auto as  = [](auto p) {
        return rule<struct _, T, true> {typeid(T).name()} = p;
    };
    template <typename T>
    auto opt = [](auto p, T defval = {}) {
        return as<T>(p >> RESET_BOL | attr(defval));
    };

    // keyword boundary detection
    auto identchar = alnum | char_("-_.");
    auto kw  = [](auto p) { return lexeme[p >> !identchar]; };
    auto ikw = [](auto p) { return no_case[kw(p)]; };

    auto const EOL = eol|eoi;
    ////////////////

    auto const data = as<AST::data>(double_ % ',');
    auto const position = kw("at") >> position_sym;

    auto const custom_shape =
            !(position|data) >> as<std::string>(kw(+identchar));
    auto const any_shape = as<AST::any_shape>(
            ikw(shape_sym) | ikw(other_shape_sym) | custom_shape);

    auto const shape_line = as<AST::result>(
            with<state_t>(state_t{}) [
                SET_BOL >>
                opt<AST::any_shape>(any_shape) >>
                opt<AST::position>(position) >>
                (AT_BOL|','|&EOL) >> -data
            ]);
    auto const shapes = skip(blank) [ shape_line % eol ]/* >> eoi*/;
}

int main() {
    for (std::string const input : {
            "circle at center, 1, 2, 3",
            "at top, 1, 2, 3",
            "at bottom",
            "1, 2, 3",
            "my_fancy_shape at right, 1",
            R"(circle at center, 1, 2, 3
               at top, 1, 2, 3
               at bottom
               circle, 1, 2, 3
               1, 2, 3
               my_fancy_shape at right, 1)",

            // invalids:
            "circle, 1, 2 3",
            ", 1, 2, 3",
            "circle 1, 2, 3",
            })
    {
        std::cout << " ==== " << std::quoted(input) << std::endl;
        std::vector<AST::result> r;
        auto f = begin(input), l = end(input);
        if (parse(f, l, parser::shapes, r)) {
            std::cout << "Parsed " << r.size() << " shapes" << std::endl;
            for (auto const& s : r) {
                std::cout << s.bla << " at " << s.pos;
                for (auto v : s.bloe)
                    std::cout << ", " << v;
                std::cout << std::endl;
            }
        } else {
            std::cout << "Parse failed" << std::endl;
        }

        if (f!=l) {
            std::cout << "Remaining unparsed input: " << std::quoted(std::string(f,l)) << std::endl;
        }
    }
}

¹ Wandbox has a more recent version of Boost than Coliru, which makes the with<> directive state mutable as intended.