Spirit X3,这种错误处理方式有用吗?

Spirit X3, Is this error handling approach useful?

在阅读 error handling 上的 Spirit X3 教程和一些实验之后。我得出了一个结论。

我认为 X3 中的错误处理主题还有一些改进空间。从我的角度来看,一个重要的目标是提供有意义的错误消息。首先也是最重要的是,添加将 _pass(ctx) 成员设置为 false 的语义操作不会这样做,因为 X3 会尝试匹配其他内容。仅抛出 x3::expectation_failure 会过早退出解析函数,即不尝试匹配任何其他内容。所以剩下的就是解析器指令 expect[a] 和解析器 operator> 以及从语义操作中手动抛出 x3::expectation_failure 。我相信关于这个错误处理的词汇太有限了。请考虑以下 X3 PEG 语法行:

const auto a = a1 >> a2 >> a3;
const auto b = b1 >> b2 >> b3;
const auto c = c1 >> c2 >> c3;

const auto main_rule__def =
(
 a |
 b |
 c );

现在对于表达式 a 我不能使用 expect[]operator>,因为其他替代方法可能有效。我可能是错的,但我认为 X3 要求我拼出可以匹配的替代错误表达式,如果它们匹配,他们可以抛出 x3::expectation_failure,这很麻烦。

问题是,是否有一种好的方法可以使用当前的 X3 工具检查我的 PEG 构造中的错误条件以及 a、b 和 c 的有序替代项?

如果答案是否定的,我想提出我的想法,为此提供一个合理的解决方案。我相信为此我需要一个新的解析器指令。这个指令应该做什么?它应该在解析 失败 时调用附加的语义操作。该属性显然未使用,但我需要在第一次出现解析不匹配时将 _where 成员设置在迭代器位置上。所以如果a2失败,_where应该在a1结束后置1。让我们调用解析指令 neg_sa。这意味着否定语义动作。

pseudocode

// semantic actions
auto a_sa = [&](auto& ctx)
{
  // add _where to vector v
};

auto b_sa = [&](auto& ctx)
{
  // add _where to vector v
};

auto c_sa = [&](auto& ctx)
{
  // add _where to vector v

  // now we know we have a *real* error.
  // find the peak iterator value in the vector v
  // the position tells whether it belongs to a, b or c.
  // now we can formulate an error message like: “cannot make sense of b upto this position.”
  // lastly throw x3::expectation_failure
};

// PEG
const auto a = a1 >> a2 >> a3;
const auto b = b1 >> b2 >> b3;
const auto c = c1 >> c2 >> c3;

const auto main_rule__def =
(
 neg_sa[a][a_sa] |
 neg_sa[b][b_sa] |
 neg_sa[c][c_sa] );

我希望我清楚地表达了这个想法。如果我需要进一步解释,请在评论部分告诉我。

Now for expression a I cannot use expect[] or operator>, as other alternatives might be valid. I could be wrong but I think X3 requires me to spell out alternate wrong expressions that can match and if they match they can throw x3::expectation_failure which is cumbersome.

很简单:

const auto main_rule__def = x3::expect [
 a |
 b |
 c ];

或者,甚至:

const auto main_rule__def = x3::eps > (
 a |
 b |
 c );

If the answer is no, I would like to present my idea to provide a reasonable solution for this. I believe I would need a new parser directive for that. What should this directive do? It should call the attached semantic action when the parse fails instead.

现有的 x3::on_error 功能已经知道如何执行此操作。请注意:它有点复杂,但同样也非常灵活。

基本上,它需要您在 ID 类型上实现一个静态接口(x3::rule<ID, Attr>,可能 main_rule_class 在您选择的约定中)。存储库中有编译器示例,展示了如何使用它。

Side note: there's both on_success and on_error using this paradigm

on_error 成员将在 ID 类型的 default-constructed 副本上调用,参数为 ID().on_error(first, last, expectation_failure_object, context)

const auto main_rule__def =
(
 neg_sa[a][a_sa] |
 neg_sa[b][b_sa] |
 neg_sa[c][c_sa] );

老实说,我认为您是在为自己的困惑铺平道路。有 3 个独立的错误操作有什么好处?您如何确定发生了哪个错误?

真的只有两种可能:

  • 要么你确实知道需要一个特定的分支并且它失败了(这是一个期望失败,你可以根据定义编码作为[=19中的一个期望点=]、bc).
  • 或者您不知道隐含了哪个分支(例如,分支可以从类似的作品开始,但在其中失败了)。在那种情况下,没有人能知道应该调用哪个错误处理程序,所以有多个错误处理程序是无关紧要的。

    实际上正确的做法是在更高级别上使 main_rule 失败,这意味着 "none of the possible branches succeeded"。

    这是expect[ a | b | c ]的处理方式。

好吧,冒着把太多东西混为一谈的风险,这里是:

namespace square::peg {
    using namespace x3;

    const auto quoted_string = lexeme['"' > *(print - '"') > '"'];
    const auto bare_string   = lexeme[alpha > *alnum] > ';';
    const auto two_ints      = int_ > int_;

    const auto main          = quoted_string | bare_string | two_ints;

    const auto entry_point   = skip(space)[ expect[main] > eoi ];
} // namespace square::peg

应该可以。关键是唯一该期待的东西 points 是使相应分支失败的原因 无疑是正确的分支。 (否则,字面上不会有 很难期望).

有两个次要的 get_info 更漂亮消息的专业化¹,这可能会导致 即使在手动捕获异常时也能显示正确的错误消息:

Live On Coliru

int main() {
    using It = std::string::const_iterator;

    for (std::string const input : {
            "   -89 0038  ",
            "   \"-89 0038\"  ",
            "   something123123      ;",
            // undecidable
            "",
            // violate expecations, no successful parse
            "   -89 oops  ",   // not an integer
            "   \"-89 0038  ", // missing "
            "   bareword ",    // missing ;
            // trailing debris, successful "main"
            "   -89 3.14  ",   // followed by .14
        })
    {
        std::cout << "====== " << std::quoted(input) << "\n";

        It iter = input.begin(), end = input.end();
        try {
        if (parse(iter, end, square::peg::entry_point)) {
            std::cout << "Parsed successfully\n";
        } else {
            std::cout << "Parsing failed\n";
        }
        } catch (x3::expectation_failure<It> const& ef) {
            auto pos = std::distance(input.begin(), ef.where());
            std::cout << "Expect " << ef.which() << " at "
                << "\n\t" << input
                << "\n\t" << std::setw(pos) << std::setfill('-') << "" << "^\n";
        }
    }
}

版画

====== "   -89 0038  "
Parsed successfully
====== "   \"-89 0038\"  "
Parsed successfully
====== "   something123123      ;"
Parsed successfully
====== ""
Expect quoted string, bare string or integer number pair at

    ^
====== "   -89 oops  "
Expect integral number at
       -89 oops 
    -------^
====== "   \"-89 0038  "
Expect '"' at
       "-89 0038 
    --------------^
====== "   bareword "
Expect ';' at
       bareword
    ------------^
====== "   -89 3.14  "
Expect eoi at
       -89 3.14 
    --------^

这已经超出了大多数人对解析器的期望。

但是:自动化,而且更灵活

我们可能不满足于只报告一个期望和救助。实际上,您可以报告并继续解析,因为只是存在常规不匹配:这就是 on_error 的用武之地。

让我们创建一个标签库:

struct with_error_handling {
    template<typename It, typename Ctx>
        x3::error_handler_result on_error(It f, It l, expectation_failure<It> const& ef, Ctx const&) const {
            std::string s(f,l);
            auto pos = std::distance(f, ef.where());

            std::cout << "Expecting " << ef.which() << " at "
                << "\n\t" << s
                << "\n\t" << std::setw(pos) << std::setfill('-') << "" << "^\n";

            return error_handler_result::fail;
        }
};

现在,我们所要做的就是从 with_error_handling 和 BAM 中导出我们的规则 ID!我们不必编写任何异常处理程序,规则将简单地 "fail" 进行适当的诊断.更重要的是,一些输入可以导致多个(希望有用的)诊断:

auto const eh = [](auto p) {
    struct _ : with_error_handling {};
    return rule<_> {} = p;
};

const auto quoted_string = eh(lexeme['"' > *(print - '"') > '"']);
const auto bare_string   = eh(lexeme[alpha > *alnum] > ';');
const auto two_ints      = eh(int_ > int_);

const auto main          = quoted_string | bare_string | two_ints;
using main_type = std::remove_cv_t<decltype(main)>;

const auto entry_point   = skip(space)[ eh(expect[main] > eoi) ];

现在,main 变成:

Live On Coliru

for (std::string const input : { 
        "   -89 0038  ",
        "   \"-89 0038\"  ",
        "   something123123      ;",
        // undecidable
        "",
        // violate expecations, no successful parse
        "   -89 oops  ",   // not an integer
        "   \"-89 0038  ", // missing "
        "   bareword ",    // missing ;
        // trailing debris, successful "main"
        "   -89 3.14  ",   // followed by .14
    })
{
    std::cout << "====== " << std::quoted(input) << "\n";

    It iter = input.begin(), end = input.end();
    if (parse(iter, end, square::peg::entry_point)) {
        std::cout << "Parsed successfully\n";
    } else {
        std::cout << "Parsing failed\n";
    }
}

程序打印:

====== "   -89 0038  "
Parsed successfully
====== "   \"-89 0038\"  "
Parsed successfully
====== "   something123123      ;"
Parsed successfully
====== ""
Expecting quoted string, bare string or integer number pair at 

    ^
Parsing failed
====== "   -89 oops  "
Expecting integral number at 
       -89 oops  
    -------^
Expecting quoted string, bare string or integer number pair at 
       -89 oops  
    ^
Parsing failed
====== "   \"-89 0038  "
Expecting '"' at 
       "-89 0038  
    --------------^
Expecting quoted string, bare string or integer number pair at 
       "-89 0038  
    ^
Parsing failed
====== "   bareword "
Expecting ';' at 
       bareword 
    ------------^
Expecting quoted string, bare string or integer number pair at 
       bareword 
    ^
Parsing failed
====== "   -89 3.14  "
Expecting eoi at 
       -89 3.14  
    --------^
Parsing failed

属性传播,on_success

当解析器实际上不解析任何东西时,它们就不是很有用,所以让我们添加一些建设性的值处理,同时展示 on_success:

定义一些 AST 类型来接收属性:

struct quoted : std::string {};
struct bare   : std::string {};
using  two_i  = std::pair<int, int>;
using Value = boost::variant<quoted, bare, two_i>;

确保我们可以打印 Values:

static inline std::ostream& operator<<(std::ostream& os, Value const& v) {
    struct {
        std::ostream& _os;
        void operator()(quoted const& v) const { _os << "quoted(" << std::quoted(v) << ")";             } 
        void operator()(bare const& v) const   { _os << "bare(" << v << ")";                            } 
        void operator()(two_i const& v) const  { _os << "two_i(" << v.first << ", " << v.second << ")"; } 
    } vis{os};

    boost::apply_visitor(vis, v);
    return os;
}

现在,使用旧的 as<> 技巧来强制属性类型,这次使用 error-handling:

作为锦上添花,让我们在with_error_handling中演示on_success

    template<typename It, typename Ctx>
        void on_success(It f, It l, two_i const& v, Ctx const&) const {
            std::cout << "Parsed " << std::quoted(std::string(f,l)) << " as integer pair " << v.first << ", " << v.second << "\n";
        }

现在主要未修改主程序(也只打印结果值):

Live On Coliru

    It iter = input.begin(), end = input.end();
    Value v;
    if (parse(iter, end, square::peg::entry_point, v)) {
        std::cout << "Result value: " << v << "\n";
    } else {
        std::cout << "Parsing failed\n";
    }

版画

====== "   -89 0038  "
Parsed "-89 0038" as integer pair -89, 38
Result value: two_i(-89, 38)
====== "   \"-89 0038\"  "
Result value: quoted("-89 0038")
====== "   something123123      ;"
Result value: bare(something123123)
====== ""
Expecting quoted string, bare string or integer number pair at 

    ^
Parsing failed
====== "   -89 oops  "
Expecting integral number at 
       -89 oops  
    -------^
Expecting quoted string, bare string or integer number pair at 
       -89 oops  
    ^
Parsing failed
====== "   \"-89 0038  "
Expecting '"' at 
       "-89 0038  
    --------------^
Expecting quoted string, bare string or integer number pair at 
       "-89 0038  
    ^
Parsing failed
====== "   bareword "
Expecting ';' at 
       bareword 
    ------------^
Expecting quoted string, bare string or integer number pair at 
       bareword 
    ^
Parsing failed
====== "   -89 3.14  "
Parsed "-89 3" as integer pair -89, 3
Expecting eoi at 
       -89 3.14  
    --------^
Parsing failed

实在是太过分了

我不了解你,但我讨厌这样做 side-effects,更不用说从解析器打印到控制台了。让我们改用 x3::with

我们想通过 Ctx& 参数附加到诊断而不是写入 到 on_error 处理程序中的 std::cout

struct with_error_handling {
    struct diags;

    template<typename It, typename Ctx>
        x3::error_handler_result on_error(It f, It l, expectation_failure<It> const& ef, Ctx const& ctx) const {
            std::string s(f,l);
            auto pos = std::distance(f, ef.where());

            std::ostringstream oss;
            oss << "Expecting " << ef.which() << " at "
                << "\n\t" << s
                << "\n\t" << std::setw(pos) << std::setfill('-') << "" << "^";

            x3::get<diags>(ctx).push_back(oss.str());

            return error_handler_result::fail;
        }
};

并且在调用站点上,我们可以传递上下文:

std::vector<std::string> diags;

if (parse(iter, end, x3::with<D>(diags) [square::peg::entry_point], v)) {
    std::cout << "Result value: " << v;
} else {
    std::cout << "Parsing failed";
}

std::cout << " with " << diags.size() << " diagnostics messages: \n";

完整程序还打印诊断信息:

Live On Wandbox²

完整列表

//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <iomanip>

namespace x3 = boost::spirit::x3;

struct quoted : std::string {};
struct bare   : std::string {};
using  two_i  = std::pair<int, int>;
using Value = boost::variant<quoted, bare, two_i>;

static inline std::ostream& operator<<(std::ostream& os, Value const& v) {
    struct {
        std::ostream& _os;
        void operator()(quoted const& v) const { _os << "quoted(" << std::quoted(v) << ")";             } 
        void operator()(bare const& v) const   { _os << "bare(" << v << ")";                            } 
        void operator()(two_i const& v) const  { _os << "two_i(" << v.first << ", " << v.second << ")"; } 
    } vis{os};

    boost::apply_visitor(vis, v);
    return os;
}

namespace square::peg {
    using namespace x3;

    struct with_error_handling {
        struct diags;

        template<typename It, typename Ctx>
            x3::error_handler_result on_error(It f, It l, expectation_failure<It> const& ef, Ctx const& ctx) const {
                std::string s(f,l);
                auto pos = std::distance(f, ef.where());

                std::ostringstream oss;
                oss << "Expecting " << ef.which() << " at "
                    << "\n\t" << s
                    << "\n\t" << std::setw(pos) << std::setfill('-') << "" << "^";

                x3::get<diags>(ctx).push_back(oss.str());

                return error_handler_result::fail;
            }
    };

    template <typename T = x3::unused_type> auto const as = [](auto p) {
        struct _ : with_error_handling {};
        return rule<_, T> {} = p;
    };

    const auto quoted_string = as<quoted>(lexeme['"' > *(print - '"') > '"']);
    const auto bare_string   = as<bare>(lexeme[alpha > *alnum] > ';');
    const auto two_ints      = as<two_i>(int_ > int_);

    const auto main          = quoted_string | bare_string | two_ints;
    using main_type = std::remove_cv_t<decltype(main)>;

    const auto entry_point   = skip(space)[ as<Value>(expect[main] > eoi) ];
} // namespace square::peg

namespace boost::spirit::x3 {
    template <> struct get_info<int_type> {
        typedef std::string result_type;
        std::string operator()(int_type const&) const { return "integral number"; }
    };
    template <> struct get_info<square::peg::main_type> {
        typedef std::string result_type;
        std::string operator()(square::peg::main_type const&) const { return "quoted string, bare string or integer number pair"; }
    };
}

int main() {
    using It = std::string::const_iterator;
    using D = square::peg::with_error_handling::diags;

    for (std::string const input : { 
            "   -89 0038  ",
            "   \"-89 0038\"  ",
            "   something123123      ;",
            // undecidable
            "",
            // violate expecations, no successful parse
            "   -89 oops  ",   // not an integer
            "   \"-89 0038  ", // missing "
            "   bareword ",    // missing ;
            // trailing debris, successful "main"
            "   -89 3.14  ",   // followed by .14
        })
    {
        std::cout << "====== " << std::quoted(input) << "\n";

        It iter = input.begin(), end = input.end();
        Value v;
        std::vector<std::string> diags;

        if (parse(iter, end, x3::with<D>(diags) [square::peg::entry_point], v)) {
            std::cout << "Result value: " << v;
        } else {
            std::cout << "Parsing failed";
        }

        std::cout << " with " << diags.size() << " diagnostics messages: \n";

        for(auto& msg: diags) {
            std::cout << " - " << msg << "\n";
        }
    }
}

¹ 您可以改用带有名称的规则,避免这个更复杂的技巧

² 在旧版本的库中,您可能需要努力获取 with<> 数据的引用语义:Live On Coliru