How to keep space characters when breaking down input into a sequence of different parts using the Alternative Parser?

Question

I want to write a simple C++ parser that extracts the block hierarchy. I am using this rule:

std::string rest_content;
std::vector<boost::tuple<std::string, std::string>> scopes;
qi::rule<It, qi::ascii::space_type> block = *(
            r.comment
        |   r.scope[push_back(boost::phoenix::ref(scopes), _1)]
        |   qi::char_[boost::phoenix::ref(rest_content) += _1] // rest
    );

qi::phrase_parse(first, last,
        block,
        ascii::space);

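This should break the code down into three parts: comments, scopes (code surrounded by "{}") and "rest". The problem is that all space characters are removed from "rest". I need those spaces for later parsing (such as extracting identifiers).

I have tried using qi::skip, qi::lexeme and qi::raw to keep the spaces: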

// one of many failed attempts
qi::rule<It, qi::ascii::space_type> block = qi::lexeme[*(
            qi::skip[r.comment]
        |   qi::skip[r.scope[push_back(boost::phoenix::ref(scopes), _1)]]
        |   qi::char_[push_back(boost::phoenix::ref(rest_content), _1)]
    )];

But it never works.

So how do I keep the space characters? Any help is welcome. Thanks.

Answer 1

If you're parsing C++ code this way, you may be biting off more than you can chew.

I'll answer, but the answer should show you how limited this approach is going to be. Just imagine parsing

namespace q::x {
    namespace y {
        struct A {
            template <typename = ns1::c<int>, typename...> struct C;
        };

        template <typename T, typename... Ts>
        struct A::C final : ns2::ns3::base<A::C<T, Ts...>, Ts...> {
             int simple = [](...) {
                  enum class X : unsigned { answer = 42, };
                  struct {
                      auto operator()(...) -> decltype(auto) {
                           return static_cast<int>(X::answer);
                      } 
                  } iife;
                  return iife();
             }("/* }}} */"); // {{{
        };
    }
}

and getting it right. And yes, that's valid code.

In fact it's so tricky, that it's easy to make "grown compilers" (GCC) trip: https://wandbox.org/permlink/FzcaSl6tbn18jq4f (Clang has no issue: https://wandbox.org/permlink/wu0mFwQiTOogKB5L).

That said, let me refer to my older explanation of how rule declarations and skippers work together: Boost spirit skipper issues

And show an approximation of what I would do.

qi::rule<It, std::string()> type
    = ( qi::string("struct") 
      | qi::string("class") 
      | qi::string("union") 
      | qi::string("enum") >> -(*comment_or_ws >> qi::string("class"))
      | qi::string("namespace") 
      )
    >> !qi::graph // must be followed by whitespace
    ;

qi::rule<It, std::string()> identifier = 
    qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z_0-9")
    ;

I'm /guessing/ that struct X { }; would be an example of a "scope", and that the tuple would contain ("struct", "X").

As a bonus, I used attribute adaptation of std::pair, and later on I show how to insert into a multimap for good measure:

qi::rule<It, std::pair<std::string, std::string>()> scope
    = qi::skip(comment_or_ws.alias()) [
        type >> identifier
        >> *~qi::char_(";{") // ignore some stuff like base classes
        >> qi::omit["{" >> *~qi::char_("}") >> "}" | ';']
    ];

Note that a big shortcoming here is that the first non-commented '}' will "end" the scope. That's not how the language works (see the leading example).
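
To illustrate what "ending the scope at the right '}'" involves, here is a minimal sketch in plain C++ (no Spirit). It counts nesting depth and ignores braces inside comments and quoted literals. The helper name find_matching_brace is hypothetical and not part of the answer's grammar, and the sketch deliberately ignores raw string literals, digit separators and the preprocessor, so it is an approximation under those assumptions rather than a C++ lexer.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (an assumption for illustration): returns the index of
// the '}' that closes the '{' at src[open], or npos if there is none.
// Braces inside // comments, /* */ comments and quoted literals are ignored.
std::size_t find_matching_brace(std::string_view src, std::size_t open) {
    constexpr std::size_t npos = std::string_view::npos;
    if (open >= src.size() || src[open] != '{') return npos;

    int depth = 0;
    for (std::size_t i = open; i < src.size(); ++i) {
        char const c = src[i];
        if (src.compare(i, 2, "//") == 0) {
            i = src.find('\n', i);            // jump to the end of the line
            if (i == npos) break;
        } else if (src.compare(i, 2, "/*") == 0) {
            i = src.find("*/", i + 2);        // jump to the end of the comment
            if (i == npos) break;
            ++i;                              // now on '/', the loop steps past it
        } else if (c == '"' || c == '\'') {
            // skip the literal, honoring backslash escapes
            for (++i; i < src.size() && src[i] != c; ++i)
                if (src[i] == '\\') ++i;
        } else if (c == '{') {
            ++depth;
        } else if (c == '}' && --depth == 0) {
            return i;                         // the brace that closes src[open]
        }
    }
    return npos;                              // unbalanced input
}
```

Called with the position of the first '{' in namespace a { namespace b { } }, this returns the position of the last '}', whereas a search for the first '}' would stop after the inner namespace.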

Now we can arrive at an improved "block" rule:

qi::rule<It, SkipRule> block 
    = *(
        scope [px::insert(px::ref(scopes), _1)]
      | qi::skip(comment_only.alias()) [ 
            qi::as_string[qi::raw[+(qi::char_ - scope)]] [px::ref(rest_content) += _1]
      ] // rest
    );

Note that:

- we override the comment_or_ws skipper with comment_only, so we don't drop all whitespace from the "rest content"
- inversely, we override the skipper to include whitespace inside the scope rule, because otherwise the negative scope invocation (char_ - scope) would do the wrong thing: it wouldn't skip whitespace

Full Demo

Live On Coliru

//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <iostream> // std::cout
#include <map>      // std::multimap
#include <string>

namespace qi = boost::spirit::qi;
namespace px = boost::phoenix;

int main() {
    using It = std::string::const_iterator;
    using namespace qi::labels;

    std::string rest_content;
    std::multimap<std::string, std::string> scopes;

    using SkipRule = qi::rule<It>;
    SkipRule comment_only 
        = "//" >> *~qi::char_("\r\n") >> qi::eol
        | "/*" >> *(qi::char_ - "*/") >> "*/"
        ;

    SkipRule comment_or_ws
        = qi::space | comment_only;

    qi::rule<It, std::string()> type
        = ( qi::string("struct") 
          | qi::string("class") 
          | qi::string("union") 
          | qi::string("enum") >> -(*comment_or_ws >> qi::string("class"))
          | qi::string("namespace") 
          )
        >> !qi::graph // must be followed by whitespace
        ;

    qi::rule<It, std::string()> identifier = 
        qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z_0-9")
        ;

    qi::rule<It, std::pair<std::string, std::string>()> scope
        = qi::skip(comment_or_ws.alias()) [
            type >> identifier
            >> *~qi::char_(";{") // ignore some stuff like base classes
            >> qi::omit["{" >> *~qi::char_("}") >> "}" | ';']
        ];

    qi::rule<It, SkipRule> block 
        = *(
            scope [px::insert(px::ref(scopes), _1)]
          | qi::skip(comment_only.alias()) [ 
                qi::as_string[qi::raw[+(qi::char_ - scope)]] [px::ref(rest_content) += _1]
          ] // rest
        );

    //BOOST_SPIRIT_DEBUG_NODES((block)(scope)(identifier)(type))

    std::string const code = R"(
// some random sample "code"
struct base { 
    std::vector<int> ints;
};
/* class skipped_comment : base { };
 */

namespace q { namespace nested { } } // nested is not supported

class forward_declared;

template <typename T> // actually basically ignored
class
        Derived 
: base {
            std::string more_data_members;
};

enum class MyEnum : int32_t {
    foo = 0,
    bar, /* whoop } */
    qux = foo + bar
};

int main() {
    return 0;
}
            )";

    qi::phrase_parse(begin(code), end(code), block, comment_or_ws);

    for (auto& [k,v] : scopes) {
        std::cout << k << ": " << v << "\n";
    }

    std::cout << "------------------ BEGIN REST_CONTENT -----------------\n";
    std::cout << rest_content << "\n";
    std::cout << "------------------ END REST_CONENT --------------------\n";
}

Parsing the following sample input:

// some random sample "code"
struct base { 
    std::vector<int> ints;
};
/* class skipped_comment : base { };
 */

namespace q { namespace nested { } } // nested is not supported

class forward_declared;

template <typename T> // actually basically ignored
class
        Derived 
: base {
            std::string more_data_members;
};

enum class MyEnum : int32_t {
    foo = 0,
    bar, /* whoop } */
    qux = foo + bar
};

int main() {
    return 0;
}

Prints

class: forward_declared
class: Derived
enumclass: MyEnum
namespace: q
struct: base
------------------ BEGIN REST_CONTENT -----------------
;}template <typename T>;;

int main() {
    return 0;
}

------------------ END REST_CONENT --------------------

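Conclusion

This result seems to be a nice pointer to both:

- explaining how to tackle the specific hurdle
- demonstrating how this approach to parsing breaks down at the slightest obstacle (e.g. namespace a { namespace b { } })

Caveat Emptor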

c++

boost-spirit-qi
