如何在使用 Alternative Parser 将输入分解为一系列不同部分时保持 space 字符?
How to keep space character on breaking down input into a sequence of different parts using Alternative Parser?
我想编写一个简单的 C++ 解析器来提取块层次结构。我正在使用这条规则:
std::string rest_content;
std::vector<boost::tuple<std::string, std::string>> scopes;
qi::rule<It, qi::ascii::space_type> block = *(
r.comment
| r.scope[push_back(boost::phoenix::ref(scopes), _1)]
| qi::char_[boost::phoenix::ref(rest_content) += _1] // rest
);
qi::phrase_parse(first, last,
block,
ascii::space);
应该将代码分解为三个部分:注释、作用域(用“{}”包围的代码)和"rest"。
问题是所有 space 个字符都从 "rest" 中删除了。我需要那些 spaces 用于以后的解析(例如提取标识符)。
我尝试使用 qi::skip
、qi::lexeme
和 qi::raw
来保持 spaces:
// one of many failed attempts
qi::rule<It, qi::ascii::space_type> block = qi::lexeme[*(
qi::skip[r.comment]
| qi::skip[r.scope[push_back(boost::phoenix::ref(scopes), _1)]]
| qi::char_[push_back(boost::phoenix::ref(rest_content), _1)]
)];
但它永远行不通。
那么如何保留space个字符呢?欢迎任何帮助。谢谢。
如果您以这种方式解析 C++ 代码,您可能会贪多嚼不烂。
我会回答,但答案应该会告诉您这种方法的局限性。想象一下通过
解析
namespace q::x {
namespace y {
struct A {
template <typename = ns1::c<int>, typename...> struct C;
};
template <typename T, typename... Ts>
struct A::C final : ns2::ns3::base<A::C<T, Ts...>, Ts...> {
int simple = [](...) {
enum class X : unsigned { answer = 42, };
struct {
auto operator()(...) -> decltype(auto) {
return static_cast<int>(X::answer);
}
} iife;
return iife();
}("/* }}} */"); // {{{
};
}
}
并且做对了。是的。 that's valid code.
In fact it's so tricky, that it's easy to make "grown compilers" (GCC) trip: https://wandbox.org/permlink/FzcaSl6tbn18jq4f (Clang has no issue: https://wandbox.org/permlink/wu0mFwQiTOogKB5L).
也就是说,让我参考一下我以前对规则声明和船长如何协同工作的解释:Boost spirit skipper issues
并显示我将要执行的操作的近似值。
评论
实际上,评论应该是您的船长的一部分,所以让我们这样做:
using SkipRule = qi::rule<It>;
SkipRule comment_only
= "//" >> *~qi::char_("\r\n") >> qi::eol
| "/*" >> *(qi::char_ - "*/") >> "*/"
;
现在对于一般跳过,我们要包括空格:
SkipRule comment_or_ws
= qi::space | comment_only;
现在我们要解析类型和标识符:
qi::rule<It, std::string()> type
= ( qi::string("struct")
| qi::string("class")
| qi::string("union")
| qi::string("enum") >> -(*comment_or_ws >> qi::string("class"))
| qi::string("namespace")
)
>> !qi::graph // must be followed by whitespace
;
qi::rule<It, std::string()> identifier =
qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z_0-9")
;
我/猜想/ struct X { };
将是一个 "scope" 的示例,元组将包含 ("struct", "X")
.
As a bonus I used attribute adaption of std::pair
and show how to insert into a multimap
for good measure later on
qi::rule<It, std::pair<std::string, std::string>()> scope
= qi::skip(comment_or_ws.alias()) [
type >> identifier
>> *~qi::char_(";{") // ignore some stuff like base classes
>> qi::omit["{" >> *~qi::char_("}") >> "}" | ';']
];
Note a big short-coming here is that the first non-commented '}'
will "end" the scope. That's not how the language works (see the leading example)
现在我们可以得出一个改进的 "block" 规则:
qi::rule<It, SkipRule> block
= *(
scope [px::insert(px::ref(scopes), _1)]
| qi::skip(comment_only.alias()) [
qi::as_string[qi::raw[+(qi::char_ - scope)]] [px::ref(rest_content) += _1]
] // rest
);
Note that
- we override the comment_or_ws
skipper with comment_only
so we don't drop all whitespace from "rest content"
- inversely, we override the skipper to include whitespace inside the scope
rule because otherwise the negative scope
invocation (char_ - scope
) would do the wrong thing because it wouldn't skip whitespace
完整演示
//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace qi = boost::spirit::qi;
namespace px = boost::phoenix;
int main() {
using It = std::string::const_iterator;
using namespace qi::labels;
std::string rest_content;
std::multimap<std::string, std::string> scopes;
using SkipRule = qi::rule<It>;
SkipRule comment_only
= "//" >> *~qi::char_("\r\n") >> qi::eol
| "/*" >> *(qi::char_ - "*/") >> "*/"
;
SkipRule comment_or_ws
= qi::space | comment_only;
qi::rule<It, std::string()> type
= ( qi::string("struct")
| qi::string("class")
| qi::string("union")
| qi::string("enum") >> -(*comment_or_ws >> qi::string("class"))
| qi::string("namespace")
)
>> !qi::graph // must be followed by whitespace
;
qi::rule<It, std::string()> identifier =
qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z_0-9")
;
qi::rule<It, std::pair<std::string, std::string>()> scope
= qi::skip(comment_or_ws.alias()) [
type >> identifier
>> *~qi::char_(";{") // ignore some stuff like base classes
>> qi::omit["{" >> *~qi::char_("}") >> "}" | ';']
];
qi::rule<It, SkipRule> block
= *(
scope [px::insert(px::ref(scopes), _1)]
| qi::skip(comment_only.alias()) [
qi::as_string[qi::raw[+(qi::char_ - scope)]] [px::ref(rest_content) += _1]
] // rest
);
//BOOST_SPIRIT_DEBUG_NODES((block)(scope)(identifier)(type))
std::string const code = R"(
// some random sample "code"
struct base {
std::vector<int> ints;
};
/* class skipped_comment : base { };
*/
namespace q { namespace nested { } } // nested is not supported
class forward_declared;
template <typename T> // actually basically ignored
class
Derived
: base {
std::string more_data_members;
};
enum class MyEnum : int32_t {
foo = 0,
bar, /* whoop } */
qux = foo + bar
};
int main() {
return 0;
}
)";
qi::phrase_parse(begin(code), end(code), block, comment_or_ws);
for (auto& [k,v] : scopes) {
std::cout << k << ": " << v << "\n";
}
std::cout << "------------------ BEGIN REST_CONTENT -----------------\n";
std::cout << rest_content << "\n";
std::cout << "------------------ END REST_CONENT --------------------\n";
}
解析以下示例输入:
// some random sample "code"
struct base {
std::vector<int> ints;
};
/* class skipped_comment : base { };
*/
namespace q { namespace nested { } } // nested is not supported
class forward_declared;
template <typename T> // actually basically ignored
class
Derived
: base {
std::string more_data_members;
};
enum class MyEnum : int32_t {
foo = 0,
bar, /* whoop } */
qux = foo + bar
};
int main() {
return 0;
}
打印
class: forward_declared
class: Derived
enumclass: MyEnum
namespace: q
struct: base
------------------ BEGIN REST_CONTENT -----------------
;}template <typename T>;;
int main() {
return 0;
}
------------------ END REST_CONENT --------------------
结论
这个结果似乎是指向
的一个不错的指针
- 解释如何解决特定障碍
- 演示这种解析方法如何在最轻微的障碍下崩溃(例如
namespace a { namespace b { } }
)
买者自负
我想编写一个简单的 C++ 解析器来提取块层次结构。我正在使用这条规则:
std::string rest_content;
std::vector<boost::tuple<std::string, std::string>> scopes;
qi::rule<It, qi::ascii::space_type> block = *(
r.comment
| r.scope[push_back(boost::phoenix::ref(scopes), _1)]
| qi::char_[boost::phoenix::ref(rest_content) += _1] // rest
);
qi::phrase_parse(first, last,
block,
ascii::space);
应该将代码分解为三个部分:注释、作用域(用“{}”包围的代码)和"rest"。 问题是所有 space 个字符都从 "rest" 中删除了。我需要那些 spaces 用于以后的解析(例如提取标识符)。
我尝试使用 qi::skip
、qi::lexeme
和 qi::raw
来保持 spaces:
// one of many failed attempts
qi::rule<It, qi::ascii::space_type> block = qi::lexeme[*(
qi::skip[r.comment]
| qi::skip[r.scope[push_back(boost::phoenix::ref(scopes), _1)]]
| qi::char_[push_back(boost::phoenix::ref(rest_content), _1)]
)];
但它永远行不通。
那么如何保留space个字符呢?欢迎任何帮助。谢谢。
如果您以这种方式解析 C++ 代码,您可能会贪多嚼不烂。
我会回答,但答案应该会告诉您这种方法的局限性。想象一下通过
解析namespace q::x {
namespace y {
struct A {
template <typename = ns1::c<int>, typename...> struct C;
};
template <typename T, typename... Ts>
struct A::C final : ns2::ns3::base<A::C<T, Ts...>, Ts...> {
int simple = [](...) {
enum class X : unsigned { answer = 42, };
struct {
auto operator()(...) -> decltype(auto) {
return static_cast<int>(X::answer);
}
} iife;
return iife();
}("/* }}} */"); // {{{
};
}
}
并且做对了。是的。 that's valid code.
In fact it's so tricky, that it's easy to make "grown compilers" (GCC) trip: https://wandbox.org/permlink/FzcaSl6tbn18jq4f (Clang has no issue: https://wandbox.org/permlink/wu0mFwQiTOogKB5L).
也就是说,让我参考一下我以前对规则声明和船长如何协同工作的解释:Boost spirit skipper issues
并显示我将要执行的操作的近似值。
评论
实际上,评论应该是您的船长的一部分,所以让我们这样做:
using SkipRule = qi::rule<It>;
SkipRule comment_only
= "//" >> *~qi::char_("\r\n") >> qi::eol
| "/*" >> *(qi::char_ - "*/") >> "*/"
;
现在对于一般跳过,我们要包括空格:
SkipRule comment_or_ws
= qi::space | comment_only;
现在我们要解析类型和标识符:
qi::rule<It, std::string()> type
= ( qi::string("struct")
| qi::string("class")
| qi::string("union")
| qi::string("enum") >> -(*comment_or_ws >> qi::string("class"))
| qi::string("namespace")
)
>> !qi::graph // must be followed by whitespace
;
qi::rule<It, std::string()> identifier =
qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z_0-9")
;
我/猜想/ struct X { };
将是一个 "scope" 的示例,元组将包含 ("struct", "X")
.
As a bonus I used attribute adaption of
std::pair
and show how to insert into amultimap
for good measure later on
qi::rule<It, std::pair<std::string, std::string>()> scope
= qi::skip(comment_or_ws.alias()) [
type >> identifier
>> *~qi::char_(";{") // ignore some stuff like base classes
>> qi::omit["{" >> *~qi::char_("}") >> "}" | ';']
];
Note a big short-coming here is that the first non-commented
'}'
will "end" the scope. That's not how the language works (see the leading example)
现在我们可以得出一个改进的 "block" 规则:
qi::rule<It, SkipRule> block
= *(
scope [px::insert(px::ref(scopes), _1)]
| qi::skip(comment_only.alias()) [
qi::as_string[qi::raw[+(qi::char_ - scope)]] [px::ref(rest_content) += _1]
] // rest
);
Note that - we override the
comment_or_ws
skipper withcomment_only
so we don't drop all whitespace from "rest content" - inversely, we override the skipper to include whitespace inside thescope
rule because otherwise the negativescope
invocation (char_ - scope
) would do the wrong thing because it wouldn't skip whitespace
完整演示
//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace qi = boost::spirit::qi;
namespace px = boost::phoenix;
int main() {
using It = std::string::const_iterator;
using namespace qi::labels;
std::string rest_content;
std::multimap<std::string, std::string> scopes;
using SkipRule = qi::rule<It>;
SkipRule comment_only
= "//" >> *~qi::char_("\r\n") >> qi::eol
| "/*" >> *(qi::char_ - "*/") >> "*/"
;
SkipRule comment_or_ws
= qi::space | comment_only;
qi::rule<It, std::string()> type
= ( qi::string("struct")
| qi::string("class")
| qi::string("union")
| qi::string("enum") >> -(*comment_or_ws >> qi::string("class"))
| qi::string("namespace")
)
>> !qi::graph // must be followed by whitespace
;
qi::rule<It, std::string()> identifier =
qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z_0-9")
;
qi::rule<It, std::pair<std::string, std::string>()> scope
= qi::skip(comment_or_ws.alias()) [
type >> identifier
>> *~qi::char_(";{") // ignore some stuff like base classes
>> qi::omit["{" >> *~qi::char_("}") >> "}" | ';']
];
qi::rule<It, SkipRule> block
= *(
scope [px::insert(px::ref(scopes), _1)]
| qi::skip(comment_only.alias()) [
qi::as_string[qi::raw[+(qi::char_ - scope)]] [px::ref(rest_content) += _1]
] // rest
);
//BOOST_SPIRIT_DEBUG_NODES((block)(scope)(identifier)(type))
std::string const code = R"(
// some random sample "code"
struct base {
std::vector<int> ints;
};
/* class skipped_comment : base { };
*/
namespace q { namespace nested { } } // nested is not supported
class forward_declared;
template <typename T> // actually basically ignored
class
Derived
: base {
std::string more_data_members;
};
enum class MyEnum : int32_t {
foo = 0,
bar, /* whoop } */
qux = foo + bar
};
int main() {
return 0;
}
)";
qi::phrase_parse(begin(code), end(code), block, comment_or_ws);
for (auto& [k,v] : scopes) {
std::cout << k << ": " << v << "\n";
}
std::cout << "------------------ BEGIN REST_CONTENT -----------------\n";
std::cout << rest_content << "\n";
std::cout << "------------------ END REST_CONENT --------------------\n";
}
解析以下示例输入:
// some random sample "code"
struct base {
std::vector<int> ints;
};
/* class skipped_comment : base { };
*/
namespace q { namespace nested { } } // nested is not supported
class forward_declared;
template <typename T> // actually basically ignored
class
Derived
: base {
std::string more_data_members;
};
enum class MyEnum : int32_t {
foo = 0,
bar, /* whoop } */
qux = foo + bar
};
int main() {
return 0;
}
打印
class: forward_declared
class: Derived
enumclass: MyEnum
namespace: q
struct: base
------------------ BEGIN REST_CONTENT -----------------
;}template <typename T>;;
int main() {
return 0;
}
------------------ END REST_CONENT --------------------
结论
这个结果似乎是指向
的一个不错的指针- 解释如何解决特定障碍
- 演示这种解析方法如何在最轻微的障碍下崩溃(例如
namespace a { namespace b { } }
)
买者自负