How to use boost::spirit::qi with a std::vector&lt;token_type&gt; instead of std::string
In my application, I basically want a "pre-parsing" phase that adjusts the token stream before the Qi parser gets to see it.
One way to achieve this would be some kind of "lexer adaptor": a class that is constructed from a lexer and is itself a lexer, wrapping and modifying the inner lexer. But it seems simpler to first lex the entire input stream with the inner lexer, store the result in a std::vector&lt;token_type&gt;, modify that as needed, and then hand the result to the parser. (In my application, I don't expect this to cause any performance problems.)
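The intended pipeline can be sketched in plain standard C++, leaving Spirit out entirely; the Token struct and the lex_all, drop_blanks, and parse_tokens functions below are hypothetical stand-ins for the real lexer, the modification pass, and the Qi parser:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for the real lexer token type.
struct Token {
    int id;            // e.g. 1 = ALPHA, 2 = BLANK
    std::string text;  // the matched input
};

// Stage 1: lex the entire input into a vector (stand-in for lex::tokenize).
std::vector<Token> lex_all(const std::string& input) {
    std::vector<Token> out;
    for (char c : input)
        out.push_back(Token{std::isspace(static_cast<unsigned char>(c)) ? 2 : 1,
                            std::string(1, c)});
    return out;
}

// Stage 2: the "pre-parsing" pass -- edit the token stream in place.
void drop_blanks(std::vector<Token>& toks) {
    toks.erase(std::remove_if(toks.begin(), toks.end(),
                              [](const Token& t) { return t.id == 2; }),
               toks.end());
}

// Stage 3: the parser only ever sees vector iterators, never the raw input.
std::string parse_tokens(std::vector<Token>::const_iterator first,
                         std::vector<Token>::const_iterator last) {
    std::string joined;
    for (; first != last; ++first) joined += first->text;
    return joined;
}
```

The question below is exactly how to make the third stage accept those vector iterators when the "parser" is a real Qi grammar.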
In an email exchange from a few years ago, someone described exactly this problem, and Hartmut said it should be trivial: http://comments.gmane.org/gmane.comp.parsers.spirit.general/24899
However, I haven't found any code samples or instructions showing how to do it, short of digging through the spirit::lex headers and working it out myself, which, dear reader, is likely to occupy me for a while unless you can help.
The concrete question is: how can I make a "shim" lexer that wraps a pair of std::vector&lt;token_type&gt;::iterator and looks to spirit::qi just like a standard spirit::lex lexer?
Edit: To be clear, this is not a duplicate of this question: Using Boost.Spirit.Qi with custom lexer. My token_type is attributed, and the details of the extra work Hartmut said I would need to do are the substance of this question.
Edit: Okay, I made an SSCCE. This version does not have attributed lexer tokens, but even without them I still can't get it to work, and it seems like a good starter SSCCE anyway.
Highlights:
The "token buffer" type:
template<typename TokenType>
struct token_buffer {
    std::vector<TokenType> tokens_;
    token_buffer() = default;
    bool operator()(token_type t) {
        tokens_.push_back(t);
        return true;
    }
    void print(std::ostream & o) const { ... }
};
My first attempt at a "buffer lexer" that looks like Qi's lex::lexer but actually serves tokens from a buffer. This one derives from the lex_basic shown in the full listing below; I don't know whether that's right:
template<typename LexerType>
class buffer_lexer : public lex_basic<LexerType> {
public:
    typedef std::vector<token_type> buff_type;
    typedef typename buff_type::const_iterator iterator_type;
private:
    const buff_type & buff_;
public:
    buffer_lexer(const buff_type & b) : lex_basic<LexerType>(), buff_(b) {}
    iterator_type begin() const { return buff_.begin(); }
    iterator_type end() const { return buff_.end(); }
    // for consistency with regular lexer `begin` signature, not sure if this is needed
    template<typename T>
    iterator_type begin(T, T) { return begin(); }
};
My second attempt at a buffer lexer. This one does not derive from lex_basic; instead it tries to follow these instructions from the header boost/spirit/home/lex/lexer/lexertl/lexer.hpp:
///////////////////////////////////////////////////////////////////////////
//
// Every lexer type to be used as a lexer for Spirit has to conform to
// the following public interface:
//
// typedefs:
// iterator_type The type of the iterator exposed by this lexer.
// token_type The type of the tokens returned from the exposed
// iterators.
//
// functions:
// default constructor
// Since lexers are instantiated as base classes
// only it might be a good idea to make this
// constructor protected.
// begin, end Return a pair of iterators, when dereferenced
// returning the sequence of tokens recognized in
// the input stream given as the parameters to the
// begin() function.
// add_token Should add the definition of a token to be
// recognized by this lexer.
// clear Should delete all current token definitions
// associated with the given state of this lexer
// object.
//
// template parameters:
// Iterator The type of the iterator used to access the
// underlying character stream.
// Token The type of the tokens to be returned from the
// exposed token iterator.
// Functor The type of the InputPolicy to use to instantiate
// the multi_pass iterator type to be used as the
// token iterator (returned from begin()/end()).
//
///////////////////////////////////////////////////////////////////////////
Here is the "buffer_lexer_raw" I came up with:
template<typename Iterator,
         typename TokenType,
         typename Functor = lex::lexertl::functor<TokenType, lex::lexertl::detail::data, Iterator>>
class buffer_lexer_raw {
    typedef TokenType token_type;
    typedef std::vector<token_type> buff_type;
    typedef typename buff_type::const_iterator iterator_type;
    typedef typename boost::detail::iterator_traits<typename token_type::iterator_type>::value_type char_type;
private:
    buff_type buff_;
public:
    buffer_lexer_raw() {}
    void set_buffer(const buff_type & b) { buff_ = b; }
    iterator_type begin() const { return buff_.begin(); }
    iterator_type end() const { return buff_.end(); }
    // for consistency with regular lexer `begin` signature, not sure if this is needed
    template<typename T>
    iterator_type begin(T, T) { return begin(); }
    std::size_t add_token(char_type const* state, char_type tokendef,
                          std::size_t token_id, char_type const* targetstate)
    {
        return 1;
    }
    void clear(char_type const* state) {}
};
The test code responds to macros defined at the top of the file.
// Use the type "buffer_lexer" which derives from lex_basic<Lexer>
//#define WHICH_LEXER_TYPE 1
// Use the type "buffer_lexer_raw" which does not derive from anything
//#define WHICH_LEXER_TYPE 2
// Use the "placebo" lexer, which is just lex_basic<Lexer>, as a sanity test of our lex:: api calls
#define WHICH_LEXER_TYPE 0
The test code will:
- Run the lexer on a simple test case and dump the lexed token sequence verbosely.
- Run the lexer and grammar in tandem on a few simple test cases, using lex::tokenize_and_parse, and dump the resulting AST.
- Try lexing and parsing again, using the macro-selected lexer to produce the iterators fed to qi::parse. It checks that the resulting AST is the same as the one produced the "easy" way.
Currently, the #define WHICH_LEXER_TYPE 0 option compiles and works fine for me on gcc-4.8 and clang-3.6.
I can't actually get it to compile with either the #define WHICH_LEXER_TYPE 1 or the #define WHICH_LEXER_TYPE 2 option. For type 1, clang gives the following error message, which I can't make heads or tails of:
In file included from main.cpp:1:
In file included from /usr/include/boost/spirit/include/lex_lexertl.hpp:16:
In file included from /usr/include/boost/spirit/home/lex/lexer_lexertl.hpp:15:
In file included from /usr/include/boost/spirit/home/lex.hpp:13:
In file included from /usr/include/boost/spirit/home/lex/lexer.hpp:14:
In file included from /usr/include/boost/spirit/home/lex/lexer/token_def.hpp:21:
In file included from /usr/include/boost/spirit/home/lex/reference.hpp:16:
/usr/include/boost/spirit/home/qi/reference.hpp:43:30: error: no matching member function for call to 'parse'
return ref.get().parse(first, last, context, skipper, attr);
~~~~~~~~~~^~~~~
/usr/include/boost/spirit/home/qi/parse.hpp:86:42: note: in instantiation of function template specialization 'boost::spirit::qi::reference<const
boost::spirit::qi::rule<boost::spirit::lex::lexertl::iterator<boost::spirit::lex::lexertl::functor<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const
char *, std::basic_string<char> >, boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long>, lexertl::detail::data,
__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, mpl_::bool_<false>, mpl_::bool_<true> > >, ast::Body (),
boost::spirit::locals<std::basic_string<char>, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>,
boost::spirit::unused_type, boost::spirit::unused_type> >::parse<__gnu_cxx::__normal_iterator<const
boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>,
mpl_::bool_<true>, unsigned long> *, std::vector<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >,
boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long>,
std::allocator<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, boost::mpl::vector<char, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long> > > >, boost::spirit::context<boost::fusion::cons<ast::Body &, boost::fusion::nil>,
boost::spirit::locals<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na> >, boost::spirit::unused_type,
ast::Body>' requested here
return compile<qi::domain>(expr).parse(first, last, context, unused, attr);
^
main.cpp:414:12: note: in instantiation of function template specialization 'boost::spirit::qi::parse<__gnu_cxx::__normal_iterator<const
boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>,
mpl_::bool_<true>, unsigned long> *, std::vector<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >,
boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long>,
std::allocator<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, boost::mpl::vector<char, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long> > > >,
basic_grammar<boost::spirit::lex::lexertl::iterator<boost::spirit::lex::lexertl::functor<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const
char *, std::basic_string<char> >, boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long>, lexertl::detail::data,
__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, mpl_::bool_<false>, mpl_::bool_<true> > > >, ast::Body>' requested here
if (!qi::parse(it, fin, bgram, tree2)) {
^
/usr/include/boost/spirit/home/qi/nonterminal/rule.hpp:273:14: note: candidate function [with Context = boost::spirit::context<boost::fusion::cons<ast::Body &,
boost::fusion::nil>, boost::spirit::locals<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na> >, Skipper =
boost::spirit::unused_type, Attribute = ast::Body] not viable: no known conversion from '__gnu_cxx::__normal_iterator<const
boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>,
mpl_::bool_<true>, unsigned long> *, std::vector<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >,
boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long>,
std::allocator<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, boost::mpl::vector<char, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long> > > >' to
'boost::spirit::lex::lexertl::iterator<boost::spirit::lex::lexertl::functor<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *,
std::basic_string<char> >, boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long>, lexertl::detail::data,
__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, mpl_::bool_<false>, mpl_::bool_<true> > > &' for 1st argument
bool parse(Iterator& first, Iterator const& last
^
/usr/include/boost/spirit/home/qi/nonterminal/rule.hpp:319:14: note: candidate function template not viable: requires 6 arguments, but 5 were provided
bool parse(Iterator& first, Iterator const& last
^
1 error generated.
The "2" option gives an essentially identical error message. gcc doesn't seem to give a better one either.
Here is the full source code:
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/std_pair.hpp>
#include <boost/variant/get.hpp>
#include <boost/variant/variant.hpp>
#include <boost/variant/recursive_variant.hpp>
#include <boost/preprocessor/stringize.hpp>
#include <vector>
#include <string>
typedef unsigned int uint;
namespace lex = boost::spirit::lex;
namespace qi = boost::spirit::qi;
namespace mpl = boost::mpl;
// Use the type "buffer_lexer" which derives from lex_basic<Lexer>
//#define WHICH_LEXER_TYPE 1
// Use the type "buffer_lexer_raw" which does not derive from anything
//#define WHICH_LEXER_TYPE 2
// Use the "placebo" lexer, which is just lex_basic<Lexer>, as a sanity test of
// our lex:: api calls
#define WHICH_LEXER_TYPE 0
//// Lexer definition
enum tokenids {
LCARET = lex::min_token_id + 10,
RCARET,
BSLASH,
LBRACE,
RBRACE,
LPAREN,
RPAREN,
EQUALS,
USCORE,
ALPHA,
NUM,
EOL,
BLANK,
IDANY
};
#define TOKEN_CASE(X) \
case X: return #X
const char *token_id_string(size_t id) {
switch (id) {
TOKEN_CASE(LCARET);
TOKEN_CASE(RCARET);
TOKEN_CASE(BSLASH);
TOKEN_CASE(LBRACE);
TOKEN_CASE(RBRACE);
TOKEN_CASE(LPAREN);
TOKEN_CASE(RPAREN);
TOKEN_CASE(EQUALS);
TOKEN_CASE(USCORE);
TOKEN_CASE(ALPHA);
TOKEN_CASE(NUM);
TOKEN_CASE(EOL);
TOKEN_CASE(BLANK);
TOKEN_CASE(IDANY);
default:
return "Unknown token";
}
}
template <typename Lexer> struct lex_basic : lex::lexer<Lexer> {
lex_basic() {
this->self.add
('<', LCARET)
('>', RCARET)
('/', BSLASH)
('{', LBRACE)
('}', RBRACE)
('(', LPAREN)
(')', RPAREN)
('=', EQUALS)
('_', USCORE)
("[A-Za-z]", ALPHA)
("[0-9]", NUM)
('\n', EOL)
("[ \t\r]", BLANK)
(".", IDANY);
}
};
typedef std::string::const_iterator str_it;
// the token type needs to know the iterator type of the underlying
// input and the set of used token value types
typedef lex::lexertl::token<str_it, mpl::vector<char>> token_type;
template <typename TokenType> struct token_buffer {
std::vector<TokenType> tokens_;
token_buffer() = default;
bool operator()(token_type t) {
tokens_.push_back(t);
return true;
}
void print(std::ostream &o) const {
std::cout << "tokens_.size() == " << tokens_.size() << std::endl;
for (size_t i = 0; i < tokens_.size(); ++i) {
const TokenType &t = tokens_[i];
o << "[" << i << "]: -" << token_id_string(t.id()) << "- \"" << t
<< "\" [";
const auto &v = t.value();
if (t.id() == EOL) {
o << "\n";
} else {
o << v;
}
o << "]" << std::endl;
}
}
};
/***
* Lexers which serve tokens from a buffer
*/
// Two versions of the same thing, one deriving from lex::lexer, one not
template <typename LexerType> class buffer_lexer : public lex_basic<LexerType> {
public:
typedef std::vector<token_type> buff_type;
typedef typename buff_type::const_iterator iterator_type;
private:
const buff_type &buff_;
public:
buffer_lexer(const buff_type &b) : lex_basic<LexerType>(), buff_(b) {}
iterator_type begin() const { return buff_.begin(); }
iterator_type end() const { return buff_.end(); }
// for consistency with regular lexer `begin` signature, not sure if this is
// needed
template <typename T> iterator_type begin(T, T) { return begin(); }
};
template <typename Iterator, typename TokenType,
typename Functor = lex::lexertl::functor<
TokenType, lex::lexertl::detail::data, Iterator>>
class buffer_lexer_raw {
typedef TokenType token_type;
typedef std::vector<token_type> buff_type;
typedef typename buff_type::const_iterator iterator_type;
typedef typename boost::detail::iterator_traits<
typename token_type::iterator_type>::value_type char_type;
private:
buff_type buff_;
public:
buffer_lexer_raw() {}
void set_buffer(const buff_type &b) { buff_ = b; }
iterator_type begin() const { return buff_.begin(); }
iterator_type end() const { return buff_.end(); }
// for consistency with regular lexer `begin` signature, not sure if this is
// needed
template <typename T> iterator_type begin(T, T) { return begin(); }
std::size_t add_token(char_type const *state, char_type tokendef,
std::size_t token_id, char_type const *targetstate) {
return 1;
}
void clear(char_type const *state) {}
};
/***
* AST
*/
namespace ast {
typedef std::string Str;
struct BraceExpr;
typedef boost::variant<Str, boost::recursive_wrapper<BraceExpr>> BraceExprArg;
struct BraceExpr {
std::vector<BraceExprArg> args;
};
typedef std::pair<Str, Str> Pair;
struct Body;
typedef boost::variant<Pair, BraceExpr, boost::recursive_wrapper<Body>> Node;
struct Body {
Str key;
std::vector<Node> nodes;
};
} // end namespace ast
BOOST_FUSION_ADAPT_STRUCT(ast::BraceExpr,
(std::vector<ast::BraceExprArg>, args))
BOOST_FUSION_ADAPT_STRUCT(ast::Body,
(ast::Str, key)(std::vector<ast::Node>, nodes))
namespace ast {
// Stream ops
class printer : public boost::static_visitor<> {
std::ostream &ss_;
uint indent_;
std::string indent(uint extra = 0) const {
return std::string(indent_ + extra, ' ');
}
std::string indent_plus_tab() const { return indent(tab_width); }
public:
static constexpr uint tab_width = 4;
explicit printer(std::ostream &s, uint indent = 0)
: ss_(s), indent_(indent) {}
void operator()(const Str &s) const { ss_ << s; }
void operator()(const BraceExpr &b) const {
ss_ << "{";
for (size_t i = 0; i < b.args.size(); ++i) {
if (i) {
ss_ << " ";
}
boost::apply_visitor(*this, b.args[i]);
}
ss_ << "}";
}
void operator()(const Pair &p) const { ss_ << p.first << " = " << p.second; }
void operator()(const Body &b) const {
ss_ << indent() << "<" << b.key << ">\n";
printer p{ss_, indent_ + tab_width};
for (const auto &n : b.nodes) {
ss_ << indent_plus_tab();
boost::apply_visitor(p, n);
ss_ << "\n";
}
ss_ << indent() << "</" << b.key << ">";
}
};
std::ostream &operator<<(std::ostream &ss, const BraceExpr &b) {
printer p{ss};
p(b);
return ss;
}
std::ostream &operator<<(std::ostream &ss, const Pair &p) {
printer pr{ss};
pr(p);
return ss;
}
std::ostream &operator<<(std::ostream &ss, const Body &b) {
printer p{ss};
p(b);
return ss;
}
// Equality ops
bool operator==(const Pair &p1, const Pair &p2) {
return p1.first == p2.first && p1.second == p2.second;
}
bool operator==(const BraceExpr &b1, const BraceExpr &b2) {
return b1.args == b2.args;
}
bool operator==(const Body &b1, const Body &b2) {
return b1.key == b2.key && b1.nodes == b2.nodes;
}
bool operator!=(const Pair &p1, const Pair &p2) { return !(p1 == p2); }
bool operator!=(const BraceExpr &b1, const BraceExpr &b2) {
return !(b1 == b2);
}
bool operator!=(const Body &b1, const Body &b2) { return !(b1 == b2); }
} // end namespace ast
/***
* Grammar
*/
template <typename Iterator>
struct basic_grammar
: qi::grammar<Iterator, ast::Body(), qi::locals<ast::Str>> {
qi::rule<Iterator, ast::Body(), qi::locals<ast::Str>> body;
qi::rule<Iterator, ast::Node()> node;
qi::rule<Iterator, ast::Pair()> pair;
qi::rule<Iterator, ast::BraceExprArg()> brace_expr_arg;
qi::rule<Iterator, ast::BraceExpr()> brace_expr;
qi::rule<Iterator, ast::Str()> identifier;
qi::rule<Iterator, ast::Str()> str;
qi::rule<Iterator, ast::Str()> open_tag;
qi::rule<Iterator /*, ast::Str()*/> close_tag;
qi::rule<Iterator> lbrace;
qi::rule<Iterator> rbrace;
qi::rule<Iterator> equals;
qi::rule<Iterator> ws;
template <typename TokenDef>
basic_grammar(const TokenDef &tok)
: basic_grammar::base_type(body, "body") {
using namespace qi;
ws %= token(BLANK) | token(EOL);
lbrace %= token(LBRACE);
rbrace %= token(RBRACE);
equals %= token(EQUALS);
identifier %= token(ALPHA) >> *(token(ALPHA) | token(NUM) | token(USCORE));
str %= *(token(LCARET) | token(RCARET) | token(BSLASH) | token(LPAREN) |
token(RPAREN) | token(ALPHA) | token(NUM) | token(USCORE) |
token(EQUALS) | token(BLANK) | token(IDANY));
open_tag %= omit[token(LCARET)] >> identifier >>
omit[token(RCARET)]; // tok.open_tag;
close_tag %= omit[token(LCARET) >> token(BSLASH)] >> identifier >>
omit[token(RCARET)]; // tok.close_tag;
pair = skip(boost::proto::deep_copy(ws))[identifier >> equals >> str];
body = skip(boost::proto::deep_copy(ws))[open_tag >> *node >> close_tag];
node = brace_expr | body | pair;
brace_expr_arg = brace_expr | identifier;
brace_expr =
skip(boost::proto::deep_copy(ws))[lbrace >> *brace_expr_arg >> rbrace];
}
};
/***
* Usage / Tests
*/
// use actor_lexer<> here if your token definitions have semantic
// actions
typedef lex::lexertl::lexer<token_type> lexer_type;
// this is the iterator exposed by the lexer, we use this for parsing
typedef lexer_type::iterator_type iterator_type;
token_buffer<token_type> test_lexer(const std::string &input,
bool silent = false) {
str_it s = input.begin();
str_it end = input.end();
// create a lexer instance
lex_basic<lexer_type> lex;
token_buffer<token_type> buff;
if (!lex::tokenize(s, end, lex, [&](token_type t) { return buff(t); })) {
if (!silent) {
std::cout << "\nTokenizing failed!" << std::endl;
}
} else {
if (!silent) {
std::cout << "\nTokenizing succeeded!" << std::endl;
}
}
if (!silent) {
buff.print(std::cout);
}
return buff;
}
void test_grammar(const std::string &input) {
lex_basic<lexer_type> lex;
basic_grammar<iterator_type> gram{lex};
ast::Body tree;
{
str_it s = input.begin();
str_it end = input.end();
if (!lex::tokenize_and_parse(s, end, lex, gram, tree)) {
std::cout << "\nParsing failed!" << std::endl;
} else {
std::cout << "\nParsing succeeded!" << std::endl;
}
std::cout << tree << std::endl;
}
// Now try to do it in two steps, with buffered lexer
auto buff = test_lexer(input, true); // get buffer, silence output
#if WHICH_LEXER_TYPE == 1
buffer_lexer<lexer_type> blex{buff.tokens_};
#else
#if WHICH_LEXER_TYPE == 2
buffer_lexer_raw<str_it, token_type> blex;
blex.set_buffer(buff.tokens_);
#else
lex_basic<lexer_type> blex;
#endif
#endif
basic_grammar<iterator_type> bgram{blex};
ast::Body tree2;
{
#if (WHICH_LEXER_TYPE == 1) || (WHICH_LEXER_TYPE == 2)
auto it = blex.begin();
#else
str_it s = input.begin();
str_it end = input.end();
auto it = blex.begin(s, end);
#endif
auto fin = blex.end();
if (!qi::parse(it, fin, bgram, tree2)) {
std::cout << "\nBuffered parsing failed!" << std::endl;
} else {
std::cout << "\nBuffered parsing succeeded!" << std::endl;
}
}
std::cout << tree2 << std::endl;
if (tree != tree2) {
std::cout << "\nRegular parsing vs. buffered parsing mismatch!"
<< std::endl;
}
}
int main() {
std::string input{""
"<asdf>\n"
"foo = bar\n"
"{F foo}\n"
"{G {F foo} {H bar}}\n"
"</asdf>\n"};
test_lexer(input);
// Use lexer and grammar at once as demonstrated in tutorials
std::string input2 = "<asdf></asdf>";
test_grammar(input2);
test_grammar(input);
std::string input3{""
"<asdf>\n"
"foo = bar\n"
"{F foo}\n"
"{G {F foo} {H bar}}\n"
"<jkl>\n"
"baz = gaz\n"
"{H {H H} {{{H} {G} {F foo}} {B ar}} {Q i}}\n"
"</jkl>\n"
"</asdf>\n"};
test_grammar(input3);
return 0;
}
I too thought the multi_pass machinery was the culprit, but after much fiddling I was able to get it working with two simple fixes:
template <typename Iterator, typename TokenType,
          typename Functor = lex::lexertl::functor<
              TokenType, lex::lexertl::detail::data, Iterator>>
class buffer_lexer_raw {
    typedef TokenType token_type;
    typedef std::vector<token_type> buff_type;
    typedef typename buff_type::const_iterator vec_iterator_type;
public:
    struct iterator_type : vec_iterator_type {
        typedef vec_iterator_type base_iterator_type;
        using vec_iterator_type::vec_iterator_type;
    };
    typedef char char_type;
This makes sure the nested iterator_type itself exposes a base_iterator_type typedef. That appears to be required somewhere inside the library internals (probably due to assumptions it makes about token iterators).
The second fix is at the point where the grammar is actually instantiated: don't use the "plain" iterator, but the one we just defined:
basic_grammar<concrete_lexer_type::iterator_type> bgram{blex};
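The shape of that first fix can be reproduced in isolation, without Boost; buffer and underlying below are hypothetical stand-ins for buffer_lexer_raw and for library code that expects the iterator to carry a nested base_iterator_type (the sketch uses an explicit converting constructor instead of inherited constructors, to stay portable):

```cpp
#include <cassert>
#include <vector>

// Stand-in for library code that assumes the iterator type exposes a
// nested base_iterator_type typedef (as parts of Spirit appear to).
template <typename It>
typename It::base_iterator_type underlying(It it) {
    return it;  // slices back down to the wrapped iterator type
}

// Stand-in for buffer_lexer_raw: wraps the vector's iterator so it
// carries the typedef the consumer above requires.
struct buffer {
    typedef std::vector<int>::const_iterator vec_iterator_type;

    struct iterator_type : vec_iterator_type {
        typedef vec_iterator_type base_iterator_type;
        iterator_type(vec_iterator_type it) : vec_iterator_type(it) {}
    };

    std::vector<int> data_;

    iterator_type begin() const { return iterator_type(data_.begin()); }
    iterator_type end() const { return iterator_type(data_.end()); }
};
```

Note this relies on std::vector's iterator being a class type (true on the common standard libraries, but not guaranteed); the wrapped iterator still iterates normally, it just carries the extra typedef.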
A fully working listing:
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/std_pair.hpp>
#include <boost/variant/get.hpp>
#include <boost/variant/variant.hpp>
#include <boost/variant/recursive_variant.hpp>
#include <boost/preprocessor/stringize.hpp>
#include <vector>
#include <string>
typedef unsigned int uint;
namespace lex = boost::spirit::lex;
namespace qi = boost::spirit::qi;
namespace mpl = boost::mpl;
//// Lexer definition
enum tokenids {
LCARET = lex::min_token_id + 10,
RCARET,
BSLASH,
LBRACE,
RBRACE,
LPAREN,
RPAREN,
EQUALS,
USCORE,
ALPHA,
NUM,
EOL,
BLANK,
IDANY
};
#define TOKEN_CASE(X) \
case X: return #X
const char *token_id_string(size_t id) {
switch (id) {
TOKEN_CASE(LCARET);
TOKEN_CASE(RCARET);
TOKEN_CASE(BSLASH);
TOKEN_CASE(LBRACE);
TOKEN_CASE(RBRACE);
TOKEN_CASE(LPAREN);
TOKEN_CASE(RPAREN);
TOKEN_CASE(EQUALS);
TOKEN_CASE(USCORE);
TOKEN_CASE(ALPHA);
TOKEN_CASE(NUM);
TOKEN_CASE(EOL);
TOKEN_CASE(BLANK);
TOKEN_CASE(IDANY);
default:
return "Unknown token";
}
}
template <typename Lexer> struct lex_basic : lex::lexer<Lexer> {
lex_basic() {
this->self.add
('<', LCARET)
('>', RCARET)
('/', BSLASH)
('{', LBRACE)
('}', RBRACE)
('(', LPAREN)
(')', RPAREN)
('=', EQUALS)
('_', USCORE)
("[A-Za-z]", ALPHA)
("[0-9]", NUM)
('\n', EOL)
("[ \t\r]", BLANK)
(".", IDANY);
}
};
typedef std::string::const_iterator str_it;
// the token type needs to know the iterator type of the underlying
// input and the set of used token value types
typedef lex::lexertl::token<str_it, mpl::vector<char>> token_type;
template <typename TokenType> struct token_buffer {
std::vector<TokenType> tokens_;
token_buffer() = default;
bool operator()(token_type t) {
tokens_.push_back(t);
return true;
}
void print(std::ostream &o) const {
std::cout << "tokens_.size() == " << tokens_.size() << std::endl;
for (size_t i = 0; i < tokens_.size(); ++i) {
const TokenType &t = tokens_[i];
o << "[" << i << "]: -" << token_id_string(t.id()) << "- \"" << t
<< "\" [";
const auto &v = t.value();
if (t.id() == EOL) {
o << "\n";
} else {
o << v;
}
o << "]" << std::endl;
}
}
};
/***
* Lexers which serve tokens from a buffer
*/
// Two versions of the same thing, one deriving from lex::lexer, one not
template <typename LexerType> class buffer_lexer : public lex_basic<LexerType> {
public:
typedef std::vector<token_type> buff_type;
typedef typename buff_type::const_iterator iterator_type;
private:
const buff_type &buff_;
public:
buffer_lexer(const buff_type &b) : lex_basic<LexerType>(), buff_(b) {}
iterator_type begin() const { return buff_.begin(); }
iterator_type end() const { return buff_.end(); }
// for consistency with regular lexer `begin` signature, not sure if this is
// needed
template <typename T> iterator_type begin(T, T) { return begin(); }
};
template <typename Iterator, typename TokenType,
typename Functor = lex::lexertl::functor<
TokenType, lex::lexertl::detail::data, Iterator>>
class buffer_lexer_raw {
typedef TokenType token_type;
typedef std::vector<token_type> buff_type;
typedef typename buff_type::const_iterator vec_iterator_type;
public:
struct iterator_type : vec_iterator_type {
typedef vec_iterator_type base_iterator_type;
using vec_iterator_type::vec_iterator_type;
};
typedef char char_type;
private:
buff_type buff_;
public:
buffer_lexer_raw() {}
void set_buffer(const buff_type &b) { buff_ = b; }
iterator_type begin() const { return buff_.begin(); }
iterator_type end() const { return buff_.end(); }
// for consistency with regular lexer `begin` signature, not sure if this is
// needed
template <typename T> iterator_type begin(T, T) { return begin(); }
std::size_t add_token(char_type const*, char_type, std::size_t, char_type const*) {
return 1;
}
void clear(char_type const *) {}
};
/***
* AST
*/
namespace ast {
typedef std::string Str;
struct BraceExpr;
typedef boost::variant<Str, boost::recursive_wrapper<BraceExpr>> BraceExprArg;
struct BraceExpr {
std::vector<BraceExprArg> args;
};
typedef std::pair<Str, Str> Pair;
struct Body;
typedef boost::variant<Pair, BraceExpr, boost::recursive_wrapper<Body>> Node;
struct Body {
Str key;
std::vector<Node> nodes;
};
} // end namespace ast
BOOST_FUSION_ADAPT_STRUCT(ast::BraceExpr,
(std::vector<ast::BraceExprArg>, args))
BOOST_FUSION_ADAPT_STRUCT(ast::Body,
(ast::Str, key)(std::vector<ast::Node>, nodes))
namespace ast {
// Stream ops
class printer : public boost::static_visitor<> {
std::ostream &ss_;
uint indent_;
std::string indent(uint extra = 0) const { return std::string(indent_ + extra, ' '); }
std::string indent_plus_tab() const { return indent(tab_width); }
public:
static constexpr uint tab_width = 4;
explicit printer(std::ostream &s, uint indent = 0)
: ss_(s), indent_(indent) {}
void operator()(const Str &s) const { ss_ << s; }
void operator()(const BraceExpr &b) const {
ss_ << "{";
for (size_t i = 0; i < b.args.size(); ++i) {
if (i) {
ss_ << " ";
}
boost::apply_visitor(*this, b.args[i]);
}
ss_ << "}";
}
void operator()(const Pair &p) const { ss_ << p.first << " = " << p.second; }
void operator()(const Body &b) const {
ss_ << indent() << "<" << b.key << ">\n";
printer p{ss_, indent_ + tab_width};
for (const auto &n : b.nodes) {
ss_ << indent_plus_tab();
boost::apply_visitor(p, n);
ss_ << "\n";
}
ss_ << indent() << "</" << b.key << ">";
}
};
std::ostream &operator<<(std::ostream &ss, const BraceExpr &b) {
printer p{ss};
p(b);
return ss;
}
std::ostream &operator<<(std::ostream &ss, const Pair &p) {
printer pr{ss};
pr(p);
return ss;
}
std::ostream &operator<<(std::ostream &ss, const Body &b) {
printer p{ss};
p(b);
return ss;
}
// Equality ops
bool operator==(const Pair &p1, const Pair &p2) {
return p1.first == p2.first && p1.second == p2.second;
}
bool operator==(const BraceExpr &b1, const BraceExpr &b2) {
return b1.args == b2.args;
}
bool operator==(const Body &b1, const Body &b2) {
return b1.key == b2.key && b1.nodes == b2.nodes;
}
bool operator!=(const Pair &p1, const Pair &p2) { return !(p1 == p2); }
bool operator!=(const BraceExpr &b1, const BraceExpr &b2) {
return !(b1 == b2);
}
bool operator!=(const Body &b1, const Body &b2) { return !(b1 == b2); }
} // end namespace ast
/***
* Grammar
*/
template <typename Iterator>
struct basic_grammar : qi::grammar<Iterator, ast::Body(), qi::locals<ast::Str>> {
qi::rule<Iterator, ast::Body(), qi::locals<ast::Str>> body;
qi::rule<Iterator, ast::Node()> node;
qi::rule<Iterator, ast::Pair()> pair;
qi::rule<Iterator, ast::BraceExprArg()> brace_expr_arg;
qi::rule<Iterator, ast::BraceExpr()> brace_expr;
qi::rule<Iterator, ast::Str()> identifier;
qi::rule<Iterator, ast::Str()> str;
qi::rule<Iterator, ast::Str()> open_tag;
qi::rule<Iterator /*, ast::Str()*/> close_tag;
qi::rule<Iterator> lbrace;
qi::rule<Iterator> rbrace;
qi::rule<Iterator> equals;
qi::rule<Iterator> ws;
template <typename TokenDef>
basic_grammar(const TokenDef &tok) : basic_grammar::base_type(body, "body") {
using namespace qi;
ws %= token(BLANK) | token(EOL);
lbrace %= token(LBRACE);
rbrace %= token(RBRACE);
equals %= token(EQUALS);
identifier %= token(ALPHA) >> *(token(ALPHA) | token(NUM) | token(USCORE));
str %= *(token(LCARET) | token(RCARET) | token(BSLASH) | token(LPAREN) |
token(RPAREN) | token(ALPHA) | token(NUM) | token(USCORE) |
token(EQUALS) | token(BLANK) | token(IDANY));
open_tag %= omit[token(LCARET)] >> identifier >> omit[token(RCARET)]; // tok.open_tag;
close_tag %= omit[token(LCARET) >> token(BSLASH)] >> identifier >> omit[token(RCARET)]; // tok.close_tag;
    // TODO FIXME: the deep_copy should not be required here
/// bla_12 = somevalue
pair = skip(boost::proto::deep_copy(ws)) [ identifier >> equals >> str ] ;
/// <bla><sub>{some}{braced{expres}}sions</sub><pair1>key1=value</pair1></bla>
body = skip(boost::proto::deep_copy(ws)) [ open_tag >> *node >> close_tag ] ;
///
node = brace_expr | body | pair;
brace_expr_arg = brace_expr | identifier;
/// {{{bla}some{other}nested{id{entifier}s}}and such}
brace_expr = skip(boost::proto::deep_copy(ws))[lbrace >> *brace_expr_arg >> rbrace];
}
};
/***
* Usage / Tests
*/
// use actor_lexer<> here if your token definitions have semantic
// actions
typedef lex::lexertl::lexer<token_type> lexer_type;
// this is the iterator exposed by the lexer, we use this for parsing
typedef lexer_type::iterator_type iterator_type;
token_buffer<token_type> test_lexer(const std::string &input,
bool silent = false) {
str_it s = input.begin();
str_it end = input.end();
// create a lexer instance
lex_basic<lexer_type> lex;
token_buffer<token_type> buff;
if (!lex::tokenize(s, end, lex, [&](token_type t) { return buff(t); })) {
if (!silent) {
std::cout << "\nTokenizing failed!" << std::endl;
}
} else {
if (!silent) {
std::cout << "\nTokenizing succeeded!" << std::endl;
}
}
if (!silent) {
buff.print(std::cout);
}
return buff;
}
void test_grammar(const std::string &input) {
lex_basic<lexer_type> lex;
basic_grammar<iterator_type> gram{lex};
ast::Body tree;
{
str_it s = input.begin();
str_it end = input.end();
if (!lex::tokenize_and_parse(s, end, lex, gram, tree)) {
std::cout << "\nParsing failed!" << std::endl;
} else {
std::cout << "\nParsing succeeded!" << std::endl;
}
std::cout << tree << std::endl;
}
// Now try to do it in two steps, with buffered lexer
auto buff = test_lexer(input, true); // get buffer, silence output
typedef buffer_lexer_raw<str_it, token_type> concrete_lexer_type;
buffer_lexer_raw<str_it, token_type> blex;
blex.set_buffer(buff.tokens_);
basic_grammar<concrete_lexer_type::iterator_type> bgram{blex};
ast::Body tree2;
{
auto it = blex.begin();
auto fin = blex.end();
if (!qi::parse(it, fin, bgram, tree2)) {
std::cout << "\nBuffered parsing failed!" << std::endl;
} else {
std::cout << "\nBuffered parsing succeeded!" << std::endl;
}
}
std::cout << tree2 << std::endl;
if (tree != tree2) {
std::cout << "\nRegular parsing vs. buffered parsing mismatch!"
<< std::endl;
}
}
int main() {
std::string const input{""
"<asdf>\n"
"foo = bar\n"
"{F foo}\n"
"{G {F foo} {H bar}}\n"
"</asdf>\n"};
test_lexer(input);
// Use lexer and grammar at once as demonstrated in tutorials
std::string const input2 = "<asdf></asdf>";
test_grammar(input2);
test_grammar(input);
std::string const input3{""
"<asdf>\n"
"foo = bar\n"
"{F foo}\n"
"{G {F foo} {H bar}}\n"
"<jkl>\n"
"baz = gaz\n"
"{H {H H} {{{H} {G} {F foo}} {B ar}} {Q i}}\n"
"</jkl>\n"
"</asdf>\n"};
test_grammar(input3);
}
Printing:
Tokenizing succeeded!
tokens_.size() == 53
[0]: -LCARET- "65546" [<]
[1]: -ALPHA- "65555" [a]
[2]: -ALPHA- "65555" [s]
[3]: -ALPHA- "65555" [d]
[4]: -ALPHA- "65555" [f]
[5]: -RCARET- "65547" [>]
[6]: -EOL- "65557" [\n]
[7]: -ALPHA- "65555" [f]
[8]: -ALPHA- "65555" [o]
[9]: -ALPHA- "65555" [o]
[10]: -BLANK- "65558" [ ]
[11]: -EQUALS- "65553" [=]
[12]: -BLANK- "65558" [ ]
[13]: -ALPHA- "65555" [b]
[14]: -ALPHA- "65555" [a]
[15]: -ALPHA- "65555" [r]
[16]: -EOL- "65557" [\n]
[17]: -LBRACE- "65549" [{]
[18]: -ALPHA- "65555" [F]
[19]: -BLANK- "65558" [ ]
[20]: -ALPHA- "65555" [f]
[21]: -ALPHA- "65555" [o]
[22]: -ALPHA- "65555" [o]
[23]: -RBRACE- "65550" [}]
[24]: -EOL- "65557" [\n]
[25]: -LBRACE- "65549" [{]
[26]: -ALPHA- "65555" [G]
[27]: -BLANK- "65558" [ ]
[28]: -LBRACE- "65549" [{]
[29]: -ALPHA- "65555" [F]
[30]: -BLANK- "65558" [ ]
[31]: -ALPHA- "65555" [f]
[32]: -ALPHA- "65555" [o]
[33]: -ALPHA- "65555" [o]
[34]: -RBRACE- "65550" [}]
[35]: -BLANK- "65558" [ ]
[36]: -LBRACE- "65549" [{]
[37]: -ALPHA- "65555" [H]
[38]: -BLANK- "65558" [ ]
[39]: -ALPHA- "65555" [b]
[40]: -ALPHA- "65555" [a]
[41]: -ALPHA- "65555" [r]
[42]: -RBRACE- "65550" [}]
[43]: -RBRACE- "65550" [}]
[44]: -EOL- "65557" [\n]
[45]: -LCARET- "65546" [<]
[46]: -BSLASH- "65548" [/]
[47]: -ALPHA- "65555" [a]
[48]: -ALPHA- "65555" [s]
[49]: -ALPHA- "65555" [d]
[50]: -ALPHA- "65555" [f]
[51]: -RCARET- "65547" [>]
[52]: -EOL- "65557" [\n]
Parsing succeeded!
<asdf>
</asdf>
Buffered parsing succeeded!
<asdf>
</asdf>
Parsing succeeded!
<asdf>
foo = bar
{F foo}
{G {F foo} {H bar}}
</asdf>
Buffered parsing succeeded!
<asdf>
foo = bar
{F foo}
{G {F foo} {H bar}}
</asdf>
Parsing succeeded!
<asdf>
foo = bar
{F foo}
{G {F foo} {H bar}}
<jkl>
baz = gaz
{H {H H} {{{H} {G} {F foo}} {B ar}} {Q i}}
</jkl>
</asdf>
Buffered parsing succeeded!
<asdf>
foo = bar
{F foo}
{G {F foo} {H bar}}
<jkl>
baz = gaz
{H {H H} {{{H} {G} {F foo}} {B ar}} {Q i}}
</jkl>
</asdf>
¹ Based on the buffer_lexer_raw approach
我也认为是多通道问题的罪魁祸首,但经过多次摆弄后,我能够通过 2 个简单的修复来让它工作 ¹
template <typename Iterator, typename TokenType,
typename Functor = lex::lexertl::functor<
TokenType, lex::lexertl::detail::data, Iterator>>
class buffer_lexer_raw {
typedef TokenType token_type;
typedef std::vector<token_type> buff_type;
typedef typename buff_type::const_iterator base_iterator_type;
public:
struct iterator_type : base_iterator_type {
typedef base_iterator_type base_iterator_type;
using base_iterator_type::base_iterator_type;
};
typedef char char_type;
这确保嵌套的 iterator_type
本身具有 base_iterator_type
类型。这似乎是库内部某处所必需的(可能是由于对令牌迭代器的假设)。
第二部分是实际实例化语法的地方,不要使用"plain"迭代器,而是我们刚刚定义的:
basic_grammar<concrete_lexer_type::iterator_type> bgram{blex};
完全可用的清单:
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/std_pair.hpp>
#include <boost/variant/get.hpp>
#include <boost/variant/variant.hpp>
#include <boost/variant/recursive_variant.hpp>
#include <boost/preprocessor/stringize.hpp>
#include <vector>
#include <string>
typedef unsigned int uint;
namespace lex = boost::spirit::lex;
namespace qi = boost::spirit::qi;
namespace mpl = boost::mpl;
//// Lexer definition
enum tokenids {
LCARET = lex::min_token_id + 10,
RCARET,
BSLASH,
LBRACE,
RBRACE,
LPAREN,
RPAREN,
EQUALS,
USCORE,
ALPHA,
NUM,
EOL,
BLANK,
IDANY
};
#define TOKEN_CASE(X) \
case X: return #X
const char *token_id_string(size_t id) {
switch (id) {
TOKEN_CASE(LCARET);
TOKEN_CASE(RCARET);
TOKEN_CASE(BSLASH);
TOKEN_CASE(LBRACE);
TOKEN_CASE(RBRACE);
TOKEN_CASE(LPAREN);
TOKEN_CASE(RPAREN);
TOKEN_CASE(EQUALS);
TOKEN_CASE(USCORE);
TOKEN_CASE(ALPHA);
TOKEN_CASE(NUM);
TOKEN_CASE(EOL);
TOKEN_CASE(BLANK);
TOKEN_CASE(IDANY);
default:
return "Unknown token";
}
}
template <typename Lexer> struct lex_basic : lex::lexer<Lexer> {
lex_basic() {
this->self.add
('<', LCARET)
('>', RCARET)
('/', BSLASH)
('{', LBRACE)
('}', RBRACE)
('(', LPAREN)
(')', RPAREN)
('=', EQUALS)
('_', USCORE)
("[A-Za-z]", ALPHA)
("[0-9]", NUM)
('\n', EOL)
("[ \t\r]", BLANK)
(".", IDANY);
}
};
typedef std::string::const_iterator str_it;
// the token type needs to know the iterator type of the underlying
// input and the set of used token value types
typedef lex::lexertl::token<str_it, mpl::vector<char>> token_type;
template <typename TokenType> struct token_buffer {
std::vector<TokenType> tokens_;
token_buffer() = default;
bool operator()(token_type t) {
tokens_.push_back(t);
return true;
}
void print(std::ostream &o) const {
std::cout << "tokens_.size() == " << tokens_.size() << std::endl;
for (size_t i = 0; i < tokens_.size(); ++i) {
const TokenType &t = tokens_[i];
o << "[" << i << "]: -" << token_id_string(t.id()) << "- \"" << t
<< "\" [";
const auto &v = t.value();
if (t.id() == EOL) {
o << "\n";
} else {
o << v;
}
o << "]" << std::endl;
}
}
};
/***
* Lexers which serve tokens from a buffer
*/
// Two versions of the same thing, one deriving from lex::lexer, one not
template <typename LexerType> class buffer_lexer : public lex_basic<LexerType> {
public:
typedef std::vector<token_type> buff_type;
typedef typename buff_type::const_iterator iterator_type;
private:
const buff_type &buff_;
public:
buffer_lexer(const buff_type &b) : lex_basic<LexerType>(), buff_(b) {}
iterator_type begin() const { return buff_.begin(); }
iterator_type end() const { return buff_.end(); }
// for consistency with regular lexer `begin` signature, not sure if this is
// needed
template <typename T> iterator_type begin(T, T) { return begin(); }
};
template <typename Iterator, typename TokenType,
typename Functor = lex::lexertl::functor<
TokenType, lex::lexertl::detail::data, Iterator>>
class buffer_lexer_raw {
typedef TokenType token_type;
typedef std::vector<token_type> buff_type;
typedef typename buff_type::const_iterator vec_iterator_type;
public:
struct iterator_type : vec_iterator_type {
typedef vec_iterator_type base_iterator_type;
using vec_iterator_type::vec_iterator_type;
};
typedef char char_type;
private:
buff_type buff_;
public:
buffer_lexer_raw() {}
void set_buffer(const buff_type &b) { buff_ = b; }
iterator_type begin() const { return buff_.begin(); }
iterator_type end() const { return buff_.end(); }
// for consistency with regular lexer `begin` signature, not sure if this is
// needed
template <typename T> iterator_type begin(T, T) { return begin(); }
std::size_t add_token(char_type const*, char_type, std::size_t, char_type const*) {
return 1;
}
void clear(char_type const *) {}
};
/***
* AST
*/
namespace ast {
typedef std::string Str;
struct BraceExpr;
typedef boost::variant<Str, boost::recursive_wrapper<BraceExpr>> BraceExprArg;
struct BraceExpr {
std::vector<BraceExprArg> args;
};
typedef std::pair<Str, Str> Pair;
struct Body;
typedef boost::variant<Pair, BraceExpr, boost::recursive_wrapper<Body>> Node;
struct Body {
Str key;
std::vector<Node> nodes;
};
} // end namespace ast
BOOST_FUSION_ADAPT_STRUCT(ast::BraceExpr,
(std::vector<ast::BraceExprArg>, args))
BOOST_FUSION_ADAPT_STRUCT(ast::Body,
(ast::Str, key)(std::vector<ast::Node>, nodes))
namespace ast {
// Stream ops
class printer : public boost::static_visitor<> {
std::ostream &ss_;
unsigned indent_;
std::string indent(unsigned extra = 0) const { return std::string(indent_ + extra, ' '); }
std::string indent_plus_tab() const { return indent(tab_width); }
public:
static constexpr unsigned tab_width = 4;
explicit printer(std::ostream &s, unsigned indent = 0)
: ss_(s), indent_(indent) {}
void operator()(const Str &s) const { ss_ << s; }
void operator()(const BraceExpr &b) const {
ss_ << "{";
for (size_t i = 0; i < b.args.size(); ++i) {
if (i) {
ss_ << " ";
}
boost::apply_visitor(*this, b.args[i]);
}
ss_ << "}";
}
void operator()(const Pair &p) const { ss_ << p.first << " = " << p.second; }
void operator()(const Body &b) const {
ss_ << indent() << "<" << b.key << ">\n";
printer p{ss_, indent_ + tab_width};
for (const auto &n : b.nodes) {
ss_ << indent_plus_tab();
boost::apply_visitor(p, n);
ss_ << "\n";
}
ss_ << indent() << "</" << b.key << ">";
}
};
std::ostream &operator<<(std::ostream &ss, const BraceExpr &b) {
printer p{ss};
p(b);
return ss;
}
std::ostream &operator<<(std::ostream &ss, const Pair &p) {
printer pr{ss};
pr(p);
return ss;
}
std::ostream &operator<<(std::ostream &ss, const Body &b) {
printer p{ss};
p(b);
return ss;
}
// Equality ops
bool operator==(const Pair &p1, const Pair &p2) {
return p1.first == p2.first && p1.second == p2.second;
}
bool operator==(const BraceExpr &b1, const BraceExpr &b2) {
return b1.args == b2.args;
}
bool operator==(const Body &b1, const Body &b2) {
return b1.key == b2.key && b1.nodes == b2.nodes;
}
bool operator!=(const Pair &p1, const Pair &p2) { return !(p1 == p2); }
bool operator!=(const BraceExpr &b1, const BraceExpr &b2) {
return !(b1 == b2);
}
bool operator!=(const Body &b1, const Body &b2) { return !(b1 == b2); }
} // end namespace ast
/***
* Grammar
*/
template <typename Iterator>
struct basic_grammar : qi::grammar<Iterator, ast::Body(), qi::locals<ast::Str>> {
qi::rule<Iterator, ast::Body(), qi::locals<ast::Str>> body;
qi::rule<Iterator, ast::Node()> node;
qi::rule<Iterator, ast::Pair()> pair;
qi::rule<Iterator, ast::BraceExprArg()> brace_expr_arg;
qi::rule<Iterator, ast::BraceExpr()> brace_expr;
qi::rule<Iterator, ast::Str()> identifier;
qi::rule<Iterator, ast::Str()> str;
qi::rule<Iterator, ast::Str()> open_tag;
qi::rule<Iterator /*, ast::Str()*/> close_tag;
qi::rule<Iterator> lbrace;
qi::rule<Iterator> rbrace;
qi::rule<Iterator> equals;
qi::rule<Iterator> ws;
template <typename TokenDef>
basic_grammar(const TokenDef &tok) : basic_grammar::base_type(body, "body") {
using namespace qi;
ws %= token(BLANK) | token(EOL);
lbrace %= token(LBRACE);
rbrace %= token(RBRACE);
equals %= token(EQUALS);
identifier %= token(ALPHA) >> *(token(ALPHA) | token(NUM) | token(USCORE));
str %= *(token(LCARET) | token(RCARET) | token(BSLASH) | token(LPAREN) |
token(RPAREN) | token(ALPHA) | token(NUM) | token(USCORE) |
token(EQUALS) | token(BLANK) | token(IDANY));
open_tag %= omit[token(LCARET)] >> identifier >> omit[token(RCARET)]; // tok.open_tag;
close_tag %= omit[token(LCARET) >> token(BSLASH)] >> identifier >> omit[token(RCARET)]; // tok.close_tag;
// TODO FIXME the deep_copy should not be required here
/// bla_12 = somevalue
pair = skip(boost::proto::deep_copy(ws)) [ identifier >> equals >> str ] ;
/// <bla><sub>{some}{braced{expres}}sions</sub><pair1>key1=value</pair1></bla>
body = skip(boost::proto::deep_copy(ws)) [ open_tag >> *node >> close_tag ] ;
///
node = brace_expr | body | pair;
brace_expr_arg = brace_expr | identifier;
/// {{{bla}some{other}nested{id{entifier}s}}and such}
brace_expr = skip(boost::proto::deep_copy(ws))[lbrace >> *brace_expr_arg >> rbrace];
}
};
/***
* Usage / Tests
*/
// use actor_lexer<> here if your token definitions have semantic
// actions
typedef lex::lexertl::lexer<token_type> lexer_type;
// this is the iterator exposed by the lexer, we use this for parsing
typedef lexer_type::iterator_type iterator_type;
token_buffer<token_type> test_lexer(const std::string &input,
bool silent = false) {
str_it s = input.begin();
str_it end = input.end();
// create a lexer instance
lex_basic<lexer_type> lex;
token_buffer<token_type> buff;
if (!lex::tokenize(s, end, lex, [&](token_type t) { return buff(t); })) {
if (!silent) {
std::cout << "\nTokenizing failed!" << std::endl;
}
} else {
if (!silent) {
std::cout << "\nTokenizing succeeded!" << std::endl;
}
}
if (!silent) {
buff.print(std::cout);
}
return buff;
}
void test_grammar(const std::string &input) {
lex_basic<lexer_type> lex;
basic_grammar<iterator_type> gram{lex};
ast::Body tree;
{
str_it s = input.begin();
str_it end = input.end();
if (!lex::tokenize_and_parse(s, end, lex, gram, tree)) {
std::cout << "\nParsing failed!" << std::endl;
} else {
std::cout << "\nParsing succeeded!" << std::endl;
}
std::cout << tree << std::endl;
}
// Now try to do it in two steps, with buffered lexer
auto buff = test_lexer(input, true); // get buffer, silence output
typedef buffer_lexer_raw<str_it, token_type> concrete_lexer_type;
buffer_lexer_raw<str_it, token_type> blex;
blex.set_buffer(buff.tokens_);
basic_grammar<concrete_lexer_type::iterator_type> bgram{blex};
ast::Body tree2;
{
auto it = blex.begin();
auto fin = blex.end();
if (!qi::parse(it, fin, bgram, tree2)) {
std::cout << "\nBuffered parsing failed!" << std::endl;
} else {
std::cout << "\nBuffered parsing succeeded!" << std::endl;
}
}
std::cout << tree2 << std::endl;
if (tree != tree2) {
std::cout << "\nRegular parsing vs. buffered parsing mismatch!"
<< std::endl;
}
}
int main() {
std::string const input{""
"<asdf>\n"
"foo = bar\n"
"{F foo}\n"
"{G {F foo} {H bar}}\n"
"</asdf>\n"};
test_lexer(input);
// Use lexer and grammar at once as demonstrated in tutorials
std::string const input2 = "<asdf></asdf>";
test_grammar(input2);
test_grammar(input);
std::string const input3{""
"<asdf>\n"
"foo = bar\n"
"{F foo}\n"
"{G {F foo} {H bar}}\n"
"<jkl>\n"
"baz = gaz\n"
"{H {H H} {{{H} {G} {F foo}} {B ar}} {Q i}}\n"
"</jkl>\n"
"</asdf>\n"};
test_grammar(input3);
}
This prints:
Tokenizing succeeded!
tokens_.size() == 53
[0]: -LCARET- "65546" [<]
[1]: -ALPHA- "65555" [a]
[2]: -ALPHA- "65555" [s]
[3]: -ALPHA- "65555" [d]
[4]: -ALPHA- "65555" [f]
[5]: -RCARET- "65547" [>]
[6]: -EOL- "65557" [\n]
[7]: -ALPHA- "65555" [f]
[8]: -ALPHA- "65555" [o]
[9]: -ALPHA- "65555" [o]
[10]: -BLANK- "65558" [ ]
[11]: -EQUALS- "65553" [=]
[12]: -BLANK- "65558" [ ]
[13]: -ALPHA- "65555" [b]
[14]: -ALPHA- "65555" [a]
[15]: -ALPHA- "65555" [r]
[16]: -EOL- "65557" [\n]
[17]: -LBRACE- "65549" [{]
[18]: -ALPHA- "65555" [F]
[19]: -BLANK- "65558" [ ]
[20]: -ALPHA- "65555" [f]
[21]: -ALPHA- "65555" [o]
[22]: -ALPHA- "65555" [o]
[23]: -RBRACE- "65550" [}]
[24]: -EOL- "65557" [\n]
[25]: -LBRACE- "65549" [{]
[26]: -ALPHA- "65555" [G]
[27]: -BLANK- "65558" [ ]
[28]: -LBRACE- "65549" [{]
[29]: -ALPHA- "65555" [F]
[30]: -BLANK- "65558" [ ]
[31]: -ALPHA- "65555" [f]
[32]: -ALPHA- "65555" [o]
[33]: -ALPHA- "65555" [o]
[34]: -RBRACE- "65550" [}]
[35]: -BLANK- "65558" [ ]
[36]: -LBRACE- "65549" [{]
[37]: -ALPHA- "65555" [H]
[38]: -BLANK- "65558" [ ]
[39]: -ALPHA- "65555" [b]
[40]: -ALPHA- "65555" [a]
[41]: -ALPHA- "65555" [r]
[42]: -RBRACE- "65550" [}]
[43]: -RBRACE- "65550" [}]
[44]: -EOL- "65557" [\n]
[45]: -LCARET- "65546" [<]
[46]: -BSLASH- "65548" [/]
[47]: -ALPHA- "65555" [a]
[48]: -ALPHA- "65555" [s]
[49]: -ALPHA- "65555" [d]
[50]: -ALPHA- "65555" [f]
[51]: -RCARET- "65547" [>]
[52]: -EOL- "65557" [\n]
Parsing succeeded!
<asdf>
</asdf>
Buffered parsing succeeded!
<asdf>
</asdf>
Parsing succeeded!
<asdf>
foo = bar
{F foo}
{G {F foo} {H bar}}
</asdf>
Buffered parsing succeeded!
<asdf>
foo = bar
{F foo}
{G {F foo} {H bar}}
</asdf>
Parsing succeeded!
<asdf>
foo = bar
{F foo}
{G {F foo} {H bar}}
<jkl>
baz = gaz
{H {H H} {{{H} {G} {F foo}} {B ar}} {Q i}}
</jkl>
</asdf>
Buffered parsing succeeded!
<asdf>
foo = bar
{F foo}
{G {F foo} {H bar}}
<jkl>
baz = gaz
{H {H H} {{{H} {G} {F foo}} {B ar}} {Q i}}
</jkl>
</asdf>
¹ Based on the `buffer_lexer_raw` approach