Cannot get Boost Spirit grammar to use known keys for std::map<>
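I seem to have hit a mental block with Boost Spirit that I can't get past. I have a fairly simple grammar to handle, and I'd like to put the values into a struct that contains a std::map<> as one of its members. The key names for the pairs are known up front, so only those keys are allowed. There can be one to many keys in the map, in any order, with each key name validated via qi.
As an example, the grammar looks something like this.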
test .|*|<hostname> add|modify|save ( key [value] key [value] ... ) ;
//
test . add ( a1 ex00
a2 ex01
a3 "ex02,ex03,ex04" );
//
test * modify ( m1 ex10
m2 ex11
m3 "ex12,ex13,ex14"
m4 "abc def ghi" );
//
test 10.0.0.1 clear ( c1
c2
c3 );
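In this example the keys for "add" are a1, a2 and a3; likewise the keys for "modify" are m1, m2, m3 and m4, and each of them must have a value. For "clear", the keys c1, c2 and c3 may not have a value. Also, let's say for this example that you can have up to 10 keys (a1 ... a11, m1 ... m11 and c1 ... c11), any combination of which can be used in any order for the corresponding action. This means you cannot use a known key cX for "add" or mX for "clear".
The struct follows this simple pattern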
//
struct test
{
std::string host;
std::string action;
std::map<std::string,std::string> option;
}
So, from the examples above, I expect the struct to contain...
// add ...
test.host = .
test.action = add
test.option[0].first = a1
test.option[0].second = ex00
test.option[1].first = a2
test.option[1].second = ex01
test.option[2].first = a3
test.option[2].second = ex02,ex03,ex04
// modify ...
test.host = *
test.action = modify
test.option[0].first = m1
test.option[0].second = ex10
test.option[1].first = m2
test.option[1].second = ex11
test.option[2].first = m3
test.option[2].second = ex12,ex13,ex14
test.option[3].first = m4
test.option[3].second = abc def ghi
// clear ...
test.host = 10.0.0.1
test.action = clear
test.option[0].first = c1
test.option[0].second =
test.option[1].first = c2
test.option[1].second =
test.option[2].first = c3
test.option[2].second =
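I can get each individual part working on its own, but I can't seem to get them to work together. For example, I have the host and action working without the map<>.
I've adapted a previously posted example from Sehe (here) to try to get this working (BTW: Sehe has some awesome examples, which I've been using as much as the documentation).
Here is an excerpt (obviously not working), but it at least shows where I'm trying to go.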
namespace ast {
namespace qi = boost::spirit::qi;
//
using unused = qi::unused_type;
//
using string = std::string;
using strings = std::vector<string>;
using list = strings;
using pair = std::pair<string, string>;
using map = std::map<string, string>;
//
struct test
{
using preference = std::map<string,string>;
string host;
string action;
preference option;
};
}
//
BOOST_FUSION_ADAPT_STRUCT( ast::test,
( std::string, host )
( std::string, action )
( ast::test::preference, option ) )
//
namespace grammar
{
//
template <typename It>
struct parser
{
//
struct skip : qi::grammar<It>
{
//
skip() : skip::base_type( text )
{
using namespace qi;
// handle all whitespace (" ", \t, ...)
// along with comment lines/blocks
//
// comment blocks: /* ... */
// // ...
// -- ...
// # ...
text = ascii::space
| ( "#" >> *( char_ - eol ) >> ( eoi | eol ) ) // line comment
| ( "--" >> *( char_ - eol ) >> ( eoi | eol ) ) // ...
| ( "//" >> *( char_ - eol ) >> ( eoi | eol ) ) // ...
| ( "/*" >> *( char_ - "*/" ) >> "*/" ); // block comment
//
BOOST_SPIRIT_DEBUG_NODES( ( text ) )
}
//
qi::rule<It> text;
};
//
struct token
{
//
token()
{
using namespace qi;
// common
string = '"' >> *("\\" >> char_ | ~char_('"')) >> '"';
identity = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
real = double_;
integer = int_;
//
value = ( string | identity );
// ip target
any = '*';
local = ( char_('.') | fqdn );
fqdn = +char_("a-zA-Z0-9.\\-" ); // concession
ipv4 = +as_string[ octet[ _pass = ( _1 >= 0 && _1 <= 255 ) ] >> '.'
>> octet[ _pass = ( _1 >= 0 && _1 <= 255 ) ] >> '.'
>> octet[ _pass = ( _1 >= 0 && _1 <= 255 ) ] >> '.'
>> octet[ _pass = ( _1 >= 0 && _1 <= 255 ) ] ];
//
target = ( any | local | fqdn | ipv4 );
//
pair = identity >> -( attr( ' ' ) >> value );
map = pair >> *( attr( ' ' ) >> pair );
list = *( value );
//
BOOST_SPIRIT_DEBUG_NODES( ( string )
( identity )
( value )
( real )
( integer )
( any )
( local )
( fqdn )
( ipv4 )
( target )
( pair )
( keyval )
( map )
( list ) )
}
//
qi::rule<It, std::string()> string;
qi::rule<It, std::string()> identity;
qi::rule<It, std::string()> value;
qi::rule<It, double()> real;
qi::rule<It, int()> integer;
qi::uint_parser<unsigned, 10, 1, 3> octet;
qi::rule<It, std::string()> any;
qi::rule<It, std::string()> local;
qi::rule<It, std::string()> fqdn;
qi::rule<It, std::string()> ipv4;
qi::rule<It, std::string()> target;
//
qi::rule<It, ast::map()> map;
qi::rule<It, ast::pair()> pair;
qi::rule<It, ast::pair()> keyval;
qi::rule<It, ast::list()> list;
};
//
struct test : token, qi::grammar<It, ast::test(), skip>
{
//
test() : test::base_type( command_ )
{
using namespace qi;
using namespace qr;
auto kw = qr::distinct( copy( char_( "a-zA-Z0-9_" ) ) );
// not sure how to enforce the "key" names!
key_ = *( '(' >> *value >> ')' );
// tried using token::map ... didn't work ...
//
add_ = ( ( "add" >> attr( ' ' ) ) [ _val = "add" ] );
modify_ = ( ( "modify" >> attr( ' ' ) ) [ _val = "modify" ] );
clear_ = ( ( "clear" >> attr( ' ' ) ) [ _val = "clear" ] );
//
action_ = ( add_ | modify_ | clear_ );
/* *** can't get from A to B here ... not sure what to do *** */
//
command_ = kw[ "test" ]
>> target
>> action_
>> ';';
BOOST_SPIRIT_DEBUG_NODES( ( command_ )
( action_ )
( add_ )
( modify_ )
( clear_ ) )
}
//
private:
//
using token::value;
using token::target;
using token::map;
qi::rule<It, ast::test(), skip> command_;
qi::rule<It, std::string(), skip> action_;
//
qi::rule<It, std::string(), skip> add_;
qi::rule<It, std::string(), skip> modify_;
qi::rule<It, std::string(), skip> clear_;
};
...
};
}
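I hope this question isn't too vague; if you need a working example of the problem, I can certainly provide one. Any help you can offer is greatly appreciated, so thank you in advance!
Notes:
With this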
add_ = ( ( "add" >> attr( ' ' ) ) [ _val = "add" ] );
modify_ = ( ( "modify" >> attr( ' ' ) ) [ _val = "modify" ] );
clear_ = ( ( "clear" >> attr( ' ' ) ) [ _val = "clear" ] );
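did you mean to require a space? Or are you really just trying to force the struct action field to contain a trailing space (that's what will happen)?
If you meant the latter, I'd do that outside of the parser¹.
If you wanted the former, use the kw facility: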
add_ = kw["add"] [ _val = "add" ];
modify_ = kw["modify"] [ _val = "modify" ];
clear_ = kw["clear"] [ _val = "clear" ];
In fact, you can simplify that (again, ¹):
add_ = raw[ kw["add"] ];
modify_ = raw[ kw["modify"] ];
clear_ = raw[ kw["clear"] ];
This also means that you can simplify to
action_ = raw[ kw[lit("add")|"modify"|"clear"] ];
However, getting closer to your question, you could also use a symbol parser:
symbols<char> action_sym;
action_sym += "add", "modify", "clear";
//
action_ = raw[ kw[action_sym] ];
Caveat: the symbols needs to be a member so its lifetime extends beyond the constructor.
If you intended to capture the input representation of ipv4 addresses with
ipv4 = +as_string[ octet[ _pass = ( _1 >= 0 && _1 <= 255 ) ] >> '.'
>> octet[ _pass = ( _1 >= 0 && _1 <= 255 ) ] >> '.'
>> octet[ _pass = ( _1 >= 0 && _1 <= 255 ) ] >> '.'
>> octet[ _pass = ( _1 >= 0 && _1 <= 255 ) ] ];
Side note: I'm assuming +as_string is a simple mistake and you meant as_string instead.
Simplify:
qi::uint_parser<uint8_t, 10, 1, 3> octet;
This obviates the range checks (see ¹ again):
ipv4 = as_string[ octet >> '.' >> octet >> '.' >> octet >> '.' >> octet ];
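However, this would build a 4-char binary string representation of the address. If you wanted that, fine. I doubt it (because you'd have written std::array<uint8_t, 4> or uint64_t, right?). So if you wanted the string, again use raw[]: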
ipv4 = raw[ octet >> '.' >> octet >> '.' >> octet >> '.' >> octet ];
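To make the "binary vs. text" distinction concrete, here is a minimal standard-library sketch of what keeping the numeric octets (the std::array<uint8_t, 4> option) might look like. It is not Spirit code: parse_ipv4 is a hypothetical helper invented for illustration, and it assumes C++17 for std::from_chars.

```cpp
#include <array>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper (not part of Boost.Spirit): parse "a.b.c.d" into four
// numeric octets, rejecting components with more than 3 digits or a value
// above 255.
std::optional<std::array<std::uint8_t, 4>> parse_ipv4(std::string_view text) {
    std::array<std::uint8_t, 4> out{};
    const char* p = text.data();
    const char* end = text.data() + text.size();
    for (int i = 0; i < 4; ++i) {
        unsigned v = 0;
        auto [next, ec] = std::from_chars(p, end, v);
        if (ec != std::errc{} || next == p || next - p > 3 || v > 255)
            return std::nullopt;
        out[i] = static_cast<std::uint8_t>(v);
        p = next;
        if (i < 3) {                      // a '.' must separate the octets
            if (p == end || *p != '.') return std::nullopt;
            ++p;
        }
    }
    if (p != end) return std::nullopt;    // trailing garbage
    return out;
}
```

If all you need is the text as typed, the raw[] version above is simpler, since it just hands back the matched source characters.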
Same issue as in the first note:
pair = identity >> -( attr(' ') >> value );
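This time, the problem betrays that the productions should not be in token; conceptually, token-izing precedes parsing, and hence I'd keep the tokens skipper-less. kw doesn't really do a lot of good in that context. Instead, I'd move pair, map and list (unused?) into the parser: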
pair = kw[identity] >> -value;
map = +pair;
list = *value;
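Some examples
I recently made an example about parsing with symbols, but this answer comes a lot closer to your question:
It goes far beyond the scope of your parser because it performs all kinds of actions in the grammar, but what it does show is how to have generic "lookup-ish" rules that can be parameterized with a particular "symbol set". See the Identifier Lookup section of that answer: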
Identifier Lookup
We store "symbol tables" in Domain members _variables and _functions:
using Domain = qi::symbols<char>;
Domain _variables, _functions;
Then we declare some rules that can do lookups on either of them:
// domain identifier lookups
qi::_r1_type _domain;
qi::rule<It, Ast::Identifier(Domain const&)> maybe_known, known,
unknown;
The corresponding declarations will be shown shortly.
Variables are pretty simple:
variable = maybe_known(phx::ref(_variables));
Calls are trickier. If a name is unknown we don't want to assume it implies a function unless it's followed by a '(' character.
However, if an identifier is a known function name, we want even to imply the ( (this gives the UX the appearance of autocompletion where when the user types sqrt, it suggests the next character to be ( magically).
// The heuristics:
// - an unknown identifier followed by (
// - an unclosed argument list implies )
call %= (
known(phx::ref(_functions)) // known -> imply the parens
| &(identifier >> '(') >> unknown(phx::ref(_functions))
) >> implied('(') >> -(expression % ',') >> implied(')');
It all builds on known, unknown and maybe_known:
///////////////////////////////
// identifier lookup, suggesting
{
maybe_known = known(_domain) | unknown(_domain);
// distinct to avoid partially-matching identifiers
using boost::spirit::repository::qi::distinct;
auto kw = distinct(copy(alnum | '_'));
known = raw[kw[lazy(_domain)]];
unknown = raw[identifier[_val=_1]] [suggest_for(_1, _domain)];
}
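I think you can use the same approach constructively here. One additional gimmick could be to validate that the properties supplied are, in fact, unique.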
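As a rough illustration of the uniqueness idea: a std::map keeps only one entry per key, so duplicates have to be detected before the pairs end up in the map. The sketch below assumes the pairs are first collected into a std::vector and checked afterwards, outside the grammar; first_duplicate_key is a made-up name, not anything Spirit provides.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

using Pair = std::pair<std::string, std::string>;

// Hypothetical post-parse check: returns the first key that occurs a second
// time, or an empty string when every key is unique.
std::string first_duplicate_key(const std::vector<Pair>& options) {
    std::set<std::string> seen;
    for (const auto& option : options) {
        // insert().second is false when the key was already present
        if (!seen.insert(option.first).second)
            return option.first;
    }
    return {};
}
```

Doing this after the parse keeps the grammar free of extra semantic actions; whether a duplicate should be a parse failure or just a warning is a design decision for your application.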
Demo Work
Combining all the hints above makes it compile and "parse" the test commands:
#include <string>
#include <map>
#include <vector>
namespace ast {
//
using string = std::string;
using strings = std::vector<string>;
using list = strings;
using pair = std::pair<string, string>;
using map = std::map<string, string>;
//
struct command {
string host;
string action;
map option;
};
}
#include <boost/fusion/adapted.hpp>
BOOST_FUSION_ADAPT_STRUCT(ast::command, host, action, option)
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/repository/include/qi_distinct.hpp>
namespace grammar
{
namespace qi = boost::spirit::qi;
namespace qr = boost::spirit::repository::qi;
template <typename It>
struct parser
{
struct skip : qi::grammar<It> {
skip() : skip::base_type(text) {
using namespace qi;
// handle all whitespace along with line/block comments
text = ascii::space
| (lit("#")|"--"|"//") >> *(char_ - eol) >> (eoi | eol) // line comment
| "/*" >> *(char_ - "*/") >> "*/"; // block comment
//
BOOST_SPIRIT_DEBUG_NODES((text))
}
private:
qi::rule<It> text;
};
//
struct token {
//
token() {
using namespace qi;
// common
string = '"' >> *("\\" >> char_ | ~char_('"')) >> '"';
identity = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
value = string | identity;
// ip target
any = '*';
local = '.' | fqdn;
fqdn = +char_("a-zA-Z0-9.\\-"); // concession
ipv4 = raw [ octet >> '.' >> octet >> '.' >> octet >> '.' >> octet ];
//
target = any | local | fqdn | ipv4;
//
BOOST_SPIRIT_DEBUG_NODES(
(string) (identity) (value)
(any) (local) (fqdn) (ipv4) (target)
)
}
protected:
//
qi::rule<It, std::string()> string;
qi::rule<It, std::string()> identity;
qi::rule<It, std::string()> value;
qi::uint_parser<uint8_t, 10, 1, 3> octet;
qi::rule<It, std::string()> any;
qi::rule<It, std::string()> local;
qi::rule<It, std::string()> fqdn;
qi::rule<It, std::string()> ipv4;
qi::rule<It, std::string()> target;
};
//
struct test : token, qi::grammar<It, ast::command(), skip> {
//
test() : test::base_type(command_)
{
using namespace qi;
auto kw = qr::distinct( copy( char_( "a-zA-Z0-9_" ) ) );
//
action_sym += "add", "modify", "clear";
action_ = raw[ kw[action_sym] ];
//
command_ = kw["test"]
>> target
>> action_
>> '(' >> map >> ')'
>> ';';
//
pair = kw[identity] >> -value;
map = +pair;
list = *value;
BOOST_SPIRIT_DEBUG_NODES(
(command_) (action_)
(pair) (map) (list)
)
}
private:
using token::target;
using token::identity;
using token::value;
qi::symbols<char> action_sym;
//
qi::rule<It, ast::command(), skip> command_;
qi::rule<It, std::string(), skip> action_;
//
qi::rule<It, ast::map(), skip> map;
qi::rule<It, ast::pair(), skip> pair;
qi::rule<It, ast::list(), skip> list;
};
};
}
#include <fstream>
#include <iostream>
int main() {
using It = boost::spirit::istream_iterator;
using Parser = grammar::parser<It>;
std::ifstream input("input.txt");
It f(input >> std::noskipws), l;
Parser::skip const s{};
Parser::test const p{};
std::vector<ast::command> data;
bool ok = phrase_parse(f, l, *p, s, data);
if (ok) {
std::cout << "Parsed " << data.size() << " commands\n";
} else {
std::cout << "Parsed failed\n";
}
if (f != l) {
std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
}
}
Prints
Parsed 3 commands
Let's Restrict The Keys
Like in the linked answer above, let's pass the map and pair rules the actual key set from which to get their allowed values:
using KeySet = qi::symbols<char>;
using KeyRef = KeySet const*;
//
KeySet add_keys, modify_keys, clear_keys;
qi::symbols<char, KeyRef> action_sym;
qi::rule<It, ast::pair(KeyRef), skip> pair;
qi::rule<It, ast::map(KeyRef), skip> map;
Note: a key feature used here is the attribute value associated with a symbols<> lookup (in this case we associate a KeyRef with an action symbol):
//
add_keys += "a1", "a2", "a3", "a4", "a5", "a6";
modify_keys += "m1", "m2", "m3", "m4";
clear_keys += "c1", "c2", "c3", "c4", "c5";
action_sym.add
("add", &add_keys)
("modify", &modify_keys)
("clear", &clear_keys);
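To see the "symbol maps to its own key set" idea in isolation, here is a plain standard-library analogue. It deliberately uses std::map and std::set rather than qi::symbols, so it is only a sketch of the data layout; KeyTables and allowed are names made up for illustration.

```cpp
#include <map>
#include <set>
#include <string>

using KeySet = std::set<std::string>;

// Hypothetical analogue of the symbol tables above: each action name maps to
// a pointer to the set of keys that are valid for that action.
struct KeyTables {
    KeySet add_keys{"a1", "a2", "a3", "a4", "a5", "a6"};
    KeySet modify_keys{"m1", "m2", "m3", "m4"};
    KeySet clear_keys{"c1", "c2", "c3", "c4", "c5"};
    std::map<std::string, const KeySet*> action_sym{
        {"add", &add_keys}, {"modify", &modify_keys}, {"clear", &clear_keys}};

    KeyTables() = default;
    // The map stores pointers into this object, so copying would leave the
    // copy pointing at the original's sets; forbid it.
    KeyTables(const KeyTables&) = delete;
    KeyTables& operator=(const KeyTables&) = delete;

    // True when `action` is known and `key` belongs to its key set.
    bool allowed(const std::string& action, const std::string& key) const {
        auto it = action_sym.find(action);
        return it != action_sym.end() && it->second->count(key) > 0;
    }
};
```

The same lifetime concern applies to the Spirit version: the key sets must outlive every rule that refers to them, which is why they are members.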
Now the heavy lifting starts.
Using qi::locals<> and inherited attributes
Let's give command_ some local space to store the selected key set:
qi::rule<It, ast::command(), skip, qi::locals<KeyRef> > command_;
Now we could in principle assign to it (using the _a placeholder). However, there are some details:
//
qi::_a_type selected;
Always favor descriptive names :) _a and _r1 get old pretty quickly. Things are confusing enough as it is.
command_ %= kw["test"]
>> target
>> raw[ kw[action_sym] [ selected = _1 ] ]
>> '(' >> map(selected) >> ')'
>> ';';
Note: the subtlest detail here is %= instead of = to avoid the suppression of automatic attribute propagation when a semantic action is present (yeah, see ¹ again...)
But all in all, that doesn't read so badly, does it?
//
qi::_r1_type symref;
pair = raw[ kw[lazy(*symref)] ] >> -value;
map = +pair(symref);
Now at least things parse.
Almost there
//#define BOOST_SPIRIT_DEBUG
#include <string>
#include <map>
#include <vector>
namespace ast {
//
using string = std::string;
using strings = std::vector<string>;
using list = strings;
using pair = std::pair<string, string>;
using map = std::map<string, string>;
//
struct command {
string host;
string action;
map option;
};
}
#include <boost/fusion/adapted.hpp>
BOOST_FUSION_ADAPT_STRUCT(ast::command, host, action, option)
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/repository/include/qi_distinct.hpp>
namespace grammar
{
namespace qi = boost::spirit::qi;
namespace qr = boost::spirit::repository::qi;
template <typename It>
struct parser
{
struct skip : qi::grammar<It> {
skip() : skip::base_type(rule_) {
using namespace qi;
// handle all whitespace along with line/block comments
rule_ = ascii::space
| (lit("#")|"--"|"//") >> *(char_ - eol) >> (eoi | eol) // line comment
| "/*" >> *(char_ - "*/") >> "*/"; // block comment
//
//BOOST_SPIRIT_DEBUG_NODES((skipper))
}
private:
qi::rule<It> rule_;
};
//
struct token {
//
token() {
using namespace qi;
// common
string = '"' >> *("\\" >> char_ | ~char_('"')) >> '"';
identity = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
value = string | identity;
// ip target
any = '*';
local = '.' | fqdn;
fqdn = +char_("a-zA-Z0-9.\\-"); // concession
ipv4 = raw [ octet >> '.' >> octet >> '.' >> octet >> '.' >> octet ];
//
target = any | local | fqdn | ipv4;
//
BOOST_SPIRIT_DEBUG_NODES(
(string) (identity) (value)
(any) (local) (fqdn) (ipv4) (target)
)
}
protected:
//
qi::rule<It, std::string()> string;
qi::rule<It, std::string()> identity;
qi::rule<It, std::string()> value;
qi::uint_parser<uint8_t, 10, 1, 3> octet;
qi::rule<It, std::string()> any;
qi::rule<It, std::string()> local;
qi::rule<It, std::string()> fqdn;
qi::rule<It, std::string()> ipv4;
qi::rule<It, std::string()> target;
};
//
struct test : token, qi::grammar<It, ast::command(), skip> {
//
test() : test::base_type(start_)
{
using namespace qi;
auto kw = qr::distinct( copy( char_( "a-zA-Z0-9_" ) ) );
//
add_keys += "a1", "a2", "a3", "a4", "a5", "a6";
modify_keys += "m1", "m2", "m3", "m4";
clear_keys += "c1", "c2", "c3", "c4", "c5";
action_sym.add
("add", &add_keys)
("modify", &modify_keys)
("clear", &clear_keys);
//
qi::_a_type selected;
command_ %= kw["test"]
>> target
>> raw[ kw[action_sym] [ selected = _1 ] ]
>> '(' >> map(selected) >> ')'
>> ';';
//
qi::_r1_type symref;
pair = raw[ kw[lazy(*symref)] ] >> -value;
map = +pair(symref);
list = *value;
start_ = command_;
BOOST_SPIRIT_DEBUG_NODES(
(start_) (command_)
(pair) (map) (list)
)
}
private:
using token::target;
using token::identity;
using token::value;
using KeySet = qi::symbols<char>;
using KeyRef = KeySet const*;
//
qi::rule<It, ast::command(), skip> start_;
qi::rule<It, ast::command(), skip, qi::locals<KeyRef> > command_;
//
KeySet add_keys, modify_keys, clear_keys;
qi::symbols<char, KeyRef> action_sym;
qi::rule<It, ast::pair(KeyRef), skip> pair;
qi::rule<It, ast::map(KeyRef), skip> map;
qi::rule<It, ast::list(), skip> list;
};
};
}
#include <fstream>
#include <iostream>
int main() {
using It = boost::spirit::istream_iterator;
using Parser = grammar::parser<It>;
std::ifstream input("input.txt");
It f(input >> std::noskipws), l;
Parser::skip const s{};
Parser::test const p{};
std::vector<ast::command> data;
bool ok = phrase_parse(f, l, *p, s, data);
if (ok) {
std::cout << "Parsed " << data.size() << " commands\n";
} else {
std::cout << "Parsed failed\n";
}
if (f != l) {
std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
}
}
Prints
Parsed 3 commands
Hold on, not so fast! It's wrong
Yeah. If you enable debug, you'll see that it parses things oddly:
<attributes>[[[1, 0, ., 0, ., 0, ., 1], [c, l, e, a, r], [[[c, 1], [c, 2]], [[c, 3], []]]]]</attributes>
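This is actually "merely" a problem with the grammar. If the grammar cannot see the difference between a key and a value, then obviously c2 is going to be parsed as the value of the property with key c1.
It's up to you to disambiguate the grammar. For now, I'm going to demonstrate a fix using a negative assertion: we only accept values that are not known keys. It's a bit dirty, but it might be useful to you for instructional purposes: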
key = raw[ kw[lazy(*symref)] ];
pair = key(symref) >> -(!key(symref) >> value);
map = +pair(symref);
Note that I factored out the key rule for readability.
Parses
<attributes>[[[1, 0, ., 0, ., 0, ., 1], [c, l, e, a, r], [[[c, 1], []], [[c, 2], []], [[c, 3], []]]]]</attributes>
Just what the doctor ordered!
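If the negative assertion feels opaque, the same decision can be sketched over a flat list of already-split tokens using only the standard library. This is not how Spirit executes the rule, just an illustration of the "a value is anything that is not a known key" logic; group_pairs is a hypothetical helper.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Pair = std::pair<std::string, std::string>;

// Hypothetical helper: groups tokens into key/value pairs. The token after a
// key is taken as its value only if it is not itself a known key (the
// equivalent of the !key lookahead). Returns false when a token in key
// position is unknown, or when there are no pairs at all (mirrors +pair).
bool group_pairs(const std::vector<std::string>& tokens,
                 const std::set<std::string>& keys,
                 std::vector<Pair>& out) {
    std::size_t i = 0;
    while (i < tokens.size()) {
        if (keys.count(tokens[i]) == 0) return false;  // expected a known key
        Pair p{tokens[i++], ""};
        if (i < tokens.size() && keys.count(tokens[i]) == 0)
            p.second = tokens[i++];                    // optional value
        out.push_back(std::move(p));
    }
    return !out.empty();
}
```

Note the consequence of this approach: a value that happens to be spelled like a key (say, the bare word c2 as a value for c1) can never be expressed unless it is quoted and the tokenizer keeps track of that, so the grammar itself may still deserve a less ambiguous syntax.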
¹ Boost Spirit: "Semantic actions are evil"?