Parsing struct with enum fields and STL containers easily using Boost Spirit/Fusion
I am new to Boost, and I need Boost Spirit to write a simple parser that fills some data structures.
Here is roughly what they look like:
struct Task
{
const string dataname;
const Level level;
const string aggregator;
const set<string> groupby;
void operator()();
};
struct Schedule
{
map<Level, ComputeTask> tasks;
// I have left this in just to make it clear that
// the struct wrapping the map is not
// useless (this is not the full code)
void operator()(const InstancePtr &node);
};
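Regarding Task, I don't know how to use BOOST_FUSION_ADAPT_STRUCT or its variants, as mentioned in the employee example, to make it work with the enum and STL container fields.
A similar question applies to Schedule, but this time I am also using a user type (maybe one already registered with Fusion; is it recursive?).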
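I am designing the file format. The struct definitions and the file format may change, so I prefer to use Boost rather than hand-crafted but hard-to-maintain code. I am also doing this to learn.
Here is what the file may look like: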
level: level operation name on(data1, data2, data3)
level: level operation name on()
level: level operation name on(data1, data2)
Each line is one entry of the map in Schedule: the part before the : is the key, and the rest defines a Task.
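Here level is replaced by one of the level keywords corresponding to enum Level, and similarly for operation. name is one of the allowed names (from a set of keywords). on() is a keyword, and inside the parentheses are zero or more user-provided strings that should fill the set<string> groupby field in Task.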
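I want it to be readable, and I could even add English keywords that contribute nothing but readability. That is another reason to use a parsing library instead of hand-written code.
Feel free to ask for more details if you think my question is not clear enough.
Thanks.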
So, I am making some assumptions, because your examples do not make the meaning very clear. But here goes:
Going with a random enum:
enum class Level { One, Two, Three, LEVEL };
Sidenote: the std::set<> might need to be a sequential container, because usually groupby operations are not commutative (the order matters). I don't know about your domain, of course.
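To make the sidenote concrete, here is a minimal sketch using only the standard library. The field values and the join helper are made up for illustration and are not part of the question. The point is that a std::set sorts and de-duplicates its elements, whereas a std::vector keeps them in the order the user wrote them.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Join any container of strings with ", " so the iteration order becomes visible.
// (Hypothetical helper, only for this illustration.)
template <typename Container>
std::string join(Container const& items) {
    std::string out;
    bool first = true;
    for (auto const& item : items) {
        if (!first) out += ", ";
        out += item;
        first = false;
    }
    return out;
}

// Field names as a user might write them inside on(...); hypothetical values.
std::vector<std::string> fields_in_order() {
    return {"region", "day", "region"};
}

// The same fields stored in a std::set: sorted, and duplicates are dropped.
std::set<std::string> fields_as_set() {
    auto const fields = fields_in_order();
    return std::set<std::string>(fields.begin(), fields.end());
}
```

If the order or the repetition of group-by fields matters in your domain, declaring the member as std::vector<std::string> should be enough; the grammar itself would not need to change for that.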
Adapting:
BOOST_FUSION_ADAPT_STRUCT(ComputeTask, level, aggregator, dataname, groupby)
BOOST_FUSION_ADAPT_STRUCT(Schedule, tasks)
Note that I subtly put the adapted fields in grammar order. That helps a lot down the road.
The simplest grammar that comes to mind:
template <typename It>
struct Parser : qi::grammar<It, Schedule()> {
Parser() : Parser::base_type(_start) {
using namespace qi;
_any_word = lexeme [ +char_("a-zA-Z0-9-_./") ];
_operation = _any_word; // TODO
_group_field = _any_word; // TODO
_dataname = _any_word; // TODO
_level = no_case [ _level_sym ];
_groupby = '(' >> -(_group_field % ',') >> ')';
_task = _level >> _operation >> _dataname >> "on" >> _groupby;
_entry = _level >> ':' >> _task;
_schedule = _entry % eol;
_start = skip(blank) [ _schedule ];
BOOST_SPIRIT_DEBUG_NODES((_start)(_schedule)(_task)(_groupby)(_level)(_operation)(_dataname)(_group_field))
}
private:
struct level_sym : qi::symbols<char, Level> {
level_sym() { this->add
("one", Level::One)
("two", Level::Two)
("three", Level::Three)
("level", Level::LEVEL);
}
} _level_sym;
// lexemes
qi::rule<It, std::string()> _any_word;
qi::rule<It, std::string()> _operation, _dataname, _group_field; // TODO
qi::rule<It, Level()> _level;
using Skipper = qi::blank_type;
using Table = decltype(Schedule::tasks);
using Entry = std::pair<Level, ComputeTask>;
qi::rule<It, std::set<std::string>(), Skipper> _groupby;
qi::rule<It, ComputeTask(), Skipper> _task;
qi::rule<It, Entry(), Skipper> _entry;
qi::rule<It, Table(), Skipper> _schedule;
qi::rule<It, Schedule()> _start;
};
I changed the input to have unique Level keys in the schedule; otherwise only one entry would actually result.
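The reason is ordinary std::map behavior rather than anything specific to the grammar. The sketch below assumes that the map is filled by plain insertions, one pair per parsed line; the collect function and the string keys are hypothetical stand-ins for the real Level and ComputeTask types.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Build a map by inserting the entries one at a time.
// (Hypothetical helper; string keys stand in for the Level enum.)
std::map<std::string, std::string>
collect(std::vector<std::pair<std::string, std::string>> const& entries) {
    std::map<std::string, std::string> table;
    for (auto const& entry : entries)
        table.insert(entry); // insert() leaves an already existing key untouched
    return table;
}
```

Feeding it three entries that all use the key "level" leaves a single element in the map. If repeated keys are valid in your file format, a std::multimap or a std::vector of pairs would keep every line.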
int main() {
Parser<std::string::const_iterator> const parser;
for (std::string const input : { R"(ONE: level operation name on(data1, data2, data3)
TWO: level operation name on()
THREE: level operation name on(data1, data2))" })
{
auto f = begin(input), l = end(input);
Schedule s;
if (parse(f, l, parser, s)) {
std::cout << "Parsed\n";
for (auto& [level, task] : s.tasks) {
std::cout << level << ": " << task << "\n";
}
} else {
std::cout << "Failed\n";
}
if (f != l) {
std::cout << "Remaining unparsed input: " << std::quoted(std::string(f,l)) << "\n";
}
}
}
Prints
Parsed
One: LEVEL operation name on (data1, data2, data3)
Two: LEVEL operation name on ()
Three: LEVEL operation name on (data1, data2)
Also, with BOOST_SPIRIT_DEBUG defined:
<_start>
<try>ONE: level operation</try>
<_schedule>
<try>ONE: level operation</try>
<_level>
<try>ONE: level operation</try>
<success>: level operation na</success>
<attributes>[One]</attributes>
</_level>
<_task>
<try> level operation nam</try>
<_level>
<try>level operation name</try>
<success> operation name on(d</success>
<attributes>[LEVEL]</attributes>
</_level>
<_operation>
<try>operation name on(da</try>
<success> name on(data1, data</success>
<attributes>[[o, p, e, r, a, t, i, o, n]]</attributes>
</_operation>
<_dataname>
<try>name on(data1, data2</try>
<success> on(data1, data2, da</success>
<attributes>[[n, a, m, e]]</attributes>
</_dataname>
<_groupby>
<try>(data1, data2, data3</try>
<_group_field>
<try>data1, data2, data3)</try>
<success>, data2, data3)\nTWO:</success>
<attributes>[[d, a, t, a, 1]]</attributes>
</_group_field>
<_group_field>
<try>data2, data3)\nTWO: l</try>
<success>, data3)\nTWO: level </success>
<attributes>[[d, a, t, a, 2]]</attributes>
</_group_field>
<_group_field>
<try>data3)\nTWO: level op</try>
<success>)\nTWO: level operati</success>
<attributes>[[d, a, t, a, 3]]</attributes>
</_group_field>
<success>\nTWO: level operatio</success>
<attributes>[[[d, a, t, a, 1], [d, a, t, a, 2], [d, a, t, a, 3]]]</attributes>
</_groupby>
<success>\nTWO: level operatio</success>
<attributes>[[LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], [[d, a, t, a, 1], [d, a, t, a, 2], [d, a, t, a, 3]]]]</attributes>
</_task>
<_level>
<try>TWO: level operation</try>
<success>: level operation na</success>
<attributes>[Two]</attributes>
</_level>
<_task>
<try> level operation nam</try>
<_level>
<try>level operation name</try>
<success> operation name on()</success>
<attributes>[LEVEL]</attributes>
</_level>
<_operation>
<try>operation name on()\n</try>
<success> name on()\nTHREE: le</success>
<attributes>[[o, p, e, r, a, t, i, o, n]]</attributes>
</_operation>
<_dataname>
<try>name on()\nTHREE: lev</try>
<success> on()\nTHREE: level o</success>
<attributes>[[n, a, m, e]]</attributes>
</_dataname>
<_groupby>
<try>()\nTHREE: level oper</try>
<_group_field>
<try>)\nTHREE: level opera</try>
<fail/>
</_group_field>
<success>\nTHREE: level operat</success>
<attributes>[[]]</attributes>
</_groupby>
<success>\nTHREE: level operat</success>
<attributes>[[LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], []]]</attributes>
</_task>
<_level>
<try>THREE: level operati</try>
<success>: level operation na</success>
<attributes>[Three]</attributes>
</_level>
<_task>
<try> level operation nam</try>
<_level>
<try>level operation name</try>
<success> operation name on(d</success>
<attributes>[LEVEL]</attributes>
</_level>
<_operation>
<try>operation name on(da</try>
<success> name on(data1, data</success>
<attributes>[[o, p, e, r, a, t, i, o, n]]</attributes>
</_operation>
<_dataname>
<try>name on(data1, data2</try>
<success> on(data1, data2)</success>
<attributes>[[n, a, m, e]]</attributes>
</_dataname>
<_groupby>
<try>(data1, data2)</try>
<_group_field>
<try>data1, data2)</try>
<success>, data2)</success>
<attributes>[[d, a, t, a, 1]]</attributes>
</_group_field>
<_group_field>
<try>data2)</try>
<success>)</success>
<attributes>[[d, a, t, a, 2]]</attributes>
</_group_field>
<success></success>
<attributes>[[[d, a, t, a, 1], [d, a, t, a, 2]]]</attributes>
</_groupby>
<success></success>
<attributes>[[LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], [[d, a, t, a, 1], [d, a, t, a, 2]]]]</attributes>
</_task>
<success></success>
<attributes>[[[One, [LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], [[d, a, t, a, 1], [d, a, t, a, 2], [d, a, t, a, 3]]]], [Two, [LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], []]], [Three, [LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], [[d, a, t, a, 1], [d, a, t, a, 2]]]]]]</attributes>
</_schedule>
<success></success>
<attributes>[[[[One, [LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], [[d, a, t, a, 1], [d, a, t, a, 2], [d, a, t, a, 3]]]], [Two, [LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], []]], [Three, [LEVEL, [o, p, e, r, a, t, i, o, n], [n, a, m, e], [[d, a, t, a, 1], [d, a, t, a, 2]]]]]]]</attributes>
</_start>
Full Listing
//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted.hpp>
#include <vector>
#include <map>
#include <set>
#include <iostream>
#include <iomanip>
#include <experimental/iterator>
enum class Level { One, Two, Three, LEVEL };
struct ComputeTask {
std::string dataname;
Level level;
std::string aggregator;
std::set<std::string> groupby;
};
struct Schedule {
std::map<Level, ComputeTask> tasks;
};
//////////////////////
// FOR DEBUG DEMO ONLY
static inline std::ostream& operator<<(std::ostream& os, Level l) {
switch(l) {
case Level::One: return os << "One";
case Level::Two: return os << "Two";
case Level::Three: return os << "Three";
case Level::LEVEL: return os << "LEVEL";
}
return os << "?";
}
static inline std::ostream& operator<<(std::ostream& os, ComputeTask const& task) {
os << task.level << ' ' << task.aggregator << ' ' << task.dataname << " on (";
copy(begin(task.groupby), end(task.groupby), std::experimental::make_ostream_joiner(os, ", "));
return os << ')';
}
/////////////
// FOR PARSER
BOOST_FUSION_ADAPT_STRUCT(ComputeTask, level, aggregator, dataname, groupby)
BOOST_FUSION_ADAPT_STRUCT(Schedule, tasks)
namespace qi = boost::spirit::qi;
template <typename It>
struct Parser : qi::grammar<It, Schedule()> {
Parser() : Parser::base_type(_start) {
using namespace qi;
_any_word = lexeme [ +char_("a-zA-Z0-9-_./") ];
_operation = _any_word; // TODO
_group_field = _any_word; // TODO
_dataname = _any_word; // TODO
_level = no_case [ _level_sym ];
_groupby = '(' >> -(_group_field % ',') >> ')';
_task = _level >> _operation >> _dataname >> "on" >> _groupby;
_entry = _level >> ':' >> _task;
_schedule = _entry % eol;
_start = skip(blank) [ _schedule ];
BOOST_SPIRIT_DEBUG_NODES((_start)(_schedule)(_task)(_groupby)(_level)(_operation)(_dataname)(_group_field))
}
private:
struct level_sym : qi::symbols<char, Level> {
level_sym() { this->add
("one", Level::One)
("two", Level::Two)
("three", Level::Three)
("level", Level::LEVEL);
}
} _level_sym;
// lexemes
qi::rule<It, std::string()> _any_word;
qi::rule<It, std::string()> _operation, _dataname, _group_field; // TODO
qi::rule<It, Level()> _level;
using Skipper = qi::blank_type;
using Table = decltype(Schedule::tasks);
using Entry = std::pair<Level, ComputeTask>;
qi::rule<It, std::set<std::string>(), Skipper> _groupby;
qi::rule<It, ComputeTask(), Skipper> _task;
qi::rule<It, Entry(), Skipper> _entry;
qi::rule<It, Table(), Skipper> _schedule;
qi::rule<It, Schedule()> _start;
};
int main() {
Parser<std::string::const_iterator> const parser;
for (std::string const input : { R"(ONE: level operation name on(data1, data2, data3)
TWO: level operation name on()
THREE: level operation name on(data1, data2))" })
{
auto f = begin(input), l = end(input);
Schedule s;
if (parse(f, l, parser, s)) {
std::cout << "Parsed\n";
for (auto& [level, task] : s.tasks) {
std::cout << level << ": " << task << "\n";
}
} else {
std::cout << "Failed\n";
}
if (f != l) {
std::cout << "Remaining unparsed input: " << std::quoted(std::string(f,l)) << "\n";
}
}
}
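I would recommend the solution from user @sehe. It is very flexible.
But I would also like to share a pure C++ solution. As I wrote in the comment above, your input language is very simple. You can even read the first elements with the standard extraction operator. The rest can be read in a loop with std::istream_iterator.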
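That first idea could look roughly like the following sketch. The Entry struct and the parse_line function are hypothetical names for this illustration, and replacing the punctuation with spaces is my own simplification so that operator>> can tokenize the line.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical struct for this sketch; all fields are kept as plain strings.
struct Entry {
    std::string key, level, operation, name;
    std::vector<std::string> groupby;
};

// Parse one line such as "k: level op name on(a, b)" using stream extraction only.
bool parse_line(std::string line, Entry& out) {
    // Turn the punctuation into whitespace so that operator>> can split the tokens.
    std::replace_if(line.begin(), line.end(),
                    [](char c) { return c == ':' || c == ',' || c == '(' || c == ')'; },
                    ' ');
    std::istringstream is(line);
    std::string on;
    // Read the fixed leading elements with the extraction operator.
    if (!(is >> out.key >> out.level >> out.operation >> out.name >> on) || on != "on")
        return false;
    // Everything that is left belongs to the group-by list.
    out.groupby.assign(std::istream_iterator<std::string>(is),
                       std::istream_iterator<std::string>());
    return true;
}
```

Note that this is lenient: it does not check that the colon and the parentheses are actually present or well placed, which is where validating with a regex first becomes useful.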
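You can also use C++ std::regex to validate the input. Because your language is a Chomsky Type-3 regular language, this is easy to do. And if the input string is valid, you can use the std::regex match results and std::regex_token_iterator to get at the data.
I created an example for you. The data is packed into a struct. For this struct I overloaded the inserter and extractor operators, so input and output are easy with the std::iostream functions.
In main, I have a one-liner that reads the complete input file and puts the data into a vector. So I define the variable with constructor arguments, and that's it. All data is then available as needed. For debugging purposes, I print the result on the screen.
As an exercise, I put the data into a map.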
#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>
#include <vector>
#include <map>
#include <iterator>
#include <regex>
#include <sstream>
std::istringstream testData(
R"#(level1: levelA operation0 name0 on(data10, data12, data13)
level2: levelB operation1 name1 on( data1 )
level3: levelC operation2 name2 on()
level4: levelD operation3 name3 on(data2, data3)
level5: levelE operation4 name4 on(data4, data5, data6, data7)
level6: levelF operation5 name5 on(data8, data9)
)#");
const std::regex InputFileRegEx(R"#((\w+)(?:[\:\s]+)(\w+)(?:\s+)(\w+)(?:\s+)(\w+)(?:\s+)(?:on\s*\()(.*)(?:\)))#");
struct Data
{ // Our Data
std::string levelLeft{}; // Left Element for Map
struct Right{ // Right element for Map. Sub Struct
std::string levelRight{};
std::string operation{};
std::string name{};
std::vector<std::string> data; // The data in the on( section
} r;
// Overload the extractor operator. With that, something like "Data d; std::cin >> d;" is easily possible
friend std::istream& operator >> (std::istream& is, Data& d) {
std::string line; getline(is, line); // Read a complete line
std::smatch sm{}; // Prepare match result values
if (std::regex_match(line, sm, InputFileRegEx)) { // Check if the input string is valid
// Copy all data
d.levelLeft = sm[1]; d.r.levelRight = sm[2]; d.r.operation = sm[3]; d.r.name = sm[4]; std::string str(sm[5]);
// Strip all whitespace. The lambda avoids the overload ambiguity of passing isspace directly
str.erase(std::remove_if(str.begin(), str.end(), [](unsigned char c) { return std::isspace(c) != 0; }), str.end());
std::regex comma(","); d.r.data.clear();
// Split the remaining string at the commas
if (str.size()) std::copy(std::sregex_token_iterator(str.begin(), str.end(), comma, -1), std::sregex_token_iterator(), std::back_inserter(d.r.data));
}
else is.setstate(std::ios::failbit);
return is;
}
// Overload inserter operator. Only for debug purposes and for illustration
friend std::ostream& operator << (std::ostream& os, const Data& d) {
// Print normal data members (to the stream that was passed in, not always std::cout)
os << d.levelLeft << " :: " << d.r.levelRight << ' ' << d.r.operation << ' ' << d.r.name << " --> ";
// Print the members of the vector
std::copy(d.r.data.begin(), d.r.data.end(), std::ostream_iterator<std::string>(os, " ")); os << '\n';
return os;
}
};
using MyMap = std::map<std::string, Data::Right>;
int main()
{
// Read all test data in an array of test data. The one-Liner :-)
std::vector<Data> dataAll{std::istream_iterator<Data>(testData), std::istream_iterator<Data>() };
// For debug purposes. Print to console
std::copy(dataAll.begin(), dataAll.end(), std::ostream_iterator<Data>(std::cout, "\n"));
MyMap myMap{}; // Put all Data in map
for (const Data& d : dataAll) myMap[d.levelLeft] = d.r;
return 0;
}
So, the main function is small, and the rest is not much code either. It is rather simple.
I hope this gives some insight.