Parsing into std::vector<string> with Spirit Qi, getting segfaults or assert failures
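I am using Spirit Qi as my parser, to parse mathematical expressions into an expression tree. As I parse, I keep track of things such as the types of the symbols I encounter, which must be declared in the text being parsed. Namely, I am parsing Bertini input files, a simple-ish example of which is here, a complicated example is here, and for completeness, one is shown below: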
%input: our first input file
variable_group x,y;
function f,g;
f = x^2 - 1;
g = y^2 - 4;
END;
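Ideally, the grammar I have been working on would
- find declaration statements, then parse the following comma-separated list of symbols of the type being declared, and store the resulting vector of symbols in the class object being parsed into; e.g.
variable_group x, y;
- find a previously declared symbol, followed by an equals sign, which is the definition of that symbol as an evaluable mathematical object; e.g.
f = x^2 - 1;
This part I mostly have under control.
- find a not-previously-declared symbol followed by =, and parse it as a subfunction. I think I can handle this too.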
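To make the three cases above concrete, here is a minimal plain-C++ sketch of the decision logic only, independent of Spirit. It assumes variable_group and function are the only declaration keywords (taken from the example file), and classify() / leading_word() are hypothetical helpers that receive the set of names declared so far. It looks only at the leading word and the next character, so it illustrates the branching, not a real parser.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// The three statement kinds described above (hypothetical names).
enum class StatementKind { Declaration, Definition, Subfunction, Unknown };

// Skip leading spaces, then read one name made of letters, digits, '_', '[' and ']'.
std::string leading_word(const std::string& s, std::size_t& pos) {
    while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos]))) ++pos;
    std::size_t start = pos;
    while (pos < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[pos]);
        if (std::isalnum(c) || c == '_' || c == '[' || c == ']') ++pos;
        else break;
    }
    return s.substr(start, pos - start);
}

// Decide which kind of statement `stmt` is, given the names declared so far.
StatementKind classify(const std::string& stmt, const std::set<std::string>& declared) {
    static const std::set<std::string> keywords{"variable_group", "function"};  // assumed keyword list
    std::size_t pos = 0;
    const std::string word = leading_word(stmt, pos);
    if (word.empty()) return StatementKind::Unknown;
    if (keywords.count(word)) return StatementKind::Declaration;
    while (pos < stmt.size() && std::isspace(static_cast<unsigned char>(stmt[pos]))) ++pos;
    if (pos < stmt.size() && stmt[pos] == '=')
        return declared.count(word) ? StatementKind::Definition : StatementKind::Subfunction;
    return StatementKind::Unknown;
}
```

With x, y, f and g declared, "f = x^2 - 1;" classifies as a definition, while an undeclared "h = x*y;" would be treated as a subfunction.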
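The problem I have been struggling with seems trivial, yet after hours of searching I still haven't solved it. I have read dozens of Boost Spirit mailing list posts, SO posts, the manual, and the headers of Spirit itself, but I still don't quite understand some key things about Spirit Qi parsing.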
Here is the problematic basic grammar definition, which goes into system_parser.hpp:
#define BOOST_SPIRIT_USE_PHOENIX_V3 1
#include <boost/spirit/include/qi_core.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
template<typename Iterator>
struct SystemParser : qi::grammar<Iterator, std::vector<std::string>(), boost::spirit::ascii::space_type>
{
SystemParser() : SystemParser::base_type(variable_group_)
{
namespace phx = boost::phoenix;
using qi::_1;
using qi::_val;
using qi::eps;
using qi::lit;
qi::symbols<char,int> encountered_variables;
qi::symbols<char,int> declarative_symbols;
declarative_symbols.add("variable_group",0);
// wraps the vector between its appropriate declaration and line termination.
BOOST_SPIRIT_DEBUG_NODE(variable_group_);
debug(variable_group_);
variable_group_.name("variable_group_");
variable_group_ %= lit("variable_group") >> genericvargp_ >> lit(';');
// creates a vector of strings
BOOST_SPIRIT_DEBUG_NODE(genericvargp_);
debug(genericvargp_);
genericvargp_.name("genericvargp_");
genericvargp_ %= new_variable_ % ',';
// will in the future make a shared pointer to an object using the string
BOOST_SPIRIT_DEBUG_NODE(new_variable_);
debug(new_variable_);
new_variable_.name("new_variable_");
new_variable_ %= unencountered_symbol_;
// this rule gets a string.
BOOST_SPIRIT_DEBUG_NODE(unencountered_symbol_);
debug(unencountered_symbol_);
unencountered_symbol_.name("unencountered_symbol");
unencountered_symbol_ %= valid_variable_name_ - ( encountered_variables | declarative_symbols);
// get a string which fits the naming rules.
BOOST_SPIRIT_DEBUG_NODE(valid_variable_name_);
valid_variable_name_.name("valid_variable_name_");
valid_variable_name_ %= +qi::alpha >> *(qi::alnum | qi::char_('_') | qi::char_('[') | qi::char_(']') );
}
// rule declarations. these are member variables for the parser.
qi::rule<Iterator, std::vector<std::string>(), ascii::space_type > variable_group_;
qi::rule<Iterator, std::vector<std::string>(), ascii::space_type > genericvargp_;
qi::rule<Iterator, std::string(), ascii::space_type> new_variable_;
qi::rule<Iterator, std::string(), ascii::space_type > unencountered_symbol_;// , ascii::space_type
// the rule which determines valid variable names
qi::rule<Iterator, std::string()> valid_variable_name_;
};
And some code which uses it:
#include "system_parsing.hpp"
int main(int argc, char** argv)
{
std::vector<std::string> V;
std::string str = "variable_group x, y, z;";
std::string::const_iterator iter = str.begin();
std::string::const_iterator end = str.end();
SystemParser<std::string::const_iterator> S;
bool s = phrase_parse(iter, end, S, boost::spirit::ascii::space, V);
std::cout << "the unparsed string:\n" << std::string(iter,end);
return 0;
}
It compiles fine under Clang 4.9.x on OSX. When I run it, I get:
Assertion failed: (px != 0), function operator->, file /usr/local/include/boost/smart_ptr/shared_ptr.hpp, line 648.
Alternatively, if I use the expectation operator > instead of >> in the definition of the variable_group_ rule, I get our dear old friend Segmentation fault: 11.
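For reference, the two operators differ in how a failure after the first element is reported: a >> b simply fails and lets the caller backtrack, whereas a > b throws qi::expectation_failure once a has matched and b does not. The sketch below is a plain-C++ analogue, not Spirit code; sequence(), expect() and ExpectationFailure are hypothetical names. Since the operator only changes error reporting, the crash itself is most likely the undefined behaviour explained in the answer below, not the choice of operator.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for qi::expectation_failure.
struct ExpectationFailure : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Consume literal `lit` at `pos`; advance `pos` only on success.
bool match(const std::string& in, std::size_t& pos, const std::string& lit) {
    if (in.compare(pos, lit.size(), lit) == 0) { pos += lit.size(); return true; }
    return false;
}

// Analogue of `a >> b`: any failure restores the position and reports false.
bool sequence(const std::string& in, std::size_t& pos, const std::string& a, const std::string& b) {
    std::size_t save = pos;
    if (match(in, pos, a) && match(in, pos, b)) return true;
    pos = save;
    return false;
}

// Analogue of `a > b`: if `a` fails it is an ordinary failure,
// but once `a` has matched, a failing `b` is a hard error.
bool expect(const std::string& in, std::size_t& pos, const std::string& a, const std::string& b) {
    std::size_t save = pos;
    if (!match(in, pos, a)) { pos = save; return false; }
    if (!match(in, pos, b))
        throw ExpectationFailure("expected \"" + b + "\" at offset " + std::to_string(pos));
    return true;
}
```

A rough guideline that follows from this: use > at points where, once the first part has matched, no alternative could succeed anyway, so a precise error is more useful than silent backtracking.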
While learning, I have come across excellent posts such as how to tell the type spirit is trying to generate, attribute propagation, how to interact with symbols, an example of infinite left recursion which led to a segfault, information on parsing into classes, not structs which has a link to using Customization points (yet the links contain no examples), the Nabialek trick which couples keywords to actions, and, perhaps most relevant for what I am trying to do, dynamic difference parsing. That last one is certainly something I need as the set of symbols grows: once a symbol has been declared, I must not allow it to be used later as another type. The set of already-encountered symbols starts out empty and then grows; that is, the parsing rules are dynamic.
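The growing-exclusion idea can be sketched in plain C++: a name is accepted as a new variable only if it is neither a declaration keyword nor already encountered, and each accepted name is then added to the encountered set. This roughly mirrors valid_variable_name_ - (encountered_variables | declarative_symbols) with encountered_variables being filled as parsing proceeds (in Qi, for example from a semantic action, which is not shown). SymbolTable is a hypothetical name, and the keyword list is assumed from the example input file.

```cpp
#include <set>
#include <string>

// Hypothetical symbol table: the set of rejected names grows during parsing.
class SymbolTable {
public:
    // Accept `name` as a new symbol only if it is not a keyword and not yet declared.
    bool declare(const std::string& name) {
        if (keywords_.count(name) || encountered_.count(name)) return false;
        encountered_.insert(name);  // from now on this name is excluded
        return true;
    }
    bool is_declared(const std::string& name) const { return encountered_.count(name) > 0; }

private:
    std::set<std::string> keywords_{"variable_group", "function", "END"};  // assumed keywords
    std::set<std::string> encountered_;                                   // starts out empty
};
```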
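So this is where I am. My immediate problem is the assert/segfault generated by this particular example. However, there are some things I am unclear on and need guiding advice about, which I simply haven't gleaned from any of the sources I have consulted, and asking for it will hopefully make this SO question distinct from the others asked before: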
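- When is using lexeme appropriate? I just don't know when to use lexeme, and when not to.
- What are the guidelines for when to use > instead of >>?
- I've seen many Fusion adapt examples, where there is a struct to be parsed into and a set of rules. My input files may contain multiple occurrences of declarations of functions, variables, etc., which all need to go to the same place, so I need to be able to add to the fields of the terminal class object I am parsing into, multiple times and in any order. I think I want to use getters/setters for the class object, so that parsing is not the only avenue for object construction. Is this a problem?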
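One way to picture the design described in the last bullet is a hypothetical System class whose mutators append to shared containers, so declarations can arrive any number of times and in any order. This is a plain-C++ sketch only; how the parser would call these mutators (for example from Qi semantic actions) is not shown.

```cpp
#include <string>
#include <vector>

// Hypothetical target class: every declaration statement appends to the same
// container, so repeated and reordered declarations all end up in one place.
class System {
public:
    void add_variable_group(const std::vector<std::string>& names) {
        variables_.insert(variables_.end(), names.begin(), names.end());
    }
    void add_functions(const std::vector<std::string>& names) {
        functions_.insert(functions_.end(), names.begin(), names.end());
    }
    const std::vector<std::string>& variables() const { return variables_; }
    const std::vector<std::string>& functions() const { return functions_; }

private:
    std::vector<std::string> variables_;
    std::vector<std::string> functions_;
};
```

Because construction does not depend on the parser, the same object can also be filled by hand, which is the property the question asks for.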
Any advice for a beginner is welcome.
You reference the symbols variables, but they are locals, so they no longer exist once the constructor returns. This invokes Undefined Behaviour: anything can happen.
Make the symbol tables members of the class.
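The same lifetime mistake can be reproduced without Spirit: a Qi rule keeps a reference to the symbols<> parser it uses, much as a lambda that captures a local by reference does. The standard-library-only sketch below uses a hypothetical GoodParser that keeps its table as a member, so the captured reference stays valid; the broken variant appears only in a comment, because calling it would be undefined behaviour.

```cpp
#include <functional>
#include <set>
#include <string>

// Broken pattern (do NOT call): the lambda captures a *local* set by reference,
// so the reference dangles as soon as the constructor returns.
//
//   struct BadParser {
//       std::function<bool(const std::string&)> is_keyword;
//       BadParser() {
//           std::set<std::string> keywords{"variable_group"};
//           is_keyword = [&keywords](const std::string& s) { return keywords.count(s) > 0; };
//       }
//   };

// Fixed pattern: the table is a member, so it lives exactly as long as the parser.
struct GoodParser {
    std::set<std::string> keywords{"variable_group"};
    std::function<bool(const std::string&)> is_keyword;

    GoodParser()
        : is_keyword([this](const std::string& s) { return keywords.count(s) > 0; }) {}

    // A copy's lambda would still point at the original object's member, so forbid copying.
    GoodParser(const GoodParser&) = delete;
    GoodParser& operator=(const GoodParser&) = delete;
};
```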
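I also simplified the dance around
- the skippers (see Boost spirit skipper issues). That link also answers your "when is using lexeme[] appropriate?". In your sample you were missing a lexeme[] around encountered_variables|declarative_symbols, for example.
- the debug macros
- operator%=, and some generally unused stuff
- guessing that you don't need the mapped type of symbols<> (because the int was unused), I simplified the initialization there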
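The skipper/lexeme distinction can also be shown without Spirit. In the plain-C++ analogue below, whitespace is skipped between tokens (the role of ascii::space passed to phrase_parse), but not inside an identifier, which is what declaring a rule without a skipper (or wrapping it in lexeme[]) achieves in Qi. The helper names are hypothetical, and the sketch handles only the single variable_group statement from the demo.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Phrase-level skipping: consume whitespace *between* tokens.
void skip_ws(const std::string& in, std::size_t& pos) {
    while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos]))) ++pos;
}

// Lexeme-level token: no skipping *inside* the name, like a skipper-less
// rule for alpha >> *(alnum | char_("_[]")).
std::optional<std::string> identifier(const std::string& in, std::size_t& pos) {
    if (pos >= in.size() || !std::isalpha(static_cast<unsigned char>(in[pos]))) return std::nullopt;
    std::string out(1, in[pos++]);
    while (pos < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[pos]);
        if (std::isalnum(c) || c == '_' || c == '[' || c == ']') out += in[pos++];
        else break;
    }
    return out;
}

// Skip whitespace, then match a literal token.
bool literal(const std::string& in, std::size_t& pos, const std::string& lit) {
    skip_ws(in, pos);
    if (in.compare(pos, lit.size(), lit) != 0) return false;
    pos += lit.size();
    return true;
}

// "variable_group" >> (identifier % ',') >> ';' with whitespace skipped between tokens.
std::optional<std::vector<std::string>> variable_group(const std::string& in) {
    std::size_t pos = 0;
    if (!literal(in, pos, "variable_group")) return std::nullopt;
    std::vector<std::string> names;
    do {
        skip_ws(in, pos);               // pre-skip, as a skipper would
        auto id = identifier(in, pos);  // ...but never inside the token
        if (!id) return std::nullopt;
        names.push_back(*id);
    } while (literal(in, pos, ","));
    if (!literal(in, pos, ";")) return std::nullopt;
    skip_ws(in, pos);
    if (pos != in.size()) return std::nullopt;  // require full consumption
    return names;
}
```

If whitespace were skipped inside identifiers, "variable_group x y;" would silently yield a single name xy; here it fails instead.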
Demo
#define BOOST_SPIRIT_USE_PHOENIX_V3 1
#define BOOST_SPIRIT_DEBUG 1
#include <boost/spirit/include/qi_core.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <iostream>
#include <string>
#include <vector>
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;
template <typename Iterator, typename Skipper = ascii::space_type>
struct SystemParser : qi::grammar<Iterator, std::vector<std::string>(), Skipper> {
SystemParser() : SystemParser::base_type(variable_group_)
{
declarative_symbols += "variable_group";
variable_group_ = "variable_group" >> genericvargp_ >> ';';
genericvargp_ = new_variable_ % ',';
valid_variable_name_ = qi::alpha >> *(qi::alnum | qi::char_("_[]"));
unencountered_symbol_ = valid_variable_name_ - (encountered_variables|declarative_symbols);
new_variable_ = unencountered_symbol_;
BOOST_SPIRIT_DEBUG_NODES((variable_group_) (valid_variable_name_) (unencountered_symbol_) (new_variable_) (genericvargp_))
}
private:
qi::symbols<char, qi::unused_type> encountered_variables, declarative_symbols;
// rule declarations. these are member variables for the parser.
qi::rule<Iterator, std::vector<std::string>(), Skipper> variable_group_;
qi::rule<Iterator, std::vector<std::string>(), Skipper> genericvargp_;
qi::rule<Iterator, std::string()> new_variable_;
qi::rule<Iterator, std::string()> unencountered_symbol_; // , Skipper
// the rule which determines valid variable names
qi::rule<Iterator, std::string()> valid_variable_name_;
};
//#include "system_parsing.hpp"
int main() {
using It = std::string::const_iterator;
std::string const str = "variable_group x, y, z;";
SystemParser<It> S;
It iter = str.begin(), end = str.end();
std::vector<std::string> V;
bool s = phrase_parse(iter, end, S, boost::spirit::ascii::space, V);
if (s)
{
std::cout << "Parse succeeded: " << V.size() << "\n";
for (auto& name : V)
std::cout << " - '" << name << "'\n";
}
else
std::cout << "Parse failed\n";
if (iter!=end)
std::cout << "Remaining unparsed: '" << std::string(iter, end) << "'\n";
}
Prints
Parse succeeded: 3
- 'x'
- 'y'
- 'z'