Boost spirit alternative operator doesn't fill all attribute values
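I use Boost Spirit Qi to read real numbers from a file. I'm trying to implement a conditional parser, where the expected input depends on the first character of the line.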
#include <iostream>
#include <string>
#include <boost/fusion/adapted/struct/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
using namespace std;
namespace qi = boost::spirit::qi;
struct MyStruct {
    double r1, r2, r3, r4;
    double r5, r6, r7, r8;
};
BOOST_FUSION_ADAPT_STRUCT(
    MyStruct,
    (double, r1), (double, r2), (double, r3), (double, r4),
    (double, r5), (double, r6), (double, r7), (double, r8)
)
int main()
{
    string test =
        "A+1.000000000000e+00+2.000000000000e+00+3.000000000000e+00+4.000000000000e+00\r\n"
        "B+5.000000000000e+00+6.000000000000e+00+7.000000000000e+00+8.000000000000e+00\r\n";
    qi::rule<string::const_iterator> CRLF = qi::copy(qi::lit("\r\n"));
    qi::real_parser<double> d19_12; // real_parser needs its attribute type as a template argument
    MyStruct ms;
    qi::rule<string::const_iterator, MyStruct()> gr =
        qi::lit("A") >> d19_12 >> d19_12 >> d19_12 >> d19_12 >> CRLF
        >> (
            (qi::lit('B') >> d19_12 >> d19_12 >> d19_12 >> d19_12 >> CRLF)
            |
            (qi::lit('C') >> d19_12 >> d19_12 >> d19_12 >> +qi::lit('_') >> qi::attr(0.0) >> CRLF)
        );
    string::const_iterator f = test.cbegin();
    string::const_iterator e = test.cend();
    bool ret = qi::parse(f, e, gr, ms);
    return ret ? 0 : 1; // exit code 0 on success
}
Without the 'C' alternative everything works as expected, but adding it makes the parser skip these values, and the result is:
ms MyStruct
r1 1.0000000000000000 double
r2 2.0000000000000000 double
r3 3.0000000000000000 double
r4 4.0000000000000000 double
r5 5.0000000000000000 double
r6 -9.2559631349317831e+61 double
r7 -9.2559631349317831e+61 double
r8 -9.2559631349317831e+61 double
The expected result is:
ms MyStruct
r1 1.0000000000000000 double
r2 2.0000000000000000 double
r3 3.0000000000000000 double
r4 4.0000000000000000 double
r5 5.0000000000000000 double
r6 6.0000000000000000 double
r7 7.0000000000000000 double
r8 8.0000000000000000 double
Thanks
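You can debug the rules. Simplifying the input to "A+1+2+3+4\r\nB+5+6+7+8\r\n", wrapping the real parser in a rule and defining BOOST_SPIRIT_DEBUG, this is the debug output: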
<gr>
<try>A+1+2+3+4\r\nB+5+6+7+8</try>
<d19_12>
<try>+1+2+3+4\r\nB+5+6+7+8\r</try>
<success>+2+3+4\r\nB+5+6+7+8\r\n</success>
<attributes>[1]</attributes>
</d19_12>
<d19_12>
<try>+2+3+4\r\nB+5+6+7+8\r\n</try>
<success>+3+4\r\nB+5+6+7+8\r\n</success>
<attributes>[2]</attributes>
</d19_12>
<d19_12>
<try>+3+4\r\nB+5+6+7+8\r\n</try>
<success>+4\r\nB+5+6+7+8\r\n</success>
<attributes>[3]</attributes>
</d19_12>
<d19_12>
<try>+4\r\nB+5+6+7+8\r\n</try>
<success>\r\nB+5+6+7+8\r\n</success>
<attributes>[4]</attributes>
</d19_12>
<CRLF>
<try>\r\nB+5+6+7+8\r\n</try>
<success>B+5+6+7+8\r\n</success>
<attributes>[]</attributes>
</CRLF>
<d19_12>
<try>+5+6+7+8\r\n</try>
<success>+6+7+8\r\n</success>
<attributes>[5]</attributes>
</d19_12>
<d19_12>
<try>+6+7+8\r\n</try>
<success>+7+8\r\n</success>
<attributes>[6]</attributes>
</d19_12>
<d19_12>
<try>+7+8\r\n</try>
<success>+8\r\n</success>
<attributes>[7]</attributes>
</d19_12>
<d19_12>
<try>+8\r\n</try>
<success>\r\n</success>
<attributes>[8]</attributes>
</d19_12>
<CRLF>
<try>\r\n</try>
<success></success>
<attributes>[]</attributes>
</CRLF>
<success></success>
<attributes>[[1, 2, 3, 4, 5, 4.27256e+180, 0, 0]]</attributes>
</gr>
Parsed: (1 2 3 4 5 4.27256e+180 0 0)
Indeed, it confirms that all numbers are parsed. So why doesn't attribute propagation do what you expect?
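My guess is that it tries to be a bit more accommodating with attribute propagation than you expect. The problem is that your AST doesn't directly match the rule. The rule synthesizes
tup4 := tuple<double, double, double, double>
attribute := tuple<tup4, variant<tup4, tup4> >
In Qi this does get simplified to tuple<tup4, tup4>, but your AST is effectively a tup8, which is not the same. So during propagation the rule just does what it thinks is best, which is to assign the first tup4. After that: ¯\_(ツ)_/¯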
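To make the shape mismatch concrete, here is a small compile-time model. It is only an illustration under assumptions: std::tuple and std::variant stand in for the Boost.Fusion sequences and boost::variant that Qi actually uses, and the `collapse` helper is hypothetical; it merely encodes the "alternatives of the same type collapse to that type" simplification described above and is not part of Qi.

```cpp
#include <tuple>
#include <type_traits>
#include <variant>

// Stand-in for four doubles parsed in a row (not Qi's real attribute type).
using tup4 = std::tuple<double, double, double, double>;

// Stand-in for the attribute of ('B' >> 4 reals) | ('C' >> 4 reals).
using alternative_attr = std::variant<tup4, tup4>;

// Hypothetical helper: an alternative whose branches share one type
// collapses to that type.
template <typename V> struct collapse { using type = V; };
template <typename T> struct collapse<std::variant<T, T>> { using type = T; };

// What the rule synthesizes, and its simplified form.
using synthesized = std::tuple<tup4, alternative_attr>;
using simplified  = std::tuple<tup4, collapse<alternative_attr>::type>;

// What the adapted MyStruct looks like: eight flat doubles.
using tup8 = std::tuple<double, double, double, double,
                        double, double, double, double>;

static_assert(std::is_same_v<simplified, std::tuple<tup4, tup4>>);
static_assert(std::tuple_size_v<simplified> == 2); // two nested groups...
static_assert(std::tuple_size_v<tup8> == 8);       // ...versus eight flat members
```

The point of the model is only the arity: a two-element sequence of groups is not the same shape as an eight-element flat struct, so there is no one-to-one mapping for propagation to follow.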
Fix
The simplest fix is to make your AST match the rules. That might actually make the most sense, because "A", "B" and "C" are likely to carry semantic meaning.
namespace Ast {
    struct A {
        double r1, r2, r3, r4;
    };
    struct BC {
        double r5, r6, r7, r8;
    };
    struct MyStruct {
        A a;
        BC bc;
    };
    using boost::fusion::operator<<;
} // namespace Ast
Adapting them:
BOOST_FUSION_ADAPT_STRUCT(Ast::A, r1, r2, r3, r4)
BOOST_FUSION_ADAPT_STRUCT(Ast::BC, r5, r6, r7, r8)
BOOST_FUSION_ADAPT_STRUCT(Ast::MyStruct, a, bc)
Note that, without further changes, this merely confirms that automatic attribute propagation is heuristics-based: Coliru
Parsed: ((1 0 0 0) (2 0 0 0))
(oops)
Making the rules match that structure:
qi::rule<It> CRLF = "\r\n";
qi::rule<It, double> d19_12 = qi::double_;
qi::rule<It, Ast::A()> A = "A" >> d19_12 >> d19_12 >> d19_12 >> d19_12; //
qi::rule<It, Ast::BC()> BC = //
'B' >> d19_12 >> d19_12 >> d19_12 >> d19_12 | //
'C' >> d19_12 >> d19_12 >> d19_12 >> +qi::lit('_') >> qi::attr(0.0);
qi::rule<It, Ast::MyStruct()> gr = A >> CRLF >> BC >> CRLF;
Now everything works: Coliru
Prints
Parsed: ((1 2 3 4) (5 6 7 8))
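Thinking Outside The Box
A lot of this looks like an XY problem to me. A struct with 8 nondescript numbers that can have different meanings seems... not what you actually need.
Also, the B/C distinction suggests that you really want an "optional number" rule: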
rule<It> CRLF = "\r\n";
rule<It, double()> d19_12 = raw[ //
double_[_val = _1] | //
omit[+char_("_")] //
][_pass = px::size(_1) == 19];
rule<It, Ast::Tup4()> Tup4 =
omit[char_("ABC")] >> d19_12 >> d19_12 >> d19_12 >> d19_12;
Note how omit[char_("ABC")] directly reflects my hunch that you are discarding semantic information from your model.
The grammar now becomes
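For comparison, here is a minimal Spirit-free sketch of what the d19_12 rule accepts. It is an illustration under assumptions, not the answer's code: it checks a pre-sliced 19-character field instead of parsing greedily from a stream, it uses std::strtod (locale-dependent, and also accepting forms such as "inf") in place of qi::double_, and the name parse_field is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>
#include <string_view>

// Width of one field, e.g. "+1.000100000000e+00" (sign, digit, '.', 12 digits, "e+00").
constexpr std::size_t kFieldWidth = 19;

// Hypothetical analogue of d19_12: accepts exactly one 19-char field.
std::optional<double> parse_field(std::string_view field) {
    if (field.size() != kFieldWidth)
        return std::nullopt;

    // Underscore run: "no value", mapped to 0.0 like qi::attr(0.0) in the question.
    if (field.find_first_not_of('_') == std::string_view::npos)
        return 0.0;

    // Real number: must span the whole field.
    std::string buf(field);                  // strtod needs a NUL-terminated string
    char* end = nullptr;
    double value = std::strtod(buf.c_str(), &end);
    if (end != buf.c_str() + buf.size())
        return std::nullopt;                 // same role as _pass = size == 19
    return value;
}
```

Both alternatives share a single width check, which is what the raw[...][_pass = px::size(_1) == 19] wrapper achieves in the Spirit version.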
rule<It, Ast::MyStruct()> gr = Tup4 >> CRLF >> Tup4 >> CRLF;
And indeed, it parses the full input: Coliru
Parsed: ((1.0001 2.0002 3.0003 4.0004) (5.0005 6.0006 7.0007 8.0008))
Simplify! Containers
In fact, I suspect something like this might serve you better:
namespace Ast {
using Reals = boost::container::static_vector<double, 8>;
} // namespace Ast
Interestingly, containers do enjoy more flexible attribute propagation (with new caveats). You can have something as straightforward as:
qi::rule<It, Ast::Reals(char const*)> Line =
qi::omit[qi::char_(_r1)] >> d19_12 >> d19_12 >> d19_12 >> d19_12;
qi::rule<It, Ast::Reals()> gr = //
Line(+"A") >> CRLF >> Line(+"BC") >> CRLF;
Let me close with a live example of that: Live On Compiler Explorer¹
//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/container/static_vector.hpp>
#include <fmt/ranges.h>
#include <iomanip>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
namespace px = boost::phoenix;
namespace Ast {
    using Reals = boost::container::static_vector<double, 8>;
} // namespace Ast
int main()
{
    using It = std::string::const_iterator;
    using namespace qi::labels;
    qi::rule<It> CRLF = "\r\n";
    qi::rule<It, double()> d19_12 = qi::raw[ //
        qi::double_[_val = _1] |             //
        qi::omit[+qi::char_("_")]            //
    ][_pass = px::size(_1) == 19];
    qi::rule<It, Ast::Reals(char const*)> Line =
        qi::omit[qi::char_(_r1)] >> d19_12 >> d19_12 >> d19_12 >> d19_12;
    qi::rule<It, Ast::Reals()> gr = //
        Line(+"A") >> CRLF >> Line(+"BC") >> CRLF;
    BOOST_SPIRIT_DEBUG_NODES((gr)(Line)(d19_12)(CRLF))
    for (std::string const test : {
             "A+1.000100000000e+00+2.000200000000e+00+3.000300000000e+00+4.000400000000e+00\r\n"
             "B+5.000500000000e+00+6.000600000000e+00+7.000700000000e+00+8.000800000000e+00\r\n",
             "A+1.000100000000e+00+2.000200000000e+00+3.000300000000e+00+4.000400000000e+00\r\n"
             "C+5.000500000000e+00+6.000600000000e+00+7.000700000000e+00___________________\r\n",
         }) {
        It f = test.cbegin(), e = test.cend();
        Ast::Reals data;
        if (parse(f, e, gr, data)) {
            fmt::print("Parsed: {}\n", data);
        } else {
            fmt::print("Failed\n");
        }
        if (f != e) {
            std::cout << "Remaining: " << std::quoted(std::string(f, e))
                      << "\n";
        }
    }
}
Prints
Parsed: {1.0001, 2.0002, 3.0003, 4.0004, 5.0005, 6.0006, 7.0007, 8.0008}
Parsed: {1.0001, 2.0002, 3.0003, 4.0004, 5.0005, 6.0006, 7.0007, 0}
¹ I was lazy about output formatting and used libfmt instead of writing my vector-printing cruft once more; Coliru doesn't have libfmt (or C++23) yet.