Can I collect attributes from my skipper parser?

Question

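I have a data file format which includes

/* comments */
/* nested /* comments */ too */ and
// c++ style single-line comments..

As usual, these comments can appear anywhere in the input file where normal whitespace is allowed.

Hence, rather than pollute the grammar proper with pervasive comment handling, I have made a skipper parser that handles whitespace and the various comments.

So far so good, and I am able to parse all my test cases.

In my use case, however, any of the parsed values (double, string, variable, list, ...) must carry the comments preceding it as an attribute, if one or more comments are present. That is, my AST node for double should be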

struct Double {
   double value;
   std::string comment;
};

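and so on for all the values in my grammar.

Hence I wonder whether it is possible to somehow "store" the collected comments in the skipper parser and then have them available for building the AST nodes in the normal grammar?

The skipper that handles comments: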

template<typename Iterator>
struct SkipperRules : qi::grammar<Iterator> {
    SkipperRules() : SkipperRules::base_type(skipper) {
        single_line_comment = lit("//") >> *(char_ - eol) >> (eol | eoi);
        block_comment = ((string("/*") >> *(block_comment | char_ - "*/")) >> string("*/"));
        skipper = space | single_line_comment | block_comment;
    }
    qi::rule<Iterator> skipper;
    qi::rule<Iterator, std::string()> block_comment;
    qi::rule<Iterator, std::string()> single_line_comment;
};

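I can store the comments using a global variable and semantic actions in the skipper rule, but that seems wrong and probably won't play well in general with parser backtracking. What's a good way to store the comments so they can later be retrieved in the main grammar?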

Answer 1

I can store the comments using a global variable and semantic actions in the skipper rule, but that seems wrong and probably won't play well in general with parser backtracking.

Good instinct. See Boost Spirit: "Semantic actions are evil"?. Also, in your case it would unnecessarily complicate correlating source locations with the comments.

can I collect attributes from my skipper parser?

You cannot. Skippers are implicitly qi::omit[] (like the separator in the Kleene-% list, by the way).

In my use case, however, any of the parsed values (double, string, variable, list, ...) must carry the comments preceding it as an attribute, if one or more comments are present. That is, my AST node for double should be
struct Double {
   double value;
   std::string comment;
};

There you have it: your comments are not comments. You need them in your AST, so you need them in the grammar.

Ideas

I have several ideas here.

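1. You could simply not use the skipper to soak up the comments, which, as you mention, is going to be cumbersome/noisy in the grammar.
2. You could temporarily override the skipper to be just qi::space at the points where comments are required. Something like
   ```
   value_ = qi::skip(qi::space) [ comment_ >> (string_|qi::double_|qi::int_)  ];
   ```
   Or, given your AST, maybe a bit more verbose
   ```
   value_ = qi::skip(qi::space) [ comment_ >> (string_|double_|int_) ];
   string_ = comment_ >> lexeme['"' >> *('\\' >> qi::char_ | ~qi::char_('"')) >> '"'];
   double_ = comment_ >> qi::real_parser<double, qi::strict_real_policies<double> >{};
   int_    = comment_ >> qi::int_;
   ```
   Notes:
   - In this case, make sure double_, string_ and int_ are declared with qi::space_type as the skipper (see Boost spirit skipper issues).
   - The comment_ rule is assumed to expose a std::string() attribute. This is fine if it is used in the skipper context as well, because there the actual attribute will be bound to qi::unused_type, which compiles down to no-ops for attribute propagation.
   - As a subtler side note, I made sure to use strict real policies in the second snippet so that the double branch won't eat integers as well.
3. A fancy solution might be to store the soaked-up comment(s) in a "parser state" (e.g. a member variable) and then use on_success handlers to transfer that value into the rule attribute on demand (and optionally flush the comments when certain rules complete).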

I have some examples of what can be achieved using on_success for inspiration: https://whosebug.com/search?q=user%3A85371+on_success+qi. (Specifically, look at the way position information is added to AST nodes. There's a subtle interplay between fusion-adapted structs and members that are set outside the control of automatic attribute propagation. A particularly nice method is to use a base class that can be generically "detected", so that AST nodes deriving from that base magically get the contextual comments added without code duplication.)

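So it is effectively a hybrid: yes, you use semantic actions to "side-channel" the comment values. However, it is less unwieldy because you can now deterministically "harvest" those values in the on_success handler. If you don't reset the comments prematurely, it should even work well under backtracking in general.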

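One gripe with this is that it becomes slightly less transparent to reason about the mechanics of the "magic comments". However, it does sit well for two reasons: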
- "magic comments" are a semantic hack whichever way you look at it, so it matches the grammar semantics in the code
- it does succeed at removing comment noise from the productions, which is effectively what the comments were for in the first place: they were embellishing the semantics without complicating the language grammar.

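I think option 2. is the "straightforward" approach that you might not have been aware of. Option 3. is the fancy approach, in case you want to enjoy the greater genericity/flexibility. E.g. what will you do with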

  /*obsolete*/ /*deprecated*/ 5.12e7

Or,

  bla = /*this is*/ 42 /*also relevant*/;

These would be easier to deal with correctly in the 'fancy' case.

So, if you want to avoid complexity, I suggest option 2. If you need the flexibility, I suggest option 3.


