Boost Spirit - Trimming spaces between last character and separator

Question

Boost Spirit newbie here.

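I have a string of the form "Key:Value\r\nKey2:Value2\r\n" that I am trying to parse. In this specific form, parsing it with Boost Spirit is trivial. However, to be more robust, I also need to handle cases such as this one:

" My Key : Value \r\n My2ndKey : Long<4 spaces>Value \r\n"

In this case, I need to trim the leading and trailing spaces before and after the key/value separator so that I get the following map: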

"My Key", "Value"

"My2ndKey", "Long<4 spaces>Value"

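I played with qi::hold to achieve this, but I got compile errors because the embedded parser I was trying to use does not support boost::multi_pass iterators. There has to be a simple way to achieve this.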

I have read the following articles (and many others on the subject):

http://boost-spirit.com/home/articles/qi-example/parsing-a-list-of-key-value-pairs-using-spirit-qi/

http://boost-spirit.com/home/2010/02/24/parsing-skippers-and-skipping-parsers/

Boost spirit parsing string with leading and trailing whitespace

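I am looking for a solution to my problem, which those articles do not seem to cover entirely. I would also like to understand better how this is achieved. As a small bonus question: I keep seeing the "%=" operator. Is it useful in my case? Is MyRule %= MyRule ... used for recursive parsing?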

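The code below parses my strings correctly, except that it does not remove the spaces between the last non-space character and the separator. :( The skipper used is qi::blank_type (space without EOL).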

Thanks!

template <typename Iterator, typename Skipper>
struct KeyValueParser : qi::grammar<Iterator, std::map<std::string, std::string>(), Skipper> {
  KeyValueParser() : KeyValueParser::base_type(ItemRule) {
    ItemRule = PairRule >> *(qi::lit(END_OF_CMD) >> PairRule);
    PairRule = KeyRule >> PAIR_SEP >> ValueRule;
    KeyRule = +(qi::char_ - qi::lit(PAIR_SEP));
    ValueRule = +(qi::char_ - qi::lit(END_OF_CMD));
  }
  qi::rule<Iterator, std::map<std::string, std::string>(), Skipper> ItemRule;
  qi::rule<Iterator, std::pair<std::string, std::string>(), Skipper> PairRule;
  qi::rule<Iterator, std::string()> KeyRule;
  qi::rule<Iterator, std::string()> ValueRule;
};

Answer 1

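You need to use KeyRule = qi::raw[ +(qi::char_ - qi::lit(PAIR_SEP)) ]; and KeyRule has to be declared with the skipper, i.e. qi::rule<Iterator, std::string(), Skipper>, so that the spaces in front of the separator are skipped instead of being matched by char_.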

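To see why, let's study several ways of parsing the string "a b :".

First, let's recall how the following parsers/directives work:

lexeme[subject]: this directive matches subject with the skipper disabled.
raw[subject]: discards the attribute of subject and returns an iterator pair pointing to the matched characters in the input stream.
+subject: the plus parser tries to match its subject one or more times.
a - b: the difference parser first tries to parse b; if b succeeds, a - b fails. If b fails, it matches a.
char_: matches any character. This is a PrimitiveParser.
lit(':'): matches ':' but ignores its attribute. This is a PrimitiveParser.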

1. lexeme[ +(char_ - lit(':')) ]: By removing the skipper from your rule, you get an implicit lexeme. Since there is no skipper, it goes like this:

'a' -> ':' fails, char_ matches 'a', the current synthesized attribute is "a"
' ' -> ':' fails, char_ matches ' ', the current synthesized attribute is "a "
'b' -> ':' fails, char_ matches 'b', the current synthesized attribute is "a b"
' ' -> ':' fails, char_ matches ' ', the current synthesized attribute is "a b "
':' -> ':' succeeds, the final synthesized attribute is "a b "

2. +(char_ - lit(':')): Since this has a skipper, every PrimitiveParser pre-skips before it is tried:

'a' -> ':' fails, char_ matches 'a', the current synthesized attribute is "a"
' ' -> this is skipped before ':' is tried
'b' -> ':' fails, char_ matches 'b', the current synthesized attribute is "ab"
' ' -> this is skipped before ':' is tried
':' -> ':' succeeds, the final synthesized attribute is "ab"

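3. raw[ +(char_ - lit(':')) ]: The subject is exactly the same as in case 2, +(char_ - lit(':')) with a skipper. The raw directive ignores "ab" and returns an iterator pair that goes from 'a' to 'b'. Since the attribute of your rule is std::string, a string is constructed from that iterator pair, which results in the "a b" you want.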
