Permutation parsing with megaparsec + parser-combinators too lenient

Question

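I'm trying to parse a permutation of flags. The behavior I want is "one or more flags, in any order, with no repeats". I'm using the following packages:

megaparsec
parser-combinators

My code produces the output I want, but it is too lenient about its input: I don't understand why it accepts multiple occurrences of the same flag. What am I doing wrong here?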

pFlags :: Parser [Flag]
pFlags = runPermutation $ f <$> 
    toPermutation (optional (GroupFlag <$ char '\'')) <*> 
    toPermutation (optional (LeftJustifyFlag <$ char '-'))
    where f a b = catMaybes [a, b]

Examples:

"'-" = [GroupFlag, LeftJustifyFlag] -- CORRECT
"-'" = [LeftJustifyFlag, GroupFlag] -- CORRECT
"''''-" = [GroupFlag, LeftJustifyFlag] -- INCORRECT, should fail if there's more than one of the same flag.
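For experimenting, here is a self-contained version of the question's parser. The `Parser` alias, the `Flag` type and the `run` helper are assumptions, since the question does not show them; the body of `pFlags` is the same as above.

```haskell
import Control.Applicative (optional)
import Control.Applicative.Permutations (runPermutation, toPermutation)
import Data.Maybe (catMaybes)
import Data.Void (Void)
import Text.Megaparsec (Parsec, eof, parse)
import Text.Megaparsec.Char (char)

-- Assumed definitions; the question does not show them.
type Parser = Parsec Void String

data Flag = GroupFlag | LeftJustifyFlag
  deriving (Show, Eq)

-- The question's parser (same definition, only re-indented).
pFlags :: Parser [Flag]
pFlags = runPermutation $ f
  <$> toPermutation (optional (GroupFlag <$ char '\''))
  <*> toPermutation (optional (LeftJustifyFlag <$ char '-'))
  where f a b = catMaybes [a, b]

-- Hypothetical helper: run a parser on a string, keeping only the result.
run :: Parser a -> String -> Maybe a
run p = either (const Nothing) Just . parse p ""
```

With these definitions, `run pFlags "''''-"` should succeed, because `parse` only needs a prefix of the input to match and the leftover characters are simply not consumed. Adding `eof`, as in `run (pFlags <* eof) "''''-"`, makes it fail, since each permutation element consumes at most one character. `run (pFlags <* eof) "-'"` fails as well: `optional (char '\'')` succeeds on `-` without consuming anything, so the permutation commits to the group-flag slot first and never gets to the trailing `'`.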

Answer 1

Instead of toPermutation and optional, I believe you need to use toPermutationWithDefault, like this (untested):

toPermutationWithDefault Nothing (Just GroupFlag <$ char '\'')
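Filled out into a complete module, the suggestion might look like this. It is a sketch assuming megaparsec 9 and the `Control.Applicative.Permutations` module from parser-combinators; the `Parser` alias, the `Flag` type, `pFlagsOnly` and `run` are assumed names. It also adds `eof`, so that leftover input such as a repeated flag makes the parse fail instead of being silently left unconsumed.

```haskell
import Control.Applicative.Permutations (runPermutation, toPermutationWithDefault)
import Data.Maybe (catMaybes)
import Data.Void (Void)
import Text.Megaparsec (Parsec, eof, parse)
import Text.Megaparsec.Char (char)

-- Assumed definitions; the question does not show them.
type Parser = Parsec Void String

data Flag = GroupFlag | LeftJustifyFlag
  deriving (Show, Eq)

-- Each element's parser now has to consume input to succeed; the default
-- (Nothing) is filled in by runPermutation when a flag never appears.
pFlags :: Parser [Flag]
pFlags = runPermutation $ f
  <$> toPermutationWithDefault Nothing (Just GroupFlag <$ char '\'')
  <*> toPermutationWithDefault Nothing (Just LeftJustifyFlag <$ char '-')
  where f a b = catMaybes [a, b]

-- Require the whole input to be flags, so a repeated flag is rejected.
pFlagsOnly :: Parser [Flag]
pFlagsOnly = pFlags <* eof

-- Hypothetical helper: run a parser on a string, keeping only the result.
run :: Parser a -> String -> Maybe a
run p = either (const Nothing) Just . parse p ""
```

Results come back in the order of `f`'s arguments, not in input order, so `"-'"` also yields `[GroupFlag, LeftJustifyFlag]`. Because every flag now has a default, the empty string is accepted too (yielding `[]`); the "one or more" requirement would need a separate check on the result.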

The reasoning is described in §4, "Adding optional elements", of the paper "Parsing Permutation Phrases" (PDF) (emphasis added):

Consider, for example […] all permutations of a, b and c. Suppose b can be empty and we want to recognise ac. This can be done in three different ways since the empty b can be recognised before a, after a or after c. Fortunately, it is irrelevant for the result of a parse where exactly the empty b is derived, since order is not important. This allows us to use a strategy similar to the one proposed by Cameron: parse nonempty constituents as they are seen and allow the parser to stop if all remaining elements are optional. When the parser stops the default values are returned for all optional elements that have not been recognised.

To implement this strategy we need to be able to determine whether a parser can derive the empty string and split it into its default value and its non-empty part, i.e. a parser that behaves the same except that it does not recognise the empty string.

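That is, the permutation parser needs to know which elements can succeed without consuming input; otherwise it commits to a branch too eagerly. However, I'm not sure why this would cause multiple occurrences of one element to be accepted; perhaps you are also missing eof?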
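To make the strategy from the quote concrete, here is a minimal permutation type written from scratch for illustration. `Perm`, `required`, `withDefault`, `runPerm` and `abc` are hypothetical names, not parser-combinators' API. Each element is split into an optional default and a parser that must consume input; non-empty elements are parsed as they are seen, and defaults are filled in only when no remaining element matches.

```haskell
import Control.Applicative (Alternative (..), optional)
import Data.Void (Void)
import Text.Megaparsec (Parsec, parseMaybe)
import Text.Megaparsec.Char (char)

-- A permutation producing an `a`:
--   * the first field is the result if parsing may stop here
--     (Nothing while some required element is still missing);
--   * the second field consumes one more non-empty element and
--     returns the permutation of the elements that are left.
data Perm m a = Perm (Maybe a) (m (Perm m a))

instance Functor m => Functor (Perm m) where
  fmap f (Perm d next) = Perm (fmap f d) (fmap (fmap f) next)

instance Alternative m => Applicative (Perm m) where
  pure x = Perm (Just x) empty
  l@(Perm df nf) <*> r@(Perm dx nx) =
    -- The next element belongs either to the left part or to the right part.
    Perm (df <*> dx) (((<*> r) <$> nf) <|> ((l <*>) <$> nx))

-- An element that must appear exactly once.
required :: Alternative m => m a -> Perm m a
required p = Perm Nothing (pure <$> p)

-- An element that may be absent; `p` itself must consume input.
withDefault :: Alternative m => a -> m a -> Perm m a
withDefault d p = Perm (Just d) (pure <$> p)

-- Parse non-empty elements as they are seen; when none of the remaining
-- ones matches, stop and use the defaults (failing if a required one is missing).
runPerm :: (Alternative m, Monad m) => Perm m a -> m a
runPerm (Perm d next) = optional next >>= maybe (maybe empty pure d) runPerm

-- The paper's example: a, b and c in any order, where b may be absent.
abc :: Parsec Void String (Char, Char, Char)
abc = runPerm ((,,) <$> required (char 'a')
                    <*> withDefault '_' (char 'b')
                    <*> required (char 'c'))
```

On the paper's example, `parseMaybe abc` (which also requires end of input) accepts `"ac"`, `"ca"` and `"bca"`, using `'_'` for a missing `b`. It rejects `"ab"`, where the required `c` is missing, and `"abbc"`, because once `b` has been used nothing in the remaining permutation accepts a second `b`, so parsing stops with `c` still missing.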
