Suppress empty result from 'many' in 'seq(p, many(p))' construct with parser combinators

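I am trying to build parser combinators following Hutton and Meijer's "Monadic Parser Combinators". My implementation is in PostScript, but I think my issue is general to combinator parsers rather than specific to my implementation.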

As a small exercise, I'm using the parsers to recognize regular expressions.

(pc9.ps)run

/Dot         (.) char         def
/Meta        (*+?) anyof      def
/Character   (*+?.|()) noneof def

/Atom        //Dot
             //Character  plus  def
/Factor      //Atom  //Meta maybe  seq   def
/Term        //Factor  //Factor many  seq  def
/Expression  //Term  (|) char //Term xthen  many  seq  def

/regex { string-input //Expression exec ps } def

(abc|def|ghi) regex 

quit

It works, but the output contains a lot of [] empty arrays, and they really get in the way when I try to bind handlers to process the values.

$ gsnd -q -dNOSAFER pc9re2.ps
stack:
[[[[[97 []] [[98 []] [[99 []] []]]] [[[100 []] [[101 []] [[102 []]
[]]]] [[[103 []] [[104 []] [[105 []] []]]] []]]] null]]

These appear whenever the seq sequencing combinator receives the result of a maybe or many (which uses maybe) that matched zero occurrences.
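To make the source of the noise concrete, here is a rough Python sketch rather than the PostScript from pc9.ps. Everything in it is an assumption made for illustration: parsers are deterministic functions from (text, pos) to (value, new_pos) or None, the grammar is cut down to plain characters with an optional `*`, and the names only mirror the ones used above.

```python
# Hypothetical Python stand-ins for the PostScript combinators. A parser maps
# (text, pos) to (value, new_pos) on success, or None on failure.

def char(c):
    def parse(text, pos):
        if pos < len(text) and text[pos] == c:
            return ord(c), pos + 1
        return None
    return parse

def noneof(chars):
    def parse(text, pos):
        if pos < len(text) and text[pos] not in chars:
            return ord(text[pos]), pos + 1
        return None
    return parse

def seq(p, q):
    # Always pairs the two results, even when q matched nothing.
    def parse(text, pos):
        r1 = p(text, pos)
        if r1 is None:
            return None
        v1, pos1 = r1
        r2 = q(text, pos1)
        if r2 is None:
            return None
        v2, pos2 = r2
        return [v1, v2], pos2
    return parse

def maybe(p):
    # Zero occurrences still succeed, with an empty list as the result.
    def parse(text, pos):
        r = p(text, pos)
        return r if r is not None else ([], pos)
    return parse

def many(p):
    # Zero or more p, built from seq and maybe, so every chain ends in [].
    def parse(text, pos):
        return maybe(seq(p, many(p)))(text, pos)
    return parse

character = noneof("*+?.|()")
factor = seq(character, maybe(char("*")))
term = seq(factor, many(factor))

result, _ = term("abc", 0)
print(result)  # [[97, []], [[98, []], [[99, []], []]]]
```

The printed structure has the same shape as the first group in the PostScript output: one [] for each absent Meta and one more [] closing the many chain.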

What is the normal way to exclude this extra noise from the output with parser combinators?

github repo

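Sigh. It seems I can work around it in the implementation. I added special-case code to seq that detects an empty right-hand side and discards it. On to other problems...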
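A minimal Python sketch of that workaround, under the same assumptions as before (hypothetical names, parsers as functions from (text, pos) to (value, new_pos) or None, and an empty list as the "zero occurrences" result). It is not the actual PostScript change, only the idea of it:

```python
EMPTY = []  # assumed convention: result of a parser that matched zero occurrences

def char(c):
    def parse(text, pos):
        if pos < len(text) and text[pos] == c:
            return ord(c), pos + 1
        return None
    return parse

def maybe(p):
    def parse(text, pos):
        r = p(text, pos)
        return r if r is not None else (EMPTY, pos)
    return parse

def seq(p, q):
    def parse(text, pos):
        r1 = p(text, pos)
        if r1 is None:
            return None
        v1, pos1 = r1
        r2 = q(text, pos1)
        if r2 is None:
            return None
        v2, pos2 = r2
        if v2 == EMPTY:
            return v1, pos2      # special case: drop the empty right-hand side
        return [v1, v2], pos2    # otherwise pair the results as before
    return parse

factor = seq(char("a"), maybe(char("*")))
print(factor("a", 0))   # (97, 1)
print(factor("a*", 0))  # ([97, 42], 2)
```

One consequence of this design is that the shape of the result now depends on the input: sometimes a bare value, sometimes a pair. Handlers bound to the parser have to cope with both.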

Edit: I ran into the same problem again in version 11(ish). Now I have a better solution, IMO:

https://groups.google.com/g/comp.lang.functional/c/MbJxrJSk8Mw/m/MoT3Dr0IAwAJ

Ugh. I think it wasn't even an X/Y problem. It was a "doctor it hurts when I move my arm like this; ... so don't move your arm like that" problem.

I want the "result" part of the "reply" structure (using new terms following usage from the Parsec document) to be any of the /usual/ PostScript types: integer, real, string, boolean, array, dictionary.

But I also need some way to arbitrarily combine or concatenate two objects regardless of type. My then (aka seq) combinator needs to do this. So I made a hack-y function that does the combining. If it has two arrays, it composes the contents into a longer array. If it has one array and some other object it extends the array by one and stuffs the object in the front or back as appropriate. If it has two non-array objects it makes a new 2-element array to contain them.

So, instead of building xthen and thenx off of then and needing to cons, car, and cdr the stuff, I can write all 3 of these as a more general parameterized function.

sequence{ p q u }{
  { /p exec +is-ok {
      next x-xs force /q exec +is-ok {
        next x-xs 3 1 roll /u exec exch consok
      }{
        x-xs 3 2 roll ( after ) exch cons exch cons cons
      } ifelse
    } if } ll }  @func
then { {append} sequence }
xthen { {exch pop} sequence }
thenx { {pop} sequence }

append { 1 index zero eq { exch pop }{
                  dup zero eq { pop }{
         1 index type /arraytype eq {
             dup type /arraytype eq { compose }{ one compose } ifelse
         }{ dup type /arraytype eq { curry }{ cons } ifelse } ifelse } ifelse } ifelse }

(@func is my own non-standard extension to PostScript that takes a procedure body and list of parameters and wraps the procedure with code that defines the arguments in a local dictionary. ll is my hack-y PostScript way of making lambdas with hard-patched parameters, it's short for load all literals.)
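For readers who don't speak PostScript, the same idea can be sketched in Python. This is an approximation under stated assumptions, not a translation of the code above: `ZERO` is a hypothetical sentinel standing in for `zero`, failure is a plain None instead of the error reply that sequence builds with ( after ), and Python lists stand in for PostScript arrays.

```python
ZERO = object()  # hypothetical sentinel meaning "no result", standing in for `zero`

def append(a, b):
    # Combine two results of any type, mirroring the cases described above.
    if a is ZERO:
        return b
    if b is ZERO:
        return a
    if isinstance(a, list):
        return a + b if isinstance(b, list) else a + [b]
    if isinstance(b, list):
        return [a] + b
    return [a, b]

def sequence(p, q, u):
    # Run p, then q, and let u decide how the two results are combined.
    def parse(text, pos):
        r1 = p(text, pos)
        if r1 is None:
            return None
        v1, pos1 = r1
        r2 = q(text, pos1)
        if r2 is None:
            return None
        v2, pos2 = r2
        return u(v1, v2), pos2
    return parse

def then(p, q):
    return sequence(p, q, append)            # keep both results

def xthen(p, q):
    return sequence(p, q, lambda a, b: b)    # discard the left result

def thenx(p, q):
    return sequence(p, q, lambda a, b: a)    # discard the right result

def char(c):
    def parse(text, pos):
        if pos < len(text) and text[pos] == c:
            return ord(c), pos + 1
        return None
    return parse

def many(p):
    # Zero or more p; zero occurrences yield ZERO, which append drops.
    def parse(text, pos):
        r = then(p, many(p))(text, pos)
        return r if r is not None else (ZERO, pos)
    return parse

ab = then(char("a"), many(char("b")))
print(ab("abb", 0))                          # ([97, 98, 98], 3)
print(ab("a", 0))                            # (97, 1)
print(xthen(char("|"), char("a"))("|a", 0))  # (97, 2)
print(thenx(char("a"), char(")"))("a)", 0))  # (97, 2)
```

Because append swallows `ZERO` and flattens as it goes, the nested pairs and trailing empty arrays never get built in the first place.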

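The code also treats executable arrays (i.e. PostScript procedures) as non-arrays for the purpose of combining result sequences. This lets the parsers be used as a syntax-directed compiler that produces procedures as output.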