Pattern Matching in dypgen

Question

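I want to deal with some ambiguities in dypgen. I found something in the manual and would like to know how to use it. Section 5.2 of the manual, "Pattern matching on Symbols", has this example: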

expr:
| expr OP<"+"> expr { $1 + $3 }
| expr OP<"*"> expr { $1 * $3 }

As I understand it, OP is matched against "+" or "*". I also found this there:

The patterns can be any Caml patterns (but without the keyword when). For instance this is possible:
expr: expr<(Function([arg1;arg2],f_body)) as f> expr
{ some action }

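So I tried putting some other expressions there, but I don't understand what happens. If I put printf there, it outputs the value of the matched string. But if I put (fun x -> printf x) there, which looks to me like the same thing as printf, dypgen complains about a syntax error and points at the end of the expression. If I put Printf.printf there, it complains with Syntax error: operator expected. If I put (fun x -> Printf.printf x) there, it says: Lexing failed with message: lexing: empty token. What do these different error messages mean?

In the end, I would like to look something up in a hash table and check whether the value is there, but I don't know whether that is possible this way. Is it or isn't it?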

EDIT: A minimal example derived from the forest example in dypgen-demos.

The grammar file forest_parser.dyp contains:

{
open Parse_tree
let dyp_merge = Dyp.keep_all
}

%start main
%layout [' ' '\t']

%%

main : np "." "\n" { $1 }

np:
  |    sg                   {Noun($1)}
  |    pl                   {Noun($1)}

sg: word    <Word("sheep"|"fish")>  {Sg($1)}
sg: word    <Word("cat"|"dog")>  {Sg($1)}
pl: word    <Word("sheep"|"fish")>  {Pl($1)}
pl: word    <Word("cats"|"dogs")>  {Pl($1)}

/* OR try:
    sg: word    <printf>  {Sg($1)}
    pl: word    <printf>  {Pl($1)}
*/

word: 
  | (['A'-'Z' 'a'-'z']+)    {Word($1)}

forest.ml now has the following print_forest function:

let print_forest forest =
  let rec aux1 t = match t with
    | Word x
    -> print_string x
    | Noun (x) -> (
        print_string "N [";
        aux1 x;
        print_string " ]")
    | Sg (x) -> (
        print_string "Sg [";
        aux1 x;
        print_string " ]")
    | Pl (x) -> (
        print_string "Pl [";
        aux1 x;
        print_string " ]")
  in
  let aux2 t = aux1 t; print_newline () in
  List.iter aux2 forest;
  print_newline ()

And parse_tree.mli contains:

type tree = 
  | Word        of string
  | Noun        of tree
  | Sg          of tree
  | Pl          of tree

With this you can determine the grammatical number (singular or plural) of fish, sheep, cats, etc.

"sheep" and "fish" can be both singular and plural. "cats" and "dogs" cannot.

fish.
N [Sg [fish ] ]
N [Pl [fish ] ]

Answer 1

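I didn't know anything about dypgen, so I tried to figure it out.

Here is what I found.

In the parser.dyp file you can either define both the lexer and the parser, or use an external lexer. Here is what I did:

My AST looks like this: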

parse_prog.mli

type f = 
  | Print of string
  | Function of string list * string * string

type program = f list

prog_parser.dyp

{
  open Parse_prog

  (* let dyp_merge = Dyp.keep_all *)    

  let string_buf = Buffer.create 10
}

%start main

%relation pf<pr

%lexer

let newline = '\n'
let space = [' ' '\t' '\r']
let uident = ['A'-'Z']['a'-'z' 'A'-'Z' '0'-'9' '_']*
let lident = ['a'-'z']['a'-'z' 'A'-'Z' '0'-'9' '_']*

rule string = parse
  | '"' { () }
  | _ { Buffer.add_string string_buf (Dyp.lexeme lexbuf);
      string lexbuf }

main lexer =
  newline | space + -> { () }
  "fun"  -> ANONYMFUNCTION { () }
  lident -> FUNCTION { Dyp.lexeme lexbuf }
  uident -> MODULE { Dyp.lexeme lexbuf }
  '"' -> STRING { Buffer.clear string_buf;
                  string lexbuf;
                  Buffer.contents string_buf }

%parser

main : function_calls eof
   { $1 }

function_calls:
  |
    { [] }
  | function_call ";" function_calls
    { $1 :: $3 }

function_call:
  | printf STRING
    { Print $2 } pr
  | "(" ANONYMFUNCTION lident "->" printf lident ")" STRING        
    { Print  } pf
  | nested_modules "." FUNCTION STRING
    { Function ($1, $3, $4) } pf
  | FUNCTION STRING
    { Function ([], $1, $2) } pf
  | "(" ANONYMFUNCTION lident "->" FUNCTION lident ")" STRING      
    { Function ([], , ) } pf

printf:
  | FUNCTION<"printf">                                             
    { () }
  | MODULE<"Printf"> "." FUNCTION<"printf">                        
    { () }

nested_modules:
  | MODULE
    { [$1] }
  | MODULE "." nested_modules
    { $1 :: $3 }

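This file is the most important one. As you can see, if I have a call printf "Test", my grammar is ambiguous: it can be reduced either to Print "Test" or to Function ([], "printf", "Test"). But, as I realized, I can give priorities to my rules, so if one of them has a higher priority, it is the one chosen for the first parse. (Try uncommenting let dyp_merge = Dyp.keep_all and you will see all the possible combinations.)

And in my main: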
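To make the ambiguity concrete, here is a small plain OCaml sketch, outside dypgen, of the two candidate trees for printf "Test" and of preferring one over the other. The rank and pick_preferred helpers are hypothetical names invented for this illustration; they only mimic the idea of one rule winning and are not how dypgen implements priorities.

```ocaml
(* Same AST as in parse_prog.mli. *)
type f =
  | Print of string
  | Function of string list * string * string

(* Hypothetical ranking: a lower rank is preferred.
   This stands in for "the Print rule has priority"; it is not dypgen's API. *)
let rank = function
  | Print _ -> 0
  | Function _ -> 1

(* Return the preferred candidate, or None when there is no parse. *)
let pick_preferred candidates =
  match List.sort (fun a b -> compare (rank a) (rank b)) candidates with
  | [] -> None
  | best :: _ -> Some best

let () =
  (* The two readings of: printf "Test" *)
  let candidates = [ Function ([], "printf", "Test"); Print "Test" ] in
  match pick_preferred candidates with
  | Some (Print s) -> Printf.printf "Print %S\n" s
  | Some (Function (_, name, s)) -> Printf.printf "Function %s %S\n" name s
  | None -> print_endline "no parse"
```

With Dyp.keep_all, by contrast, both candidates would be kept, which is why uncommenting that line shows every combination.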

main.ml

open Parse_prog

let print_stlist fmt sl =
  match sl with 
    | [] -> ()
    | _ -> List.iter (Format.fprintf fmt "%s.") sl

let print_program tl =
  let aux1 t = match t with
      | Function (ml, f, p) -> 
        Format.printf "I can't do anything with %a%s(\"%s\")@." print_stlist ml f p
      | Print s -> Format.printf "You want to print : %s@." s
  in
  (* Each parse result is a pair whose first component is the program. *)
  let aux2 (prog, _) =
    List.iter aux1 prog; Format.eprintf "------------@." in
  List.iter aux2 tl

let input_file = Sys.argv.(1)

let lexbuf = Dyp.from_channel (Prog_parser.pp ()) (open_in input_file)

let result = Prog_parser.main lexbuf

let () = print_program result

And, for example, for the following file:

test

printf "first print";
Printf.printf "nested print";
Format.eprintf "nothing possible";
(fun x -> printf x) "Anonymous print";

If I run ./myexec test, I get the following output:

You want to print : first print
You want to print : nested print
I can't do anything with Format.eprintf("nothing possible")
You want to print : x
------------

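So, TL;DR, the example in the manual is just there to show you that you can take the tokens you have defined (I never defined a token PRINT, only FUNCTION) and match on their values to get new rules.

I hope that's clear. I learned a lot from your question ;-)

[EDIT] So, I changed the parser to match what you wanted to see: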
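A plain OCaml sketch, outside dypgen, of what counts as a pattern may also help with the error messages in the question. Since the manual says the text between < and > is a Caml pattern, a bare printf there would be a variable pattern that binds whatever value the symbol carries, not a reference to the printing function. (fun x -> printf x) and Printf.printf are expressions rather than patterns, so it seems plausible that dypgen rejects them for that reason, although I have not traced the exact messages. The function names below are made up for the illustration.

```ocaml
type tree = Word of string

(* In pattern position a bare lowercase name is a variable pattern:
   it matches any value and binds it. Here [printf] is just a fresh
   variable and has nothing to do with Printf.printf. *)
let bind_anything (t : tree) : tree =
  match t with
  | printf -> printf

(* A constructor pattern with an or-pattern inside, in the same shape as
   <Word("sheep"|"fish")> in the grammar. *)
let is_sheep_or_fish = function
  | Word ("sheep" | "fish") -> true
  | Word _ -> false

(* By contrast, these are expressions and would not compile as patterns:
     | (fun x -> print_string x) -> ...
     | Printf.printf -> ...                                              *)
```

By the same reasoning, a call such as Hashtbl.mem cannot appear inside a pattern, and since when guards are excluded, a hash table lookup would presumably have to live in ordinary OCaml code, such as a rule's action or a pass over the finished tree.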

{
      open Parse_prog

      (* let dyp_merge = Dyp.keep_all *)

      let string_buf = Buffer.create 10
    }

    %start main

    %relation pf<pp

    %lexer

    let newline = '\n'
    let space = [' ' '\t' '\r']
    let uident = ['A'-'Z']['a'-'z' 'A'-'Z' '0'-'9' '_']*
    let lident = ['a'-'z']['a'-'z' 'A'-'Z' '0'-'9' '_']*

    rule string = parse
      | '"' { () }
      | _ { Buffer.add_string string_buf (Dyp.lexeme lexbuf);
          string lexbuf }

    main lexer =
      newline | space + -> { () }
      "fun"  -> ANONYMFUNCTION { () }
      lident -> FUNCTION { Dyp.lexeme lexbuf }
      uident -> MODULE { Dyp.lexeme lexbuf }
      '"' -> STRING { Buffer.clear string_buf;
                      string lexbuf;
                      Buffer.contents string_buf }

    %parser

    main : function_calls eof
       { $1 }

    function_calls:
      |                                                                
        { [] } pf
      | function_call <Function((["Printf"] | []), "printf", st)> ";" function_calls
        { (Print st) :: $3 } pp
      | function_call ";" function_calls
        { $1 :: $3 } pf


    function_call:
      | nested_modules "." FUNCTION STRING
        { Function ($1, $3, $4) }
      | FUNCTION STRING
        { Function ([], $1, $2) }
      | "(" ANONYMFUNCTION lident "->" FUNCTION lident ")" STRING
        { Function ([], , ) }

    nested_modules:
      | MODULE
        { [$1] }
      | MODULE "." nested_modules
        { $1 :: $3 }

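Here, as you can see, I don't handle the fact that my function is a print at the point where it is parsed, but at the point where I put it into the list of functions. So I match on the algebraic type built by my parser. I hope this example works for you ;-) (but be warned, it is very ambiguous! :-D)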
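The pattern used in that rule is an ordinary OCaml pattern, so the same match can be tried on its own. The sketch below applies it as a pass over an already built list; normalize is a hypothetical name, and this only illustrates the pattern itself, not what dypgen does internally.

```ocaml
(* Same AST as in parse_prog.mli. *)
type f =
  | Print of string
  | Function of string list * string * string

(* Hypothetical post-pass: rewrite calls to printf or Printf.printf
   into Print, leaving every other call untouched. *)
let normalize (calls : f list) : f list =
  List.map
    (function
      | Function ((["Printf"] | []), "printf", st) -> Print st
      | other -> other)
    calls
```

The or-pattern (["Printf"] | []) accepts either the qualified or the unqualified call, and st binds the string argument, exactly as in the grammar rule.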

parsing

ocaml

ambiguity

lexical-analysis