OCamlLex case-insensitive

Is there a way to have case-insensitive tokens in an ocamllex specification? I have already tried making a case-insensitive token this way:

rule token = parse
    ...
   | ['C''c']['A''a']['S''s']['E''e'] { CASE }
    ...

But I am looking for something else, if it exists.

Use an ordinary token lexer that accepts both lowercase and uppercase, and look up keywords in a table, ignoring case:

{
type token = Case | Test | Ident of string

let keyword_tbl = Hashtbl.create 64

let _ = List.iter (fun (name, keyword) ->
    Hashtbl.add keyword_tbl name keyword) [
    "case", Case;
    "test", Test;
  ]
}

let ident_char = ['a'-'z' 'A'-'Z' '_']

rule next_token = parse
  | ident_char+ as s {
      let canon = String.lowercase_ascii s in
      try Hashtbl.find keyword_tbl canon
      with Not_found ->
        (* `Ident canon` if you want case-insensitive vars as well
         * as keywords *)
        Ident s
    }
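
The lookup performed in the action above can be tried without running ocamllex. Here is a minimal sketch in plain OCaml, assuming OCaml 4.05 or later (for `Hashtbl.find_opt` and `String.lowercase_ascii`). The helper name `classify` is hypothetical and only stands in for the body of the lexer action.

```ocaml
type token = Case | Test | Ident of string

(* Keys are stored in lowercase; the input is lowercased before lookup. *)
let keyword_tbl : (string, token) Hashtbl.t = Hashtbl.create 64

let () =
  List.iter
    (fun (name, keyword) -> Hashtbl.add keyword_tbl name keyword)
    [ "case", Case; "test", Test ]

(* Hypothetical helper mirroring the lexer action: keywords match in any
   case, everything else becomes an identifier with its original spelling. *)
let classify (s : string) : token =
  match Hashtbl.find_opt keyword_tbl (String.lowercase_ascii s) with
  | Some keyword -> keyword
  | None -> Ident s
```

With this sketch, `classify "CASE"` and `classify "case"` both yield `Case`, while `classify "Foo"` yields `Ident "Foo"`.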

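As @gsg suggested, the right way to solve this is to use an ordinary token lexer that accepts both uppercase and lowercase, and then look up keywords in a table. Having a separate regular expression for each keyword is actually an anti-pattern mentioned in the ocamllex documentation: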

12.7 Common errors

ocamllex: transition table overflow, automaton is too big

The deterministic automata generated by ocamllex are limited to at most 32767 transitions. The message above indicates that your lexer definition is too complex and overflows this limit. This is commonly caused by lexer definitions that have separate rules for each of the alphabetic keywords of the language, as in the following example. […]

http://caml.inria.fr/pub/docs/manual-ocaml-4.00/manual026.html#toc111

The documentation goes on to give an explicit code example and a rewrite that uses a lookup table.

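The lookup table should encapsulate the case-insensitivity of the lookup, so we should use a functorial associative structure (read: Map or Hashtbl) that lets us define our own comparison function. Since our lookup table is most likely immutable, we choose Map: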

{
type token = Case | Test | Ident of string

module KeywordTable =
  Map.Make(struct
    type t = string
    let compare a b =
      String.(compare (lowercase_ascii a) (lowercase_ascii b))
  end)

let keyword_table =
  (* List.fold_left takes the accumulator first, then the list. *)
  List.fold_left
    (fun table (k, v) -> KeywordTable.add k v table)
    KeywordTable.empty
    [
      "case", Case;
      "test", Test;
    ]
}

let ident_char = ['a'-'z' 'A'-'Z' '_']

rule next_token = parse
  | ident_char+ as s {
      try KeywordTable.find s keyword_table
      with Not_found -> Ident s
    }
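
The Map-based table can also be exercised outside of ocamllex. Below is a minimal sketch in plain OCaml, assuming OCaml 4.05 or later (for `Map.S.find_opt` and `String.lowercase_ascii`). The helper name `classify` is hypothetical and only stands in for the lexer action.

```ocaml
type token = Case | Test | Ident of string

module KeywordTable = Map.Make (struct
  type t = string

  (* Compare keys after ASCII lowercasing, so "CASE" and "case" are equal. *)
  let compare a b =
    String.compare (String.lowercase_ascii a) (String.lowercase_ascii b)
end)

let keyword_table =
  List.fold_left
    (fun table (k, v) -> KeywordTable.add k v table)
    KeywordTable.empty
    [ "case", Case; "test", Test ]

(* Hypothetical helper: the key is passed as-is, because the map's own
   comparison function takes care of ignoring case. *)
let classify (s : string) : token =
  match KeywordTable.find_opt s keyword_table with
  | Some keyword -> keyword
  | None -> Ident s
```

One trade-off to keep in mind: this comparison function lowercases both strings on every comparison made during a lookup, whereas the Hashtbl variant lowercases the input once. For a small keyword table the difference is unlikely to matter.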