OCamlLex case-insensitive
Is there a way to have case-insensitive tokens in an ocamllex specification?
I have already tried making a case-insensitive token this way:
rule token = parse
...
| ['C' 'c']['A' 'a']['S' 's']['E' 'e'] { CASE }
...
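But I'm looking for something else, if it exists.
Use an ordinary token lexer that accepts both lower- and upper-case, and look up keywords in a table, ignoring case: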
{
  type token = Case | Test | Ident of string

  (* Keywords are stored in lowercase. *)
  let keyword_tbl = Hashtbl.create 64
  let () = List.iter (fun (name, keyword) ->
      Hashtbl.add keyword_tbl name keyword) [
      "case", Case;
      "test", Test;
    ]
}

let ident_char = ['a'-'z' 'A'-'Z' '_']

rule next_token = parse
  | ident_char+ as s {
      (* String.lowercase has been deprecated since OCaml 4.03;
       * String.lowercase_ascii is its replacement. *)
      let canon = String.lowercase_ascii s in
      try Hashtbl.find keyword_tbl canon
      with Not_found ->
        (* `Ident canon` if you want case-insensitive vars as well
         * as keywords *)
        Ident s
    }
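The lookup in the semantic action above can also be tried outside of ocamllex. The following is a minimal sketch in plain OCaml, assuming OCaml 4.05 or later for `Hashtbl.find_opt`; the function name `classify` is made up for illustration and does not appear in the answer.

```ocaml
type token = Case | Test | Ident of string

(* Keys are stored in lowercase, so lookups must lowercase their input. *)
let keyword_tbl : (string, token) Hashtbl.t = Hashtbl.create 64

let () =
  List.iter
    (fun (name, keyword) -> Hashtbl.add keyword_tbl name keyword)
    [ "case", Case; "test", Test ]

(* [classify] is a hypothetical helper that mirrors the lexer action:
   keywords match in any case, other words keep their original spelling. *)
let classify (s : string) : token =
  match Hashtbl.find_opt keyword_tbl (String.lowercase_ascii s) with
  | Some keyword -> keyword
  | None -> Ident s
```

With this sketch, `classify "CaSe"` yields `Case`, while `classify "Foo"` yields `Ident "Foo"` with the original spelling preserved.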
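As @gsg suggested, the right way to solve this is to use an ordinary token lexer that accepts both cases and then look up keywords in a table. Having a regular expression for each keyword is actually an anti-pattern mentioned in the ocamllex documentation: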
12.7 Common errors
ocamllex: transition table overflow, automaton is too big
The deterministic automata generated by ocamllex are limited to at most 32767 transitions. The message above indicates that your lexer definition is too complex and overflows this limit. This is commonly caused by lexer definitions that have separate rules for each of the alphabetic keywords of the language, as in the following example. […]
http://caml.inria.fr/pub/docs/manual-ocaml-4.00/manual026.html#toc111
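The documentation continues with an explicit code example and a rewrite that uses a lookup table.
The lookup table should encapsulate the case-insensitivity of the lookup, so we should use a functorial associative structure (read: Map or Hashtbl) that lets us define our own comparison function. Since our lookup table is very likely immutable, we choose Map: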
{
  type token = Case | Test | Ident of string

  (* A string map whose key comparison ignores ASCII case. *)
  module KeywordTable =
    Map.Make(struct
      type t = string
      let compare a b =
        String.(compare (lowercase_ascii a) (lowercase_ascii b))
    end)

  (* fold_left takes the accumulator first, then the list element. *)
  let keyword_table =
    List.fold_left
      (fun acc (k, v) -> KeywordTable.add k v acc)
      KeywordTable.empty
      [
        "case", Case;
        "test", Test;
      ]
}

let ident_char = ['a'-'z' 'A'-'Z' '_']

rule next_token = parse
  | ident_char+ as s {
      (* Map.find takes the key first, then the map. *)
      try KeywordTable.find s keyword_table
      with Not_found -> Ident s
    }
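To check that the map itself hides the case handling, the header code can be exercised on its own. This is a sketch in plain OCaml, assuming OCaml 4.05 or later for `find_opt`; `classify` is again a hypothetical name that stands in for the lexer action.

```ocaml
type token = Case | Test | Ident of string

(* Keys that differ only in ASCII case compare as equal. *)
module KeywordTable = Map.Make (struct
  type t = string
  let compare a b =
    String.compare (String.lowercase_ascii a) (String.lowercase_ascii b)
end)

let keyword_table =
  List.fold_left
    (fun acc (k, v) -> KeywordTable.add k v acc)
    KeywordTable.empty
    [ "case", Case; "test", Test ]

(* No explicit lowercasing at the call site: the comparison does it. *)
let classify (s : string) : token =
  match KeywordTable.find_opt s keyword_table with
  | Some keyword -> keyword
  | None -> Ident s
```

The difference from the Hashtbl version is where the normalization lives: callers pass the lexeme unchanged, and `KeywordTable.mem "TEST" keyword_table` holds even though only `"test"` was inserted.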