Define identifiers in foreign languages

Currently, in my tokens.mll, I define the following tokens to build latin_identifier:

let decimal_digit = ['0'-'9']
let first_latin_identifier_character = ['a'-'z' 'A'-'Z']
let subsequent_latin_identifier_character = first_latin_identifier_character | '\x5F' (* underscore *) | decimal_digit
let latin_identifier = first_latin_identifier_character subsequent_latin_identifier_character*

However, this definition does not cover identifiers such as ZÄHLENWENNS, SENÃODISP, or TipoDeAusência_Férias.

Does anyone know how to make identifiers cover Spanish, French, German, or even Chinese?

One option is to switch to sedlex (https://github.com/ocaml-community/sedlex), which has built-in support for Unicode code point classes (in particular id_start and id_continue).
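
As a rough sketch of what that could look like (not a drop-in replacement for tokens.mll): the names `identifier` and `identifiers` below are made up for illustration, the input is assumed to be UTF-8, and everything that is not an identifier is simply skipped. It needs the sedlex library and its ppx, for example `(libraries sedlex)` and `(preprocess (pps sedlex.ppx))` in a dune stanza.

```ocaml
(* Hypothetical sketch: a Unicode-aware identifier rule with sedlex.
   id_start and id_continue are sedlex's predefined Unicode classes;
   id_continue already contains '_' and the decimal digits. *)
let identifier = [%sedlex.regexp? id_start, Star id_continue]

(* Illustrative helper: collect every identifier found in a UTF-8 string,
   skipping any other code point. *)
let identifiers (input : string) : string list =
  let buf = Sedlexing.Utf8.from_string input in
  let rec loop acc =
    match%sedlex buf with
    | identifier -> loop (Sedlexing.Utf8.lexeme buf :: acc)
    | eof -> List.rev acc
    | any -> loop acc          (* any other single code point: ignore it *)
    | _ -> assert false        (* sedlex requires a final catch-all case *)
  in
  loop []
```

With this rule, `identifiers "ZÄHLENWENNS(x) + TipoDeAusência_Férias"` should pick out `ZÄHLENWENNS`, `x`, and `TipoDeAusência_Férias`, and Chinese identifiers such as `变量` are accepted too. In a real lexer you would return your own token type from each branch instead of collecting strings.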