Str.global_replace in OCaml putting carets where they shouldn't be

Question

I'm working on converting a multi-line string into a list of tokens, which should be easier for me to work with.

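For the specific needs of my project, I pad any caret symbol that appears in my input with spaces, so that "^" becomes " ^ ". I'm using something like the following function to do this: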

let bad_function string = Str.global_replace (Str.regexp "^") " ^ " (string)

Then I use a function like the one below to convert this multi-line string into a list of tokens (ignoring whitespace).

let string_to_tokens string = (Str.split (Str.regexp "[ \n\r\x0c\t]+") (string));;

For some reason, bad_function adds carets in places where they shouldn't be. Take the following call:

(bad_function " This is some 
            multiline input 
            with newline characters 
            and tabs. When I convert this string
            into a list of tokens I get ^s showing up where 
            they shouldn't. ")

The first line of the string turns into:

^  This is some \n ^

When I feed the output of bad_function into string_to_tokens, I get the following list:

string_to_tokens (bad_function " This is some 
            multiline input 
            with newline characters 
            and tabs. When I convert this string
            into a list of tokens I get ^s showing up where 
            they shouldn't. ")

["^"; "This"; "is"; "some"; "^"; "multiline"; "input"; "^"; "with";
 "newline"; "characters"; "^"; "and"; "tabs."; "When"; "I"; "convert";
 "this"; "string"; "^"; "into"; "a"; "list"; "of"; "tokens"; "I"; "get";
 "^s"; "showing"; "up"; "where"; "^"; "they"; "shouldn't."]

Why is this happening, and how can I fix it so that these functions behave the way I want?

Answer 1

As explained in the documentation of the Str module:

^ Matches at beginning of line: either at the beginning of the matched string, or just after a '\n' character.
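A small sketch of that rule in action, assuming the Str library is linked (for example with `#load "str.cma";;` in the toplevel, or the `str` library in a dune or ocamlfind build). The name `mark_line_starts` is made up for this illustration. Because the unescaped "^" matches the empty string at every line start, the replacement text is inserted there instead of replacing a literal caret:

```ocaml
(* Unescaped "^" is an anchor: it matches the empty string at the start
   of the input and right after each '\n'. *)
let mark_line_starts s = Str.global_replace (Str.regexp "^") "> " s

let () =
  (* The input contains no literal caret, yet every line gets the marker.
     Prints "> first" and "> second" on two lines. *)
  print_endline (mark_line_starts "first\nsecond")
```

This is the same effect seen in the question: one " ^ " at the start of the string and one after every newline.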

So you have to escape the "^" character with a backslash ("\") to match it literally. However, note (also from the documentation):

any backslash character in the regular expression must be doubled to make it past the OCaml string parser.

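This means you have to write a double backslash ("\\") in the OCaml source to do what you want without getting a warning about an illegal escape.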
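To make the doubling concrete, here is a minimal sketch in plain OCaml (no Str needed) showing which characters actually reach the regex engine. The binding names are illustrative only. A quoted string literal `{|...|}`, available in OCaml 4.02 and later, is one way to avoid the doubling, since it does no escape processing:

```ocaml
(* "\\^" in source code is the two-character string: backslash, caret. *)
let escaped_caret = "\\^"

(* The same two characters written as a quoted string literal, which
   performs no backslash processing. *)
let escaped_caret_quoted = {|\^|}

let () =
  assert (String.length escaped_caret = 2);
  assert (escaped_caret.[0] = '\\' && escaped_caret.[1] = '^');
  assert (escaped_caret = escaped_caret_quoted)
```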

This should do the job:

let bad_function string = Str.global_replace (Str.regexp "\\^") " ^ " (string);;
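As an alternative sketch, `Str.regexp_string` builds a regular expression that matches its argument literally, so no escaping is needed at all. The name `pad_carets` is invented here, and `string_to_tokens` is the function from the question. This again assumes the Str library is linked:

```ocaml
(* Str.regexp_string treats "^" as a literal character, not an anchor. *)
let pad_carets s = Str.global_replace (Str.regexp_string "^") " ^ " s

(* Tokenizer from the question: split on runs of whitespace. *)
let string_to_tokens s = Str.split (Str.regexp "[ \n\r\x0c\t]+") s

let () =
  let tokens = string_to_tokens (pad_carets "x^2 on one line\n  and y^3 here") in
  (* Prints one token per line; carets appear only where the input had them. *)
  List.iter print_endline tokens
```

With this version, newlines no longer produce stray "^" tokens, and a token such as "x^2" is split into "x", "^", "2".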

regex

string

ocaml