Common way to store positions in a formatter

Question

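I would like to write a small editor for a specific language. In the editor, we will be able to indent one or several lines (i.e., add white space on the left-hand side of each line); we will also be able to format the whole code (i.e., alter white space and newlines in the appropriate places).

Given a program, my front end, written with ocamllex and ocamlyacc, can build an Abstract Syntax Tree (AST). I would like to know the common ways of storing the positions of elements in the AST.

One way I can think of is to attach a (start) position to each element of the AST. For instance, if the type of an expression is defined as follows: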

type expression =
  ...
  | E_int of int
  | E_function_EEs of Function.t * (expression list)

it would become:

type expression =
  ...
  | E_int of position * int
  | E_function_EEs of position * Function.t * (expression list)

Then, if we know the length of each element, we can infer where everything is in the editor. Is this a common approach? I don't find it very elegant...

Answer 1

You don't have to repeat position in every constructor. You can add a single extra constructor at the end instead:

type expression = 
  ...
  | E_int of int
  | E_function_EEs of Function.t * (expression list)
  | E_loc of position * expression

So for your existing functions over expression, you only need to add one case for E_loc, and the existing cases stay untouched.
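As a rough sketch of what "adding one case" looks like in practice, here is a small self-contained example. The Function module, the position type (a pair of Lexing.position values), and the eval and strip_loc functions are all assumptions made up for illustration; the question does not show them.

```ocaml
(* Hypothetical stand-in for the question's Function module. *)
module Function = struct
  type t = PLUS
end

(* Assumption: a position is a (start, end) pair of Lexing.position. *)
type position = Lexing.position * Lexing.position

type expression =
  | E_int of int
  | E_function_EEs of Function.t * expression list
  | E_loc of position * expression

(* A traversal written before E_loc existed only needs the last case added. *)
let rec eval (e : expression) : int =
  match e with
  | E_int n -> n
  | E_function_EEs (Function.PLUS, args) ->
      List.fold_left (fun acc a -> acc + eval a) 0 args
  | E_loc (_, inner) -> eval inner

(* Drop every location wrapper, e.g. to compare two trees structurally. *)
let rec strip_loc (e : expression) : expression =
  match e with
  | E_int n -> E_int n
  | E_function_EEs (f, args) -> E_function_EEs (f, List.map strip_loc args)
  | E_loc (_, inner) -> strip_loc inner
```

The trade-off is that nothing in the type forces a node to carry a location, so code that needs one has to cope with nodes that are not wrapped in E_loc.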

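To construct E_loc automatically at parse time, you can add something like the following to your .mly file (note that $startpos, $endpos and the parameterized rule loc(...) are Menhir syntax rather than plain ocamlyacc):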

loc(EXPRESSION):
| t = EXPRESSION { E_loc (($startpos, $endpos), t) }

(* immediate construction: *)
expression:
| i = INT { E_loc (($startpos, $endpos), E_int i) }

(* or delay construction: *)
expression:
| e0 = loc(expression) PLUS e1 = loc(expression) { E_function_EEs (Function.PLUS, [e0; e1]) }
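Once the positions are stored, an editor or formatter will usually want them as line and column numbers. The sketch below assumes that position is a pair of Lexing.position values (the type of Menhir's $startpos and $endpos); line_col and show_span are hypothetical helper names. The line numbers are only meaningful if the lexer calls Lexing.new_line on each newline.

```ocaml
(* Assumption: positions are (start, end) pairs of Lexing.position. *)
type position = Lexing.position * Lexing.position

(* Line number (pos_lnum) and 0-based column of a Lexing.position.
   The column is the offset from the beginning of the current line. *)
let line_col (p : Lexing.position) : int * int =
  (p.Lexing.pos_lnum, p.Lexing.pos_cnum - p.Lexing.pos_bol)

(* Hypothetical helper: render a span as "line:col-line:col". *)
let show_span ((s, e) : position) : string =
  let sl, sc = line_col s in
  let el, ec = line_col e in
  Printf.sprintf "%d:%d-%d:%d" sl sc el ec
```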
