Python 中的微型 DSL 实现

Tiny DSL implementation in Python

我正在研究在 Python 中实现 DSL,我正在寻找一种小型 DSL 语言,它对没有设计和实现语言经验的人来说很友好。到目前为止,我回顾了两个实现,即 Hy 和 Mochi。 Hy 实际上是 lisp 的一种方言,而 Mochi 似乎与 Elixir 非常相似。两者对我来说都很复杂,因为我现在的目标是对语言进行原型设计并使用 in 来确定它是否真的有助于解决问题以及是否适合问题所需的风格。我知道 Python 通过标准库中提供的语言工具得到了很好的支持。到目前为止,我实现了一种确实非常简单的 lisp 方言,我没有使用任何 python AST,它纯粹是通过字符串处理实现的,这对于我正在寻找的东西来说绝对不灵活。

除了上面提到的两种语言,是否有任何实现,小到可以研究?

有哪些关于这个主题的好书(实用性,不仅仅停留在理论和学术方面)?

学习和使用 Python AST 的好方法是什么?

就实际生成的字节码的开销而言,是否存在与基于 Python(如 Hy)构建的语言相关的任何重大性能问题?

谢谢

You can split the task of creating a (yet another!) new language in at least two big steps:

  • Syntax
  • Semantics & Interpretation

Syntax

You need to define a grammar for your language, with production rules that specify how to create complex expressions from simple ones.

Example: syntax for LISP:

expression ::= atom   | list
atom       ::= number | symbol    
number     ::= [+-]?['0'-'9']+
symbol     ::= ['A'-'Z''a'-'z'].*
list       ::= '(' expression* ')'

How to read it: an expression is either an atom or a list; an atom is a number or a symbol; a number is... and so on.

Often you will define also some tokenization rules, because most grammars work at token level, and not at characters level.

Once you defined your grammar, you want a parser that, given a sentence (a program) is able to build the derivation tree, or the abstract syntax tree.

For example, for the expression x=f(y+1)+2, you want to obtain the tree:

There are several parsers (LL, LR, recursive descent, ...). You don't necessarily need to write your language parser by yourself, as there are tools that generate the parser from the grammar specification (LEX & YACC, Flex & Bison, JavaCC, ANTLR; also check this list of parsers available for Python).

If you want to skip the step of designing a new grammar, you may want to start from a simple one, like the grammar of LISP. There is even a LISP parser written in Python in the Pyperplan project. They use it for parsing PDDL, which is a domain specific language for planning that is based on LISP.

Useful readings:

Semantics & Interpretation

Once you have the abstract syntax tree of your program, you want to execute your program. There are several formalisms for specifying the "rules" to execute (pieces of) programs:

  • Operational semantics: a very popular one. It is classified in two categories:
    • Small Step Semantics: describe individual steps of computation
    • Big Step Semantics: describe the overall results of computation
  • Reduction semantics: a formalism based on lambda calculus
  • Transition semantics: if you look at your interpreter like a transition system, you can specify its semantics using transition semantics. This is especially useful for programs that do not terminate (i.e. 运行 continuously), like controllers.

Useful readings:

你真的不需要了解很多关于解析的知识来编写你自己的语言。

我写了一个库,可以让你很容易地做到这一点:https://github.com/erezsh/lark

这是我的博客post,解释了如何使用它来编写您自己的语言:http://blog.erezsh.com/how-to-write-a-dsl-in-python-with-lark/

我希望你不要介意我无耻的插话,但这似乎与你的问题很相关。