在序言中构建解析器

Question

我正在尝试构建一个解析器，但我似乎无法理解它是如何工作的。我需要有人帮助我指出正确的方向，以便我可以从那里拿起它。

所以我有一个分词器和一个词法分析器。

分词器输出：

[int,add,(,int,a,int,b,),=,a,+,b,int,letin,(,int,a,),=,let,b, =,10,in,add,(,a,b,),int,equal,(,int,a,int,b,),=,if,a,==,b,then,letin ,(,a,),else,1,int,main,(,int,input,),=,equal,(,input,2,)]

词法分析器输出：

[TYPE_INT,IDENTIFIER,OPEN_P,TYPE_INT,IDENTIFIER,COMMA,TYPE_INT,IDENTIFIER,CLOSE_P,ASSIGN,IDENTIFIER,ARITH_ADD,IDENTIFIER,TYPE_INT,IDENTIFIER,OPEN_P,TYPE_INT,IDENTIFIER,CLOSE_P,ASSIGN,LET,IDENTIFIER,ASSIGN,IDENTIFIER,LET_IN,标识符,OPEN_P,标识符,逗号,标识符,CLOSE_P,TYPE_INT,标识符,OPEN_P,TYPE_INT,标识符,逗号,TYPE_INT,标识符,CLOSE_P,分配,COND_IF,标识符,LOGIC_EQ,标识符,COND_THEN,标识符,OPEN_P,标识符,CLOSE_P,COND_ELSE,INTEGER,TYPE_INT,IDENTIFIER,OPEN_P,TYPE_INT,IDENTIFIER,CLOSE_P,ASSIGN,IDENTIFIER,OPEN_P,IDENTIFIER,COMMA,INTEGER, CLOSE_P]

现在我必须构建一个解析器。我不知道如何开始以及如何包含参数化结构。

我的规则是这样的。

program --> functionList.
functionList --> function,functionListCollection.
functionListCollection --> functionList | [].
function --> typeID(typeIdList),[=],expression.
typeID --> [int],[id] | [bool],[id].
typeIdList --> typeID,typeIdListCollection.
typeIdListCollection --> [,], typeIdList | [].
expression --> [if], comparison, [then], value, [else], value.
expression --> [let],[id],[=], value, [in], expression.
expression --> value, extraExpression.
extraExpression --> arithmetic | [].
arithmetic --> [+], value | [-], value.
comparison --> value, comparisonRight.
comparisonRight --> [=],[=],value.
comparisonRight --> [!], [=], value.
comparisonRight --> [>], value.
comparisonRight --> [>], [=], value.
value --> [number].
value --> [id], valueParameters.
valueParameters --> [(],parameters,[)]. | [].
parameters --> value, parametersList.
parametersList --> [,], parameters | [].

我正在寻找构建一个谓词，它采用词法列表并从解析器中给出列表。然后，我将通过查看令牌列表来替换数字和标识符。一些关于如何开始的帮助将不胜感激。谢谢。

Answer 1

让你的词法分析器 (*) 为你输出，而不是数据

[type(int), id(add), open_p, type(int), id(a), .....

所以你的规则中有完整的信息，比如

....
typeID(T,A) --> [type(T)], { memberchk(T, [int,bool]) }, [id(A)],
....

测试：

32 ?- phrase( typeID(X,Y), [type(int), id(i)], []).
X = int,
Y = i.

33 ?- phrase( typeID(X,Y), [type(bool), id(i)], []).
X = bool,
Y = i.

34 ?- phrase( typeID(X,Y), [type(float), id(i)], []).
false.

(*) 或者您可以将您的标记和您当前的词法分析器输出组合起来以获得组合数据，maplist.

Answer 2

你的 'Lexer Output' 看起来不仅没用，而且实际上很危险。它（任意？）重命名已经是解析器一部分的重要符号（如 int）（顺便说一句，分词器和词法分析器是 synonymsafaik）。因此，只需为您的 DCG（纠正后，见下文）提供分词器输出：

tokens_test :-
    Tokens = [
        int,add,=,a,+,33
        % ...more source code to test - but please use writeq to show it  ...
        %,int,let,in,'(',int,a,')',=,let,b,=,10
        %,in,add,'(',a,',',b,')',int,equal,'(',int,a,',',int,b,')',=
        %,if,a,==,b,then,let,in,'(',a,')',else,1
        %,int,main,'(',int,input,')',=,equal,'(',input,',',2,')'
    ], phrase(program, Tokens).

?- tokens_test.
true

我更改了你的 DCG，因为你有 [id] 和 [number] 作为终端：

id --> [I], {atom(I)}.
number --> [N], {number(N)}.

program --> functionList.
functionList --> function,functionListCollection.
functionListCollection --> functionList | [].
function --> typeID,[=],expression.
typeID --> [int],id | [bool],id.
typeIdList --> typeID,typeIdListCollection.
typeIdListCollection --> [,], typeIdList | [].
expression --> [if], comparison, [then], value, [else], value.
expression --> [let], id, [=], value, [in], expression.
expression --> value, extraExpression.
extraExpression --> arithmetic | [].
arithmetic --> [+], value | [-], value.
comparison --> value, comparisonRight.
comparisonRight --> [=],[=],value.
comparisonRight --> [!], [=], value.
comparisonRight --> [>], value.
comparisonRight --> [>], [=], value.
value --> number.
value --> id, valueParameters.
valueParameters --> ['('],parameters,[')'] | [].
parameters --> value, parametersList.
parametersList --> [,], parameters | [].

如果你有分隔符（下面的逗号），你可以避免像

这样的服务谓词

typeIdList --> typeID,typeIdListCollection.
typeIdListCollection --> [,], typeIdList | [].

可能是

typeIdList --> typeID, ( [,], typeIdList | [] ).

但此 'simplification' 的便利性可能取决于其他因素。

how to include parameterized constructs

id//0 和 number//0 是非终结符，应该 'get back' 它们的值：它们变成 id//1 和 number//1，按照惯例，我们曾经 'tag'带有非终结符的值（所以我们保留符号 :-)

id(id(I)) --> [I],   % better to filter out keywords here...
  {atom(I), \+memberchk(I,[let, etc etc])}.
number(number(N,integer)) --> [N], {integer(N)}.
number(number(N,float)) --> [N], {float(N)}.

现在叫他们的地方，我们必须改成

...
typeID --> [int],id(_) | [bool],id(_).
...

很明显 typeID//0 现在应该 'get back' 值（所谓的语义属性），否则它们将丢失：

...
typeID(typeID(T,I)) --> [int],id(I),{T=int} | [bool],id(I),{T=bool}.
...

这显示了编写 typeID//1 的更好方法。可能是

typeID(typeID(T,I)) --> type(T),id(I).

type(int) --> [int].
type(bool) --> [bool].

希望你能拍到照片:-)

在序言中构建解析器

Building a parser in prolog

prolog

dcg