Prolog: simple lexer/2
I need a small lexer/2 in Prolog. Currently I have:
tokens(Z) --> "while", tokens(Y), {Z = [ttwhile | Y]}.
tokens(Z) --> "do", tokens(Y), {Z = [ttdo | Y]}.
tokens(Z) --> "endwhile", tokens(Y), {Z = [ttendwhile | Y]}.
tokens(Z) --> "repeat", tokens(Y), {Z = [ttrepeat | Y]}.
tokens(Z) --> "until", tokens(Y), {Z = [ttuntil | Y]}.
tokens(Z) --> "endrepeat", tokens(Y), {Z = [ttendrepeat | Y]}.
tokens(Z) --> "if", tokens(Y), {Z = [ttif | Y]}.
tokens(Z) --> "then", tokens(Y), {Z = [ttthen | Y]}.
tokens(Z) --> "else", tokens(Y), {Z = [ttelse | Y]}.
tokens(Z) --> "endif", tokens(Y), {Z = [ttendif | Y]}.
tokens(Z) --> "exit", tokens(Y), {Z = [ttexit | Y]}.
tokens(Z) --> "other", tokens(Y), {Z = [ttother | Y]}.
% Comparison operators.
tokens(Z) --> "==", tokens(Y), {Z = [equal | Y]}.
tokens(Z) --> "<>", tokens(Y), {Z = [notequal | Y]}.
% Assignment operator.
tokens(Z) --> ":=", tokens(Y), {Z = [:= | Y]}.
% Boolean constants and operators.
tokens(Z) --> "true", tokens(Y), {Z = [true | Y]}.
tokens(Z) --> "false", tokens(Y), {Z = [false | Y]}.
tokens(Z) --> "and", tokens(Y), {Z = [and | Y]}.
tokens(Z) --> "or", tokens(Y), {Z = [or | Y]}.
tokens(Z) --> " ", tokens(Y), {Z = Y}.
tokens(Z) --> " ", tokens(Y), {Z = Y}.
tokens(Z) --> [C], tokens(Y), {name(X, [C]), Z = [X | Y]}.
tokens(Z) --> [], {Z = []}.
Can anyone help me with the next step for lexer/2, so that when I call
lexer([while,a,==,b,do,abc,endwhile], R)
I get R = [ttwhile, a, equal, b, ttdo, abc, ttendwhile]?
Many thanks.
What about the following solution?
lexer(I, O) :-
    tokens(O, I, []).
But call lexer() like this:
lexer("while a == b do abc endwhile", R)
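One caveat with that call: in SWI-Prolog 7 and later, double-quoted text typed at the toplevel is a string object by default rather than a list of codes, so the DCG cannot consume it directly. Below is a minimal sketch of a wrapper that accepts an atom, a string or a code list. It assumes SWI-Prolog; lexer_text/2 is a hypothetical name, and the tokens//1 shown is a cut-down copy used only to make the example self-contained.

```prolog
% Assumption: SWI-Prolog. Make "..." inside this file read as code lists.
:- set_prolog_flag(double_quotes, codes).

% Cut-down tokens//1, just enough to exercise the wrapper.
tokens([ttwhile|Z]) --> "while", tokens(Z).
tokens([ttdo|Z])    --> "do", tokens(Z).
tokens(Z)           --> " ", tokens(Z).
tokens([X|Z])       --> [C], tokens(Z), { name(X, [C]) }.
tokens([])          --> [].

lexer(I, O) :-
    tokens(O, I, []).

% lexer_text/2 (hypothetical): convert any text to codes, then lex.
% once/1 keeps only the first tokenisation found.
lexer_text(Text, Tokens) :-
    text_to_string(Text, String),
    string_codes(String, Codes),
    once(lexer(Codes, Tokens)).
```

With this loaded, ?- lexer_text('while a do', R). gives R = [ttwhile, a, ttdo].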
I'd add one suggestion: rewrite tokens() like this:
tokens([ttwhile | Z]) --> "while", tokens(Z).
tokens([ttdo | Z]) --> "do", tokens(Z).
tokens([ttendwhile | Z]) --> "endwhile", tokens(Z).
tokens([ttrepeat | Z]) --> "repeat", tokens(Z).
tokens([ttuntil | Z]) --> "until", tokens(Z).
tokens([ttendrepeat | Z]) --> "endrepeat", tokens(Z).
tokens([ttif | Z]) --> "if", tokens(Z).
tokens([ttthen | Z]) --> "then", tokens(Z).
tokens([ttelse | Z]) --> "else", tokens(Z).
tokens([ttendif | Z]) --> "endif", tokens(Z).
tokens([ttexit | Z]) --> "exit", tokens(Z).
tokens([ttother | Z]) --> "other", tokens(Z).
% Comparison operators.
tokens([equal | Z]) --> "==", tokens(Z).
tokens([notequal | Z]) --> "<>", tokens(Z).
% Assignment operator.
tokens([:= | Z]) --> ":=", tokens(Z).
% Boolean constants and operators.
tokens([true | Z]) --> "true", tokens(Z).
tokens([false | Z]) --> "false", tokens(Z).
tokens([and | Z]) --> "and", tokens(Z).
tokens([or | Z]) --> "or", tokens(Z).
tokens(Z) --> " ", tokens(Z).
tokens([X | Z]) --> [C], tokens(Z), {name(X, [C])}.
tokens([]) --> [].
P.S.: sorry for my bad English.
Well, this 'glue' solves your request, more or less:
lexer(L, Tokens) :-
    atomic_list_concat(L, ' ', A),
    atom_codes(A, Cs),
    phrase(tokens(Tokens), Cs).
?- lexer([while,a,==,b,do,abc,endwhile], R).
R = [ttwhile, a, equal, b, ttdo, a, b, c, ttendwhile] ;
R = [ttwhile, a, equal, b, ttdo, a, b, c, e|...] ;
But you should really rewrite it in a declarative style:
token(ttwhile) --> "while".
token(ttendwhile) --> "endwhile".
token(ttdo) --> "do".
%...
token(equal) --> "==".
token(notequal) --> "<>".
token(assign) --> ":=".
% this is wrong: symbols overlap with alphabetic tokens
token(N) --> [C], {atom_codes(N,[C])}.
tokens([]) --> [].
tokens(Ts) --> " ", tokens(Ts).
tokens([T|Ts]) --> token(T), tokens(Ts).
lexer(Cs, Tokens) :-
    phrase(tokens(Tokens), Cs).
and call it passing a list of codes, i.e. a double-quoted (or backquoted, if you're using SWI) string:
?- lexer(`while abc endwhile`, R).
R = [ttwhile, a, b, c, ttendwhile] ;
R = [ttwhile, a, b, c, e, n, d, ttwhile] ;
...
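The extra answers come from backtracking: the catch-all single-character rule can also consume the letters of a keyword, so the grammar is ambiguous. A minimal sketch of how to inspect that, assuming SWI-Prolog; first_lexing/2 and all_lexings/2 are hypothetical helper names, and the token//1 table is shortened to three keywords.

```prolog
% Assumption: SWI-Prolog. Make "..." inside this file read as code lists.
:- set_prolog_flag(double_quotes, codes).

token(ttwhile)    --> "while".
token(ttendwhile) --> "endwhile".
token(ttdo)       --> "do".
% Catch-all: any single character becomes a one-character atom.
token(N)          --> [C], { atom_codes(N, [C]) }.

tokens([])     --> [].
tokens(Ts)     --> " ", tokens(Ts).
tokens([T|Ts]) --> token(T), tokens(Ts).

lexer(Cs, Tokens) :-
    phrase(tokens(Tokens), Cs).

% first_lexing/2 (hypothetical): commit to the first tokenisation.
first_lexing(Cs, Tokens) :-
    once(lexer(Cs, Tokens)).

% all_lexings/2 (hypothetical): collect every tokenisation on backtracking.
all_lexings(Cs, All) :-
    findall(Ts, lexer(Cs, Ts), All).
```

Note that once/1 only hides the ambiguity and relies on clause order to produce the intended answer first; it does not stop abc from being split into a, b, c.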
Edit
To tokenize names (well, only lowercase ones, for simplicity), replace the above token(N) --> [C], {atom_codes(N,[C])}.
with
token(N) --> lower_case_chars(Cs), {Cs \= [], atom_codes(N,Cs)}.
lower_case_chars([C|Cs]) --> lower_case_char(C), lower_case_chars(Cs).
lower_case_chars([]) --> [].
lower_case_char(C) --> [C], {C>=0'a, C=<0'z}.
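But it gets a bit verbose when you also add upper_case_chars, digits, etc. It's worth generalizing, either by passing the character range boundaries or by using code_type/2: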
token(N) --> csymf(C), csyms(Cs), {atom_codes(N,[C|Cs])}.
csymf(C) --> [C], {code_type(C,csymf)}.
csyms([C|Cs]) --> [C], {code_type(C,csym)}, csyms(Cs).
csyms([]) --> [].
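Putting the pieces together, one common way to remove the overlap between keywords and names is to read a whole word first and only then look it up in a keyword table. This is a sketch under assumptions rather than the answer's own code: it targets SWI-Prolog, and keyword/2, symbol//1, word//1, classify/2 and lexer_atoms/2 are hypothetical names. The cuts make it commit to the longest word, so it yields a single solution.

```prolog
% Assumption: SWI-Prolog. Make "..." inside this file read as code lists.
:- set_prolog_flag(double_quotes, codes).

% keyword(Word, Token): hypothetical lookup table, shortened here.
keyword(while,    ttwhile).
keyword(endwhile, ttendwhile).
keyword(do,       ttdo).

symbol(equal)    --> "==".
symbol(notequal) --> "<>".
symbol(assign)   --> ":=".

lexer(Cs, Tokens) :-
    phrase(tokens(Tokens), Cs).

% Glue for the list-of-atoms call used in the question.
lexer_atoms(L, Tokens) :-
    atomic_list_concat(L, ' ', A),
    atom_codes(A, Cs),
    lexer(Cs, Tokens).

tokens(Ts)     --> " ", !, tokens(Ts).
tokens([T|Ts]) --> token(T), !, tokens(Ts).
tokens([])     --> [].

% Read a whole word, then decide whether it is a keyword or a name.
token(T) --> word(W), !, { classify(W, T) }.
token(T) --> symbol(T).

classify(W, T) :- keyword(W, T), !.
classify(W, W).

word(W) --> csymf(C), csyms(Cs), { atom_codes(W, [C|Cs]) }.

csymf(C) --> [C], { code_type(C, csymf) }.

% Greedy: the cut keeps the longest run of symbol characters.
csyms([C|Cs]) --> [C], { code_type(C, csym) }, !, csyms(Cs).
csyms([])     --> [].
```

With this loaded, ?- lexer_atoms([while,a,==,b,do,abc,endwhile], R). gives R = [ttwhile, a, equal, b, ttdo, abc, ttendwhile]. Input containing a character that is neither a word character, a known symbol nor a space makes lexer/2 fail.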