Lexing the VHDL ' (tick) token

Question

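In VHDL the ' character can enclose a character literal, e.g. '.', or it can act as an attribute separator (somewhat like C++'s :: token), e.g. string'("hello").

This causes a problem when parsing an attribute name that contains characters, e.g. string'('a','b','c'). In this case a naive lexer will incorrectly tokenize the first '(' as a character literal, and all of the actual character literals that follow will be garbled.

A 2007 thread in the comp.lang.vhdl Google group titled "Lexing the ' char" asks a similar question, and user diogratia gave this answer: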

        case '\'':                          /* IR1045 check */

            if (    last_token == DELIM_RIGHT_PAREN ||
                    last_token == DELIM_RIGHT_BRACKET ||
                    last_token == KEYWD_ALL ||
                    last_token == IDENTIFIER_TOKEN ||
                    last_token == STR_LIT_TOKEN ||
                    last_token == CHAR_LIT_TOKEN || ! (buff_ptr<BUFSIZ-2) )
                token_flag = DELIM_APOSTROPHE;
            else if (is_graphic_char(NEXT_CHAR) &&
                    line_buff[buff_ptr+2] == '\'') { CHARACTER_LITERAL:
                buff_ptr+= 3;               /* lead,trailing \' and char */
                last_token = CHAR_LIT_TOKEN;
                token_strlen = 3;
                return (last_token);
            }
            else token_flag = DELIM_APOSTROPHE;
            break;

See Issue Report IR1045: http://www.eda-twiki.org/isac/IRs-VHDL-93/IR1045.txt

As you can see from the above code fragment, the last token can be captured and used to disambiguate something like:

  foo <= std_logic_vector'('a','b','c');

without a large look ahead or backtracking.
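The last-token rule can be sketched as a tiny hand-written lexer. This is only an illustration under stated assumptions, not the code from the quoted post: the token names (`T_ID`, `T_TICK`, `T_CHAR`, ...) and the helpers `tick_must_follow` and `lex` are hypothetical, and the set of "preceding" tokens is reduced to identifier, right parenthesis, and character literal.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch (not taken from any real lexer). */
enum tok { T_NONE, T_ID, T_LPAREN, T_RPAREN, T_COMMA, T_TICK, T_CHAR, T_OTHER };

/* IR1045-style rule: after these tokens a ' is a tick, never the start
   of a character literal. The set is simplified for illustration. */
static int tick_must_follow(enum tok last)
{
    return last == T_ID || last == T_RPAREN || last == T_CHAR;
}

/* Tokenize src into out[] (at most max tokens); returns the token count. */
static size_t lex(const char *src, enum tok *out, size_t max)
{
    enum tok last = T_NONE;
    size_t n = 0, i = 0;

    assert(src != NULL && out != NULL);
    while (src[i] != '\0' && n < max) {
        unsigned char c = (unsigned char)src[i];
        if (isspace(c)) { i++; continue; }
        if (isalpha(c)) {
            /* identifier: letters, digits, underscores */
            while (isalnum((unsigned char)src[i]) || src[i] == '_') i++;
            last = T_ID;
        } else if (c == '\'') {
            /* Only consider 'x' a character literal when the previous
               token allows it; otherwise emit a lone tick. */
            if (!tick_must_follow(last) &&
                isprint((unsigned char)src[i + 1]) && src[i + 2] == '\'') {
                i += 3;
                last = T_CHAR;
            } else {
                i += 1;
                last = T_TICK;
            }
        } else {
            i++;
            last = c == '(' ? T_LPAREN : c == ')' ? T_RPAREN
                 : c == ',' ? T_COMMA : T_OTHER;
        }
        out[n++] = last;
    }
    return n;
}
```

Calling `lex("std_logic_vector'('a','b','c')", tokens, 16)` should yield an identifier, a tick, a left parenthesis, and then the three character literals separated by commas. Without the `tick_must_follow` check, the same scan would match `'('` as a character literal.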

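However, as far as I know, flex does not track the last token that was parsed.

Short of manually tracking the last parsed token, is there a better way to accomplish this lexing task?

If it helps, I am using IntelliJ GrammarKit.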

Answer 1

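The idea behind IR1045 is to be able to tell whether a single quote/apostrophe is part of a character literal without looking ahead, or backtracking when you guess wrong. Try: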

library ieee;
use ieee.std_logic_1164.all;

entity foo is
    port (
        a:      in      std_logic;
        b:      out     std_logic_vector (3 downto 0)
    );
end entity;

architecture behave of foo is
    begin
    b <= std_logic_vector'('0','1','1','0')     when a = '1' else
         (others =>'0')                         when a = '0' else
         (others => 'X');
end architecture behave;

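How far ahead are you willing to look?

There is, however, a practical example of flex disambiguating apostrophes and VHDL character literals.

Nick Gasson's nvc uses flex, and he has implemented the Issue Report 1045 solution in it.

See nvc/src/lexer.l, which is licensed under GPLv3.

Search for last_token: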

#define TOKEN(t) return (last_token = (t))

and

#define TOKEN_LRM(t, lrm)                                       \
   if (standard() < lrm) {                                      \
      warn_at(&yylloc, "%s is a reserved word in VHDL-%s",      \
              yytext, standard_text(lrm));                      \
      return parse_id(yytext);                                  \
   }                                                            \
   else                                                         \
      return (last_token = (t));

A function was added to check it:

static int resolve_ir1045(void);

static int last_token = -1;

which is:

%%

static int resolve_ir1045(void)
{
   // See here for discussion:
   //   http://www.eda-stds.org/isac/IRs-VHDL-93/IR1045.txt
   // The set of tokens that may precede a character literal is
   // disjoint from that which may precede a single tick token.

   switch (last_token) {
   case tRSQUARE:
   case tRPAREN:
   case tALL:
   case tID:
      // Cannot be a character literal
      return 0;
   default:
      return 1;
   }
}

The location of IR1045 has changed since the comp.lang.vhdl post; it is now

http://www.eda-twiki.org/isac/IRs-VHDL-93/IR1045.txt

You will also want to search for resolve_ir1045 in lexer.l:

static int resolve_ir1045(void);

and

{CHAR}            { if (resolve_ir1045()) {
                       yylval.s = strdup(yytext);
                       TOKEN(tID);

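Here we find that nvc uses the function to filter detection of the first single quote of a character literal.

This was originally an Ada issue. IR-1045 was never adopted but is universally used. There are probably Ada flex lexers that also demonstrate the disambiguation.

The requirement to disambiguate is discussed in the article "Lexical Analysis" in Ada User Journal, volume 27, number 3, September 2006, on PDF pages 30 and 31 (volume 27, pages 159 and 160), where we see that the solution is not well known.

The comment that a character literal cannot precede a single tick is inaccurate: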

entity ir1045 is
end entity;

architecture foo of ir1045 is
begin
THIS_PROCESS:
    process
        type twovalue is ('0', '1');  
        subtype string4 is string(1 to 4);
        attribute a: string4;
        attribute a of '1' : literal is "TRUE";
    begin
        assert THIS_PROCESS.'1''a /= "TRUE"
            report "'1''a /= ""TRUE"" is FALSE";
        report "This_PROCESS.'1''a'RIGHT = " &
            integer'image(This_PROCESS.'1''a'RIGHT);
        wait;
    end process;
end architecture;

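The first use, an attribute name whose prefix is a selected name with a character literal suffix, demonstrates the inaccuracy. The second, in the report statement, shows that it can matter: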

ghdl -a ir1045.vhdl
ghdl -e ir1045
ghdl -r ir1045
ir1045.vhdl:13:9:@0ms:(assertion error): '1''a /= "TRUE" is FALSE
ir1045.vhdl:15:9:@0ms:(report note): This_PROCESS.'1''a'RIGHT = 4
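To see why it matters for a lexer, here is a minimal sketch that scans `THIS_PROCESS.'1''a'RIGHT` twice, once treating a preceding character literal as forcing a tick and once not. The function `count_char_literals` and its flag are hypothetical names made up for this illustration; the scanning rules are deliberately simplified and are not taken from nvc or ghdl.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count the character literals a simplified scanner finds in src.
   An identifier always forces the next ' to be a tick. When
   char_lit_blocks is nonzero, a preceding character literal does too.
   Hypothetical helper for illustration only. */
static int count_char_literals(const char *src, int char_lit_blocks)
{
    enum { NONE, ID, CHARLIT, TICK, OTHER } last = NONE;
    int count = 0;
    size_t i = 0;

    assert(src != NULL);
    while (src[i] != '\0') {
        unsigned char c = (unsigned char)src[i];
        if (isalpha(c)) {
            /* identifier: letters, digits, underscores */
            while (isalnum((unsigned char)src[i]) || src[i] == '_') i++;
            last = ID;
        } else if (c == '\'') {
            int blocked = last == ID || (char_lit_blocks && last == CHARLIT);
            if (!blocked && src[i + 1] != '\0' && src[i + 2] == '\'') {
                i += 3;          /* quote, character, quote */
                last = CHARLIT;
                count++;
            } else {
                i += 1;          /* lone tick (attribute delimiter) */
                last = TICK;
            }
        } else {
            i++;
            last = OTHER;
        }
    }
    return count;
}
```

With the flag set, the scan finds one character literal, `'1'`, followed by a tick, the identifier `a`, another tick, and `RIGHT`. With the flag cleared, the tick after `'1'` pairs up with the one after `a`, and the scanner wrongly reports a second character literal `'a'`.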

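In addition to the attribute name prefix being a selected name with a character literal suffix, there is a requirement that an attribute specification 'decorate' a declared entity (of an entity_class, see IEEE Std 1076-2008 7.2 Attribute specification) in the same declarative region in which the entity is declared.

This example is syntactically and semantically valid VHDL. You may note that nvc does not allow decorating a named entity with the entity class literal. That is not in accordance with 7.2.

Enumeration literals are declared in type declarations, here type twovalue. An enumerated type that has at least one character literal as an enumeration literal is a character type (5.2.2.1).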
