Lex program to remove single-line and multi-line comments from a given text

Question

I am trying to write a lex program that removes both single-line and multi-line comments.

%{
#include<stdio.h>
int single=0;
int multi=0;    
%}
%%
"//"([a-z]|[A-Z]|[0-9]|" ")* {++single;}
"/*"(.*\n)* "*/" {++multi;}
%%
int main(int argc, int **argv)
{
    yyin=fopen("abc.txt","r");
    yylex();
    printf("no of single line comment = %d ", single);
    printf("no of multi line comment = %d ", multi);
    return 0;
}

This program fails to remove multi-line comments.

Answer 1

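If your abc.txt file contains more than one multi-line comment, your multi-line pattern will match everything between the start of the first comment and the end of the last one. This happens because lex is greedy and tries to match the longest possible prefix of the input, and your pattern allows /* and */ themselves to be matched by (.*\n)*.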
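To make the longest-match effect concrete, here is a small C sketch. It is a simplified model only, not how lex is implemented, and the function names `longest_comment_match` and `first_close_match` are made up for this illustration. It compares a match that runs to the last closing delimiter with one that stops at the first.

```c
#include <stddef.h>
#include <string.h>

// Simplified model of a longest-match scanner for "/*" anything "*/":
// the match runs from the opener to the LAST closing delimiter in s.
// Returns the match length, or 0 if s does not start with a comment.
size_t longest_comment_match(const char *s)
{
    if (strncmp(s, "/*", 2) != 0)
        return 0;
    const char *last = NULL;
    for (const char *p = strstr(s + 2, "*/"); p != NULL; p = strstr(p + 2, "*/"))
        last = p;                       // remember the latest closer seen
    return last ? (size_t)(last - s) + 2 : 0;
}

// What is actually wanted: stop at the FIRST closing delimiter.
size_t first_close_match(const char *s)
{
    if (strncmp(s, "/*", 2) != 0)
        return 0;
    const char *p = strstr(s + 2, "*/");
    return p ? (size_t)(p - s) + 2 : 0;
}
```

For the 22-character input `/* a */ int x; /* b */`, the first function returns 22, so the code between the two comments is swallowed as well, while the second returns 7, covering only the first comment.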

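In addition, your code will not detect single-line comments that contain any character other than alphanumerics and spaces (for example -, ;, : and so on).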

Change your patterns and actions to the following, and it should achieve your objective.

"//".*\n            { ++single; }
"/*"[^*/]*"*/"      { ++multi; }

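The code above will still leave some newlines behind in place of the removed multi-line comments, though. That part is a bit tricky, and I could not find a quick way to remove those newlines.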
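One thing to watch with the multi-line pattern: the class `[^*/]` excludes both `*` and `/`, so a comment whose body contains either character is not matched as a whole. The C sketch below models that single pattern so the behaviour is easy to check. The function name `match_simple_comment` is hypothetical, and this is only a model of the regular expression, not of lex itself.

```c
#include <stddef.h>
#include <string.h>

// Models the pattern "/*"[^*/]*"*/" anchored at the start of s.
// Returns the match length, or 0 when the pattern does not match.
size_t match_simple_comment(const char *s)
{
    if (strncmp(s, "/*", 2) != 0)
        return 0;
    size_t i = 2;
    // [^*/]* : consume everything that is neither '*' nor '/'
    while (s[i] != '\0' && s[i] != '*' && s[i] != '/')
        i++;
    // The very next two characters must be the closing delimiter.
    if (s[i] == '*' && s[i + 1] == '/')
        return i + 2;
    return 0;
}
```

With this model, `/* plain */` matches in full (11 characters), whereas `/* a*b */` and `/* see http://x */` return 0 because the body hits a `*` or `/` before the real closing delimiter.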

Hope this helps!

Answer 2

For flex:

"//".* {singleLine++;}
"/*"([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+\/ {multiLine++;}

Details: https://blog.ostermiller.org/finding-comments-in-source-code-using-regular-expressions/
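As a way to sanity-check what the two rules are expected to produce without running flex, here is a hand-written C sketch of the same behaviour: single-line comments are dropped up to (but not including) the newline, multi-line comments end at the first closing delimiter, and everything else is copied through, as lex's default action would do. The function name `strip_comments` and its signature are assumptions made for this example, and, like the patterns above, it does not treat string literals specially.

```c
#include <stddef.h>

// Hypothetical helper: copies src to dst without comments and counts
// each kind. dst must have room for at least strlen(src) + 1 bytes.
void strip_comments(const char *src, char *dst, int *single, int *multi)
{
    size_t i = 0, o = 0;
    while (src[i] != '\0') {
        if (src[i] == '/' && src[i + 1] == '/') {
            // Like "//".* : skip to the end of the line, keep the newline.
            while (src[i] != '\0' && src[i] != '\n')
                i++;
            ++*single;
        } else if (src[i] == '/' && src[i + 1] == '*') {
            // Search for the first closing delimiter after the opener.
            size_t j = i + 2;
            while (src[j] != '\0' && !(src[j] == '*' && src[j + 1] == '/'))
                j++;
            if (src[j] == '\0') {
                // Unterminated comment: nothing matches, copy the '/' through.
                dst[o++] = src[i++];
            } else {
                i = j + 2;              // skip the whole comment
                ++*multi;
            }
        } else {
            dst[o++] = src[i++];        // ordinary character: echo it
        }
    }
    dst[o] = '\0';
}
```

Called on `int a; // one\nint b; /* two\n * lines */ int c;\n` with both counters starting at 0, this sketch produces `int a; \nint b;  int c;\n` and reports one comment of each kind. The newline after the single-line comment is kept, which mirrors `"//".*` not consuming the line terminator.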
