Strip comments in a C file while reading it line by line

Question

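I would like to do some refactoring on assembly files that are compatible with C/C++ comments and preprocessor directives.

Unfortunately, I cannot use a refactoring tool such as Astyle. I have to parse my files manually.

My refactoring algorithm iterates over each line of the file, as follows: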

while(<FH>)
{
   next if isComment($_);

   $count += s/$search/$replace/;   # A refactoring rule
   $count += arithmetic($_);        # R1=R2+3*4;  --> r1 = r2 + 3 * 4;
   ...
   $out .= $_;
}

if($count)
{
   open my $fh, '>', $filename or die "Unable to open $filename: $!";
   print $fh $out;
   close $fh;
}

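With this method I cannot detect comment lines accurately. So I implemented a counter that is incremented on each /* and decremented on each */. If the counter is greater than 0, I ignore the line.

Unfortunately, this approach does not work in this case: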

/*     /* <-- Not allowed      */ /*    */

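The counter will be equal to 1 whereas it should be equal to 0, because C comments do not nest.

So I am looking for an accurate way to detect comment blocks and ignore them. Is there any package or module that could help me?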

Answer 1

You have to parse the code in more detail, because the comment characters may sit inside a string or inside an #ifdef.
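
To illustrate what "more detail" could mean, here is a minimal sketch of a character-level scanner that tracks whether it is in code, in a block comment, in a // comment, or in a string or character literal. The function name strip_comments is hypothetical, and the sketch assumes plain C lexical rules (no trigraphs, no backslash-continued // comments, no assembler-specific comment characters).

```perl
use strict;
use warnings;

# Hypothetical helper: returns $src with comments removed.
# Each block comment becomes one space; newlines inside it are kept
# so that line numbers stay stable.
sub strip_comments {
    my ($src) = @_;
    my $out   = '';
    my $state = 'code';    # code | block | line | literal
    my $quote = '';        # opening quote of the current literal
    my $len   = length $src;
    my $pos   = 0;

    while ($pos < $len) {
        my $c   = substr($src, $pos, 1);
        my $two = substr($src, $pos, 2);

        if ($state eq 'code') {
            if    ($two eq '/*') { $state = 'block'; $pos += 2; next }
            elsif ($two eq '//') { $state = 'line';  $pos += 2; next }
            elsif ($c eq '"' || $c eq "'") { $state = 'literal'; $quote = $c }
            $out .= $c;
        }
        elsif ($state eq 'block') {
            # Comments do not nest: the first */ always ends the comment.
            if ($two eq '*/') { $state = 'code'; $out .= ' '; $pos += 2; next }
            $out .= "\n" if $c eq "\n";
        }
        elsif ($state eq 'line') {
            if ($c eq "\n") { $state = 'code'; $out .= $c }
        }
        else {
            # Inside a string or character literal.
            if ($c eq '\\') {    # copy the backslash and the escaped character
                $out .= $two; $pos += 2; next;
            }
            $state = 'code' if $c eq $quote;
            $out .= $c;
        }
        $pos++;
    }
    return $out;
}
```

Because the scanner uses a state rather than a counter, the line from the question is seen as two separate comments, and a /* inside a string literal is left alone.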

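Maybe you should run a preprocessor to prepare the code for you. For the GCC preprocessor, have a look at How do I run the GCC preprocessor to get the code after macros like #define are expanded? .

You may want to have the preprocessed code written to stdout and open a pipe to it in your Perl code.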
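
A rough sketch of the pipe idea follows. The helper name read_preprocessed is hypothetical, and the default command assumes gcc is installed and that the file extension tells it how to treat the input (otherwise something like -x assembler-with-cpp may be needed).

```perl
use strict;
use warnings;

# Hypothetical helper: run a preprocessor command on $file and return
# its standard output as a list of lines.
sub read_preprocessed {
    my ($file, @cmd) = @_;

    # Assumed default: -E stops after preprocessing, -P omits line markers.
    @cmd = ('gcc', '-E', '-P') unless @cmd;

    # The list form of a pipe open runs the command without a shell, so
    # file names containing spaces or quotes are passed through unchanged.
    open my $pipe, '-|', @cmd, $file
        or die "Cannot run @cmd: $!";
    my @lines = <$pipe>;
    close $pipe
        or die "@cmd failed on $file";
    return @lines;
}
```

Keep in mind that the preprocessor also expands macros and includes, and drops comments unless -C is given, so this output is better suited to analysis than to writing back a refactored file.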

To do it completely right, you also have to parse all the include files. Imagine the following (very bad, but valid) code:

inc1.h

/*

inc2.h

*/

main.c

#include <stdio.h>

int main() {
    #include "inc1.h"
    printf("Ha!\n");
    #include "inc2.h"
}

Answer 2

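Eventually I found that this solution works quite well. I identify all the comment blocks globally and replace them with markers of the form /*@@n@@*/, where n is a number.

Once the processing is done, I can restore the original comments.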

#!/usr/bin/env perl
use 5.014;
use strict;
use warnings;

# C/C++ Comment detection
my $re = qr{(
   /\*         ##  Start of /* ... */ comment
   [^*]*\*+    ##  Non-* followed by 1-or-more *'s
   (?:
     [^/*][^*]*\*+
   )*
   /
   |//.*                  ##  // ... comment
   |"(?:\\.|[^"\\])*"     ##  A double-quoted string, escapes allowed
   |'(?:\\.|[^'\\])*'     ##  A single-quoted string, escapes allowed
   )}mx;

my $i = 0;
my @comments = ();

# File names are assumed to come from the command line
for my $filename (@ARGV) {
    next unless -f $filename;

    # Read whole file
    open my $fh, '<', $filename or die "Unable to open $filename: $!";
    $_ = do {local $/ = undef; <$fh>};

    # Store C/C++ comments and replace them with markers
    $i = 0;
    @comments = ();
    s{$re}{store($1)}eg;   # pass the matched text; &store() would pass nothing

    # Do the processing
    ...

    # Restore C comments
    $i = 0;
    for my $comment (@comments) {
       my $s = quotemeta("/*@@".$i++."@@*/");
       s|$s|$comment|g;
    }
}

sub store {
    my $marker = "/*@@".$i."@@*/";
    $comments[$i] = shift;
    $i++;
    $marker;
}
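
As a quick check of the marker idea, the sketch below runs a mask, process, restore round trip on a single line. The helpers mask_comments and unmask_comments are hypothetical, the pattern is deliberately simplified (comments only, no string handling), and the lowercase-register rule is just a stand-in for the real processing.

```perl
use strict;
use warnings;

# Simplified pattern: /* ... */ up to the first */, or // to end of line.
my $comment_re = qr{(/\*.*?\*/|//[^\n]*)}s;

# Hypothetical helper: replace each comment with a numbered marker and
# return the masked text plus a reference to the saved comments.
sub mask_comments {
    my ($text) = @_;
    my @saved;
    $text =~ s{$comment_re}{ push @saved, $1; '/*@@' . $#saved . '@@*/' }eg;
    return ($text, \@saved);
}

# Hypothetical helper: put the saved comments back in a single pass.
sub unmask_comments {
    my ($text, $saved) = @_;
    $text =~ s{/\*@@(\d+)@@\*/}{$saved->[$1]}g;
    return $text;
}

my $line = "R1=R2+3*4; /* R1: /* not nested */ /* R2 */\n";
my ($masked, $saved) = mask_comments($line);

# Stand-in refactoring rule: it only sees code and markers.
(my $processed = $masked) =~ s/R(\d)/r$1/g;

my $restored = unmask_comments($processed, $saved);
print $restored;
```

The register names inside the two comments stay in upper case because the rule never sees the comment text, only the markers.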
