Perl: Regex not grabbing multiline C style comments in code

Question

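I have a Perl program that:

Reads in a SRC file written in C
Uses a regex match on the SRC file to find data in a specific format to use as the destination file name
Opens a new destination file
Performs another regex match to find all C-style comments /* */ that contain the keyword abcd. Note: these comments can span 1 line or more than 1 line, so the regex looks for the first /*, then the keyword abcd, then any amount of text and whitespace, until it reaches the closing */
Writes the regex matches to the destination file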

#!/usr/bin/perl
use warnings;
use strict;

my $src = 'D:\Scripts\sample.c';
my $fileName;

# open source file for reading
open(SRC_FH,'<',$src) or die $!;

while(my $row = <SRC_FH>){
    if ($row =~ /([0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{2}|[0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{3})/){
        $fileName = $1;
    }
}

my $des = "D:\\Scripts\\" . $fileName . ".txt";

# open destination file for writing
open(DES_FH,'>',$des) or die $!;

print("copying content from $src to $des\n");

seek SRC_FH, 0, 0;

while(my $row = <SRC_FH>){
    if ($row =~ /(\/\*.*abcd.[\s\S]*?\*\/)/){
        print DES_FH "$1\n";
    }
}

# always close the filehandles
close(SRC_FH);

close(DES_FH);
print "File content copied successfully!\n";

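My problem is that, although I believe the regex is correct, my destination file only gets the single-line comments written to it, and I think this is because of the way the Perl code executes. Any C-style comment that spans more than 1 line is not written to the destination file. What am I missing in my 2nd if statement?

I checked the regex from my 2nd if statement at https://regexr.com/ and it captures multi-line C-style comments as well as single-line comments that contain the keyword abcd.

So I tried zdim's first suggestion below. Here is what I used: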

#!/usr/bin/perl
use warnings;
use strict;

my $src = 'D:\Scripts\sample.c';
my $fileName;
my @comments;

# open source file for reading
open(SRC_FH,'<',$src) or die $!;

while(my $row = <SRC_FH>){
    if ($row =~ /([0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{2}|[0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{3})/){
        $fileName = $1;
    }
}

my $des = "D:\\Scripts\\" . $fileName . ".txt";

# open destination file for writing
open(DES_FH,'>',$des) or die $!;

print("copying content from $src to $des\n");

#seek SRC_FH, 0, 0;

my $content = do {
    #read whole file at once
    local $/;
    open (SRC_FH,'<', $src) or die $!;
    <SRC_FH>;
};

#if($content =~ /(\/\*.*abcd.[\s\S]*?\*\/)/sg){
#       my @comments = $content;
#   }

my @comments = $content =~ /(\/\*.*abcd.[\s\S]*?\*\/)/sg;

foreach (@comments){
    print DES_FH "$_\n";
}

#while(my $row = <SRC_FH>){
#   if ($row =~ /(\/\*.*abcd.[\s\S]*?\*\/)/){
#       print DES_FH "$1\n";
#   }
#}

# always close the filehandles
close(SRC_FH);

close(DES_FH);
print "File content copied successfully!\n";

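The result is that everything in sample.c gets copied to the destination file, a complete 1:1 copy. What I want is to extract all single-line and multi-line comments from the C file.

Example 1: /* A B C D */
Example 2: /* some text * more comments abcd and more comments */

Final solution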

#!/usr/bin/perl
use warnings;
use strict;

my $src = 'D:\Scripts\sample.c';
my $fileName;

# open source file for reading
open(SRC_FH,'<',$src) or die $!;

while(my $row = <SRC_FH>){
    if ($row =~ /([0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{2}|[0-9]{2}\.[0-9]{2}\.[0-9]{3}\.[a-z,0-9]{3})/){
        $fileName = $1;
    }
}

my $des = "D:\\Scripts\\" . $fileName . ".txt";

# open destination file for writing
open(DES_FH,'>',$des) or die $!;

print("copying content from $src to $des\n");

seek SRC_FH, 0, 0;

my $content = do{local $/; <SRC_FH>};

my @comments = $content =~ /(\/\*.*abcd.[\s\S]*?\*\/)/g;

for(@comments){
    print DES_FH "$_\n";
}

# always close the filehandles
close(SRC_FH);

close(DES_FH);
print "File content copied successfully!\n";

Answer 1

What am I missing in my 2nd if statement?

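Well, nothing really. It's just that in a multi-line C comment no single line contains both /* and */. So when the file is read line by line, the regex cannot match a multi-line comment.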
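To make the point concrete, here is a minimal sketch that runs the question's regex first line by line and then against the whole text. The sample text and the sub names match_by_line and match_whole are made up for this illustration; they are not from the question.

```perl
use strict;
use warnings;

# Hypothetical sample input: one single-line and one multi-line comment.
my $text = <<'END';
int a; /* abcd one-liner */
/* abcd first line
 * second line */
int b;
END

# The regex from the question (no /s modifier).
my $re = qr{(/\*.*abcd.[\s\S]*?\*/)};

# Line by line: each match attempt sees only one line at a time.
sub match_by_line {
    my ($str) = @_;
    my @found;
    for my $line (split /^/, $str) {
        push @found, $1 if $line =~ $re;
    }
    return @found;
}

# Whole text at once: the [\s\S]*? part can now run across newlines.
sub match_whole {
    my ($str) = @_;
    my @found = $str =~ /$re/g;
    return @found;
}

print scalar(match_by_line($text)), " match(es) line by line\n";
print scalar(match_whole($text)),   " match(es) on the whole text\n";
```

Line by line only the one-liner is found, since the line that opens the second comment has no */ on it. Note that without /s this regex still needs abcd to sit on the same line as the opening /*.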

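To capture such comments, either:

Read the whole file into a string ("slurp" it), then add the /s modifier to the regex so that . matches newlines as well. Also use the /g modifier so as to capture all such patterns in the string. One way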

my $content = do { 
    local $/;  # undef record separator so the whole file is read at once
    open my $src_fh, '<', $src_file or die $!;  # have to re-open
    <$src_fh>;                                  # reads it all
};  # lexical filehandle gets closed as we leave scope

# NOTE -- there may be difficulties in capturing comments in a C source file
my @comments = $content =~ /.../sg;  # your regex

Or use a library to slurp the file, for example

use Path::Tiny;
my $content = path($src_file)->slurp;
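One thing to watch with /s: the .* in the question's regex then runs across newlines as well, so a single match can stretch from the first /* in the file to the last comment containing abcd, which is probably why that attempt copied nearly the whole file. Below is a sketch of one way around this, with a pattern that never steps over a closing */. The sample text and the name keyword_comments are assumptions for this example, and it still does not deal with /* appearing inside string literals.

```perl
use strict;
use warnings;

# Hypothetical sample: two comments carry the keyword, one does not.
my $text = <<'END';
/* no keyword here */
int a; /* abcd one-liner */
int b;
/* first line
 * abcd second line */
int c;
END

# The question's regex with /s added: .* now runs across newlines too.
my @greedy = $text =~ m{(/\*.*abcd.[\s\S]*?\*/)}sg;

# Tempered version: (?:(?!\*/).)*? takes one character at a time
# and never steps over a closing */, so each match stays in one comment.
sub keyword_comments {
    my ($str, $keyword) = @_;
    my @found = $str =~ m{
        ( /\*                      # comment opener
          (?:(?!\*/).)*?           # anything, but not past a closing marker
          \Q$keyword\E             # the keyword, taken literally
          (?:(?!\*/).)*?           # rest of the comment
          \*/ )                    # comment closer
    }xsg;
    return @found;
}

my @tempered = keyword_comments($text, 'abcd');
print scalar(@greedy),   " match(es) with .* under /s\n";
print scalar(@tempered), " match(es) with the tempered pattern\n";
```

On this sample the first pattern returns one match that swallows all three comments and the code between them, while the tempered one returns the two comments that contain abcd, including the one where the keyword is not on the opening line.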

Or,

Set a flag when you see /*, then get/print all lines until you hit the closing */, and then unset the flag. Here is a basic version of that

use feature 'say';  # needed for say

my $inside_comment = 0;
while (<$src_fh>) {
    if (m{(/\*.*)}) {         #/ fix syntax hilite
        $inside_comment = 1;  # opening line for the comment
        say $des_fh $1;
    }
    elsif (m{(.*\*/)}) {      # closing line for the comment
        say $des_fh $1;
        $inside_comment = 0;
    }
    elsif ($inside_comment) { say $des_fh $_ }
}
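The basic version above treats a line that holds both /* and */ as an opening line only, so the flag stays set afterwards, and it does not filter on the keyword. A possible refinement, as a sketch only: gather each comment into a buffer and keep it if it contains the keyword. The sub name comments_with_keyword and the in-memory sample are hypothetical, and the sketch assumes at most one comment starts per line.

```perl
use strict;
use warnings;

# Collect whole comments line by line, then keep those with the keyword.
# Assumption: at most one comment starts on any given line.
sub comments_with_keyword {
    my ($fh, $keyword) = @_;
    my (@found, $buffer);
    my $inside_comment = 0;
    while (my $line = <$fh>) {
        if (!$inside_comment) {
            next unless $line =~ m{(/\*.*)}s;   # opening line, keep from /* on
            $buffer = $1;
            $inside_comment = 1;
        }
        else {
            $buffer .= $line;                   # continuation line
        }
        # Has the text gathered so far reached the closing */ ?
        if ($buffer =~ m{^(/\*.*?\*/)}s) {
            my $comment = $1;
            push @found, $comment if index($comment, $keyword) >= 0;
            $inside_comment = 0;
        }
    }
    return @found;
}

# Usage, with an in-memory filehandle standing in for the C source file
my $sample = "int a; /* abcd one */\n/* first\n * abcd second */\n/* other */\n";
open my $fh, '<', \$sample or die $!;
print "$_\n---\n" for comments_with_keyword($fh, 'abcd');
close $fh;
```

Because the comment is assembled before it is tested, the keyword may sit on any line of a multi-line comment, and the file never has to be held in memory as a whole.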

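I tested all of this, but please check and improve it. For one, leading whitespace is of interest.

Note: in general, getting all comments out of a C program can be rather tricky.

Here is a one-line version of slurping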

my $file_content = do { local (@ARGV, $/) = $file_name; <> };
