Find C function definitions where the name is also mentioned in the preceding comment

Question

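I have a C file. It uses sets of C-style comments /* */, each followed by a variable that is defined for that comment. The variable name also appears in the comment. Some comments contain variable names that they do not apply to (see the 3rd comment in the example below).

Here is an example of the format: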

/* Object: function1: Does some really cool things and then it ends */
const function1 = someValue;

/* Object: function2: Does more really cool things and then it ends */
const function2 = someValue2;

/* Object: function3: Does even more really cool things
just like function2, does but continues over to the next line for a multiline comment */
const function3 = someValue3;

/* Object: function4: Does all kinds of cool things
and needs function1 in order to set a value correctly */
const function4 = someValue4;

/* Object: function5: Does some other cool things
and needs function2[with another variable] to do some things */
const function5 = someBValue5;

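I only want to match the variable names, with a result like this: function1 function2 function3 function4 function5

I have been playing with this on https://regexr.com/ for hours and I cannot get it.

Here is what I have tried: there was this post, which used a negative lookbehind. I cannot use a negative lookbehind because this regex is being used in Perl 5.32.1 on a Windows 10 machine.

This is the best I could come up with: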

(\bfunction[\w]+\b[^:,])

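It excludes matches that are followed by : or , but it does not exclude the duplicates contained within the /* */. I have not been able to figure out anything other than a negative lookbehind, which I cannot use.

Ultimately, I think the best solution is to exclude everything between /* */ and search only what is not contained in comments. But it needs to support excluding multiline comment content, and it cannot use a negative lookbehind.

Test round 1

This is not a complete answer to my question, because it does not leave out the const and the space that precede the function name. function1, function2, etc. are just generic function names. They will be alphanumeric, so I believe function[\w]+ still gives the best capture of the function name.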

Answer 1

Try using const as a filter:

"const (\bfunction[\w]+\b[^:,])"

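It does not allow any other neighbor before the name, so each function name is matched only once.

To get just the function name you need to refer to the capture group (group 1); that gives you the name without the const.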
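As a rough sketch of how that could look in Perl: the helper name `definition_names` and the inline sample are made up for illustration, and the code assumes every definition starts with `const`. Note that in the pattern above the trailing `[^:,]` sits inside the group, so the captured text carries one extra character; the sketch drops it so that the group holds only the name.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: returns the names that follow "const" in $text.
# Assumption: every definition looks like "const NAME = ...".
sub definition_names {
    my ($text) = @_;
    # In list context with /g the match returns every capture of group 1.
    return $text =~ /\bconst\s+(function\w+)\b/g;
}

my $src = <<'END';
/* Object: function4: Does all kinds of cool things
and needs function1 in order to set a value correctly */
const function4 = someValue4;

/* Object: function5: Does some other cool things
and needs function2[with another variable] to do some things */
const function5 = someBValue5;
END

say for definition_names($src);   # prints function4 and function5, one per line
```

Names mentioned only inside comments are skipped because they are never preceded by `const` there.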

Answer 2

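My take on the problem: find a function name from its definition (followed by =), outside of a comment but after a comment that mentions it (followed by :).

Here is a simple step-by-step, stateful approach: detect whether we are inside a comment and whether we have found /(function[0-9]+):/, and set the appropriate flags; then look for the same function after the comment and update the flags.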

use warnings;
use strict;
use feature 'say';

my $file = shift // die "Usage: $0 filename\n";

open my $fh, '<', $file or die $!; 

my (@func_names, $inside_comment, $func_name);
while (<$fh>) { 
    chomp;
    # Are we inside a comment? Look for function[0-9]+: 
    if (m{/\*}) {                           #/ fix syntax hilite
        $inside_comment = 1 if not m{\*/};  #/ starts multiline comment?
        if (/(function[0-9]+):/) { 
            $func_name = $1; 
        }
    }   
    elsif (m{\*/}) {         #/ closing line for multiline comment
        $inside_comment = 0;  
        if (not $func_name and /(function[0-9]+):/) {   #/
            $func_name = $1; 
        }
    }   
    elsif ($inside_comment and not $func_name) { 
        if (/(function[0-9]+):/) { 
            $func_name = $1; 
        }
    }   
    # Check for name outside (after) a comment where it was found
    elsif (not $inside_comment and $func_name) { 
        if (/(function[0-9]+)\s+=/) { 
            say "Found our definition: $1";
            push @func_names, $1;
            $func_name = ''; 
        }
    }   
}
say for @func_names;

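This prints as expected with the provided sample. Drawback: each line gets tested with a regex twice. For small files like source code one would never notice, but it is not nice. There may be (edge?) cases that are not covered,^† so please test and improve.

Another option. Read the whole file into a string and step through it comment by comment, checking for the function name after each one; or, parse it for comment + function definition. Both use \G + /g.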

use warnings;
use strict;
use feature 'say';

die "Usage: $0 filename\n" if not @ARGV;
my $cont = do { local $/; <> };

# Pattern inside a C-style comment, possibly multiline (NOTE: not general)
my $re_cc = qr{/\* .*? (function[0-9]+): .*? \*/}sx;

my @func_names;

while ($cont =~ /$re_cc\s*/gc) { 
    my $func_name = $1;
    if ( $cont =~ /\G .*? (function[0-9]+)\s*=/x and $func_name eq $1 ) {
        push @func_names, $1;
    }
}

say for @func_names;

Read about the anchor \G and its use with the /g modifier in perlop. Some other resources are this page and this post (and there is much more out there).
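As a small illustration of how \G works together with /g and /c, here is a toy tokenizer loop. The input string and the token labels are invented for the example and have nothing to do with the C file; the point is only that each match resumes where the previous one stopped.

```perl
use strict;
use warnings;
use feature 'say';

my $str = "foo = 42; bar";
my @tokens;

# Each /gc match must start where the previous one ended (\G). The /c flag
# keeps pos() from being reset when a branch fails, so the next branch can
# try at the same position.
while (1) {
    if    ($str =~ /\G \s+ /gcx)      { next }                       # skip whitespace
    elsif ($str =~ /\G ([a-z]+) /gcx) { push @tokens, "WORD:$1" }
    elsif ($str =~ /\G ([0-9]+) /gcx) { push @tokens, "NUM:$1" }
    elsif ($str =~ /\G ([=;]) /gcx)   { push @tokens, "PUNCT:$1" }
    else                              { last }                       # end of string or unknown input
}

say join ' ', @tokens;   # WORD:foo PUNCT:= NUM:42 PUNCT:; WORD:bar
```

Without /c, the first failing branch would reset pos() to the start of the string and the loop would never advance past the first token.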

This makes some assumptions, and perhaps a slightly safer version is

use warnings;
use strict;
use feature 'say';

die "Usage: $0 filename\n" if not @ARGV;
my $cont = do { local $/; <> };

# Pattern inside a C-style comment, possibly multiline (NOTE: not general)
my $re_cc = qr{/\* .*? (function[0-9]+): .*? \*/}sx;

my (@func_names, $func_name);    
while (1) {
    if ($cont =~ /\G $re_cc \s*/gcx) { 
        $func_name = $1;
    }
    elsif ($cont =~ /\G (function[0-9]+)\s* = .*?\n\s*/gcx 
            and $func_name eq $1) {
        #say "Found function definition for: $1 (at pos=", pos $cont, ")";
        push @func_names, $1;
        $func_name = '';
    }
    elsif ($cont =~ /\G \S+ \s*/gcx) { }       # other, skip
    else                             { last }

}

say for @func_names;

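Both of these process the provided file correctly, but they can surely be improved for more general cases.^‡

Keep in mind that recognizing C-style comments generally and correctly can be very tricky. See this perldoc FAQ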
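To make the difficulty concrete, here is a deliberately naive sketch (not the FAQ's solution) that deletes everything between /* and */ and then collects the names followed by =. The helper name `strip_comments_naive` and the sample text are assumptions for illustration. It copes with multiline comments, but it also empties a string literal that merely looks like a comment.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: removes /* ... */ spans, including multiline ones.
# Naive on purpose: it knows nothing about string or character literals.
sub strip_comments_naive {
    my ($text) = @_;
    $text =~ s{/\* .*? \*/}{}gsx;
    return $text;
}

my $src = <<'END';
/* Object: function3: Does even more really cool things
just like function2, does but continues over to the next line */
const function3 = someValue3;
const char *s = "/* not a comment */";
END

my $code  = strip_comments_naive($src);
my @names = $code =~ /\b(function\w+)\s*=/g;
say for @names;                       # function3

# The string literal lost its contents, which shows the limitation.
my $status = $code =~ m{"/\* not a comment \*/"} ? "literal kept" : "literal damaged";
say $status;                          # literal damaged
```

For files that follow the strict layout from the question this may be good enough; for arbitrary C source it is not.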

^† One: if a comment is not followed by a definition, our flags may be left in the wrong state
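One possible guard against that, sketched under the assumption that each comment documents at most one definition: forget any pending name whenever a new comment opens, and accept a definition only if it carries the pending name. The sub name `names_after_comments` and the sample lines are illustrative, and code that follows a closing */ on the same line is ignored.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical variant of the line-by-line scan; takes the lines as a list.
sub names_after_comments {
    my @lines = @_;
    my (@found, $inside_comment);
    my $pending = '';
    for (@lines) {
        if (m{/\*}) {
            $pending = '';               # a new comment opens: drop stale state
            $inside_comment = 1;
        }
        if ($inside_comment) {
            # Remember the first "name:" seen in this comment
            $pending = $1 if not $pending and /(function[0-9]+):/;
            $inside_comment = 0 if m{\*/};
            next;
        }
        # Outside comments: accept only the definition of the pending name
        if ($pending and /\b(function[0-9]+)\s*=/ and $1 eq $pending) {
            push @found, $1;
            $pending = '';
        }
    }
    return @found;
}

my @sample = (
    '/* Object: function1: orphan comment with no definition */',
    '/* Object: function2: Does more cool things',
    'and needs function1: in some way */',
    'const function2 = someValue2;',
    'const function1 = someValue;',
);
say for names_after_comments(@sample);   # function2
```

The orphan comment for function1 no longer leaks into the next comment, and the later function1 definition is not reported because no comment immediately precedes it.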

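^‡ While the note about improvements concerns the overall operation, here is a related remark on the regex.

The $re_cc pattern uses the /s modifier so that . also matches newlines, since its .*? has to be able to match across multiple lines. If such a modifier were set on the enclosing match operator instead, it would apply to the rest of the regex that uses this pattern as well, which may not be intended. (Flags given to qr// are stored with the compiled pattern, so on their own they stay local to it.)

In this case I cannot see how that would matter, but in case it does there is a way to set an embedded (pattern-match) modifier, which applies only to the pattern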

/(?s)pattern(?-s)/

Or, if the pattern naturally sits in its own group, the modifier is discarded outside of that group, so we do not need (?-s) to cancel it

/((?s)pattern)/
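A quick way to check how far a modifier reaches is to try it on a string with newlines. The sketch below (test string and variable names invented for illustration) checks whether a dot outside the scoped part can still match a newline, for both embedded forms and for a pattern compiled with qr//s.

```perl
use strict;
use warnings;
use feature 'say';

my $str = "a\nb c\nd";

# (?s) inside a group: only the dot inside the group may match a newline.
my $grouped  = $str =~ /((?s)a.b) c.d/    ? "yes" : "no";

# (?s) ... (?-s): same reach, switched off by hand.
my $explicit = $str =~ /(?s)a.b(?-s) c.d/ ? "yes" : "no";

# Flags on qr// are kept with the compiled pattern when it is interpolated.
my $re       = qr/a.b/s;
my $qr_inner = $str =~ /$re/              ? "yes" : "no";
my $qr_outer = $str =~ /$re c.d/          ? "yes" : "no";

say "group form, outer dot matches newline:    $grouped";    # no
say "explicit form, outer dot matches newline: $explicit";   # no
say "qr//s, inner dot matches newline:         $qr_inner";   # yes
say "qr//s, outer dot matches newline:         $qr_outer";   # no
```

In all three variants the /s behavior stays confined to the sub-pattern that asked for it.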

regex

perl