Matching free() and malloc() calls with regular expressions

Question

I am creating a PowerShell script that parses a file containing C code and detects whether it contains calls to the free(), malloc() or realloc() functions.

file_one.c

int MethodOne()
{
    return 1;
}

int MethodTwo()
{
    free();
    return 1;
}

file_two.c

int MethodOne()
{
    //free();
    return 1;
}

int MethodTwo()
{
    free();
    return 1;
}

check.ps1

$regex = "(^[^/]*free\()|(^[^/]*malloc\()|(^[^/]*realloc\()"
$file_one = "Z:\PATH\file_one.txt"
$file_two = "Z:\PATH\file_two.txt"
$contentOne = Get-Content $file_one -Raw
$contentOne -match $regex
$contentTwo = Get-Content $file_two -Raw
$contentTwo -match $regex

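Processing the whole file at once seems to work for contentOne: I get True (because of the free() in MethodTwo). With contentTwo I am not so lucky: it returns False instead of True (because of the free() in MethodTwo).
Can someone help me write a better regex that works in both cases?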

Answer 1

OK, here it is.

Raw:

^(?>(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n))|(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?!\b(?:free|malloc|realloc)\()[\S\s](?:(?!\b(?:free|malloc|realloc)\()[^/"'\\])*))*(?:(\bfree\()|(\bmalloc\()|(\brealloc\())

Stringed:

"^(?>(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/|//(?:[^\\\\]|\\\\(?:\\r?\\n)?)*?(?:\\r?\\n))|(?:\"[^\"\\\\]*(?:\\\\[\\S\\s][^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\[\\S\\s][^'\\\\]*)*'|(?!\\b(?:free|malloc|realloc)\\()[\\S\\s](?:(?!\\b(?:free|malloc|realloc)\\()[^/\"'\\\\])*))*(?:(\\bfree\\()|(\\bmalloc\\()|(\\brealloc\\())"

Verbatim:

@"^(?>(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n))|(?:""[^""\\]*(?:\\[\S\s][^""\\]*)*""|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?!\b(?:free|malloc|realloc)\()[\S\s](?:(?!\b(?:free|malloc|realloc)\()[^/""'\\])*))*(?:(\bfree\()|(\bmalloc\()|(\brealloc\())"

Explained:

 ^ 
 (?>
      (?:                              # Comments 
           /\*                              # Start /* .. */ comment
           [^*]* \*+
           (?: [^/*] [^*]* \*+ )*
           /                                # End /* .. */ comment
        |  
           //                               # Start // comment
           (?:                              # Possible line-continuation
                [^\\]
             |  \\
                (?: \r? \n )?
           )*?
           (?: \r? \n )                     # End // comment
      )
   |                                 # OR,

      (?:                              # Non - comments 
           "
           [^"\\]*                         # Double quoted text
           (?: \\ [\S\s] [^"\\]* )*
           "
        |  '
           [^'\\]*                         # Single quoted text
           (?: \\ [\S\s] [^'\\]* )*
           ' 
        |                                 # OR,

           (?!                              # ASSERT: Here, cannot be free / malloc / realloc {}
                \b 
                (?: free | malloc | realloc )
                \(
           )
           [\S\s]                           # Any char which could start a comment, string, etc..
                                            # (Technically, we're going past a C++ source code error)

           (?:                              # -------------------------
                (?!                              # ASSERT: Here, cannot be free / malloc / realloc {}
                     \b 
                     (?: free | malloc | realloc )
                     \(
                )

                [^/"'\\]                        # Char which doesn't start a comment, string, escape,
                                                 # or line continuation (escape + newline)
           )*                               # -------------------------
      )                                # Done Non - comments 
 )*

 (?:
      ( \b free\( )                    # (1), Free()
   |  
      ( \b malloc\( )                  # (2), Malloc()
   |  
      ( \b realloc\( )                 # (3), Realloc()
 )

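Some notes:

With the ^ anchor, this finds only the first call, searching from the start of the string.
To find them all, just remove the ^ from the regex.

This works because it consumes everything (comments, strings and other code) up to the call you are looking for.
In this case, what it found is in capture group 1, 2 or 3.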
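For anyone wiring this into the script from the question, here is a minimal PowerShell sketch of the notes above. Treat it as an illustration under a few assumptions: the sample strings are hypothetical stand-ins for what Get-Content -Raw would return, -cmatch is used because C identifiers are case-sensitive (plain -match ignores case), and for the find-all case the sketch swaps ^ for \G rather than just deleting it, so that a retry cannot begin inside a trailing comment. That \G substitution is an assumption of this sketch, not part of the regex above.

```powershell
# The regex from above, kept in a single-quoted here-string so that
# backslashes and quotes need no extra escaping.
$pattern = @'
^(?>(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n))|(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?!\b(?:free|malloc|realloc)\()[\S\s](?:(?!\b(?:free|malloc|realloc)\()[^/"'\\])*))*(?:(\bfree\()|(\bmalloc\()|(\brealloc\())
'@

# Hypothetical stand-ins for the contents of file_one.c and file_two.c.
$contentOne = "int MethodOne() { return 1; }`nint MethodTwo() { free(); return 1; }`n"
$contentTwo = "int MethodOne() {`n  //free();`n  return 1;`n}`nint MethodTwo() {`n  free();`n  return 1;`n}`n"

$resultOne = $contentOne -cmatch $pattern   # True
$resultTwo = $contentTwo -cmatch $pattern   # True, the commented-out call is skipped
$firstCall = $Matches[1]                    # 'free(' : capture group 1 took part

# Find every call: each match must start where the previous one ended (\G).
$patternAll = '\G' + $pattern.Substring(1)
$sample = "char *p = malloc(10);`n// free(p);`np = realloc(p, 20);`nfree(p);`n/* malloc(1); */`n"
$found = foreach ($m in [regex]::Matches($sample, $patternAll)) {
    if     ($m.Groups[1].Success) { 'free' }
    elseif ($m.Groups[2].Success) { 'malloc' }
    else                          { 'realloc' }
}
$found -join ', '   # malloc, realloc, free
```

The two commented-out calls in $sample are consumed as comments, so only the three real calls are reported, in source order.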

Good luck!!

What the regex contains:

----------------------------------
 * Format Metrics
----------------------------------
Atomic Groups       =   1

Cluster Groups      =   10

Capture Groups      =   3

Assertions          =   2
       ( ? !        =   2

Free Comments       =   25
Character Classes   =   12

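Edit
As requested, an explanation of the part of the regex that handles /* */ comments.
This -> /\*[^*]*\*+(?:[^/*][^*]*\*+)*/

This is a modified unrolled-loop regex that assumes an opening delimiter of /* and a closing delimiter of */.
Notice that the open and close delimiters share a common character, /, in their sequences.
To be able to do this without lookaround assertions, a method is used that moves the trailing delimiter's asterisk inside the loop.
With this factoring, all that is needed is to check for the closing / to complete the delimiter sequence.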

 /\*              # Opening delimiter /*

 [^*]*            # Optionally, consume all non-asterisks

 \*+              # Must match 1 or more asterisks (the anchor), or FAIL.
                  # This is matched here to align the optional loop below,
                  # because the loop is looking for the closing /.

 (?:              # The optional loop part
      [^/*]            # Specifically, a single character that is neither / nor an asterisk.
                       # Since a / here would be the closing delimiter, it must be excluded.

      [^*]*            # Optional non-asterisks.
                       # This will accept a / because it is supposed to consume ALL
                       # opening delimiters as it goes,
                       # and will consider the very next */ as the close.

      \*+              # Must match 1 or more asterisks (the anchor), or FAIL.
 )*               # Repeat 0 to many times.

 /                # Closing delimiter /
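
To try the comment sub-pattern on its own, here is a small PowerShell sketch. The sample strings are made up for illustration. It only aims to show that the match ends at the first */ and that a stray / or a run of * inside the comment body does not end it early.

```powershell
# The /* ... */ part of the regex, on its own.
$comment = '/\*[^*]*\*+(?:[^/*][^*]*\*+)*/'

# Hypothetical inputs: a stray '/' and runs of '*' inside the comment body.
$first    = [regex]::Match('x = 1; /* free(p); ** a / b **/ y = 2; /* second */', $comment).Value
$stars    = [regex]::Match('/***/', $comment).Value
$unclosed = [regex]::Match('/*/ not closed', $comment).Success

$first      # /* free(p); ** a / b **/
$stars      # /***/
$unclosed   # False
```

The last case fails because /*/ never supplies the asterisk that \*+ requires before the closing /.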
