VB.NET 2010: Matching Java multiline comments with Regex

I want to remove (Java/C/C++/...) multiline comments from a file. To do this, I wrote a regular expression:

/\*[^\*]*(\*+[^\*/][^\*]*)*\*+/

This regex works in Notepad++ and Geany (search, then replace all with nothing). In VB.NET, the regex behaves differently.

I'm using:

Microsoft Visual Studio 2010 (Version 10.0.40219.1 SP1Rel)
Microsoft .NET Framework (4.7.02053 SP1Rel)

The files I run the replacement on are not complicated. I don't need to handle any quoted text that could start or end a comment.

@sln Thanks for your detailed answer. I'll quickly explain my regex the same way you did!

/\*                      Find the beginning of the comment.
[^\*]*                   Match any chars, but not an asterisk.
                         We need to deal with finding an asterisk now:
(\*+[^\*/][^\*]*)*       This regex breaks down to:
 \*+                     Consume asterisk(s).
    [^\*/]               Match any other char that is not an asterisk or a / (would end the comment!).
          [^\*]*         Match any other chars that are not asterisks.
(               )*       Try to find more asterisks followed by other chars.

\*+/                     Match 1 to n asterisks and finish the comment with /.
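To see where the question's pattern starts and stops on the tricky cases from the breakdown (adjacent comments with no space between them, runs of asterisks before the closing /), here is a minimal sketch in Python. It assumes Python's re as a stand-in engine: like .NET it is a backtracking engine, but this only checks the pattern itself, not VB.NET. The helper name comment_spans is hypothetical. finditer is used because the pattern contains a capturing group, which would make re.findall return only that group instead of the whole match.

```python
import re

# The question's pattern, unchanged.
QUESTION_PATTERN = re.compile(r"/\*[^\*]*(\*+[^\*/][^\*]*)*\*+/")


def comment_spans(text):
    """Return every whole /* ... */ match, in order (hypothetical helper)."""
    # group(0) is the whole match; group(1) is only the last repetition
    # of (\*+[^\*/][^\*]*), which is not what we want here.
    return [m.group(0) for m in QUESTION_PATTERN.finditer(text)]


if __name__ == "__main__":
    # Adjacent comments with no space in between (the second snippet's case)
    print(comment_spans("/* comment1 *//* comment2 */"))
    # ['/* comment1 */', '/* comment2 */']

    # Several asterisks before the closing slash, and a lone * inside a comment
    print(comment_spans("/* a **/ x /*b*c*/"))
    # ['/* a **/', '/*b*c*/']
```

In both cases each comment is matched on its own: after an asterisk, [^\*/] refuses the closing /, so the group stops repeating and \*+/ ends the match at the first */.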

Here are two code snippets:

The first:

text

/*
 * block comment
 *
 */ /* comment1 */ /* comment2 */

My text to keep.

/* more comments */

more text

The second:

text

/*
 * block comment
 *
 */ /* comment1 *//* comment2 */

My text to keep.

/* more comments */

more text

The only difference is the missing space between the two comments in this line of the second snippet:

/* comment1 *//* comment2 */

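Removing the matches with Notepad++ and Geany works fine in both cases. With VB.NET, the regex fails on the second example. After the removal, the result for the second example looks like this: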

text



more text

But it should look like this:

text



My text to keep.



more text

I'm using System.Text.RegularExpressions:

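' Requires Imports System.Text.RegularExpressions at the top of the file
Dim content As String = IO.File.ReadAllText(file_path_)
Dim multiline_comment_remover As Regex = New Regex("/\*[^\*]*(\*+[^\*/][^\*]*)*\*+/")
' Replace every /* ... */ match with an empty string
content = multiline_comment_remover.Replace(content, "")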
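For comparison, here is a minimal Python sketch of the same replacement applied to the second snippet. It assumes the snippet is stored with \n line endings and a trailing newline; SECOND_SNIPPET and strip_block_comments are hypothetical names. Python's re is only a stand-in here, so this shows what the pattern does in one backtracking engine and does not reproduce or explain the VB.NET result.

```python
import re

# Same pattern the VB.NET code passes to New Regex(...)
PATTERN = re.compile(r"/\*[^\*]*(\*+[^\*/][^\*]*)*\*+/")

# The second snippet from the question (no space in "*//*")
SECOND_SNIPPET = (
    "text\n"
    "\n"
    "/*\n"
    " * block comment\n"
    " *\n"
    " */ /* comment1 *//* comment2 */\n"
    "\n"
    "My text to keep.\n"
    "\n"
    "/* more comments */\n"
    "\n"
    "more text\n"
)


def strip_block_comments(content):
    # Counterpart of multiline_comment_remover.Replace(content, "")
    return PATTERN.sub("", content)


if __name__ == "__main__":
    print(strip_block_comments(SECOND_SNIPPET))
```

Here "My text to keep." survives. The only leftover from the comment line is the single space that separated " */" from "/* comment1", which is invisible in the expected output.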

I want VB.NET to give the same result as Notepad++ and Geany. As sln answered, my regex "should work in a weird way". The question of why VB.NET doesn't handle this regex as expected is still open.

Since sln's answer got my code working, I'll accept it, even though it doesn't explain why VB.NET doesn't like my regex. Thanks for your help! I learned a lot!

I think you can use a general-purpose C++ comment stripper.

Basically: globally find the regex below and replace each match with $2 (capture group 2, the non-comment text).

Demo PCRE: https://regex101.com/r/UldYK5/1
Demo Python: https://regex101.com/r/avfSfB/1

    # raw:   (?m)((?:(?:^[ \t]*)?(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/(?:[ \t]*\r?\n(?=[ \t]*(?:\r?\n|/\*|//)))?|//(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n(?=[ \t]*(?:\r?\n|/\*|//))|(?=\r?\n))))+)|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|(?:\r?\n|[\S\s])[^/"'\\\s]*)

    # delimited:  /(?m)((?:(?:^[ \t]*)?(?:\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/(?:[ \t]*\r?\n(?=[ \t]*(?:\r?\n|\/\*|\/\/)))?|\/\/(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n(?=[ \t]*(?:\r?\n|\/\*|\/\/))|(?=\r?\n))))+)|((?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?:\r?\n(?:(?=(?:^[ \t]*)?(?:\/\*|\/\/))|[^\/"'\\\r\n]*))+|[^\/"'\\\r\n]+)+|[\S\s][^\/"'\\\r\n]*)/

    (?m)                             # Multi-line modifier

    (                                # (1 start), Comments
         (?:
              (?: ^ [ \t]* )?                  # <- To preserve formatting
              (?:
                   /\*                              # Start /* .. */ comment
                   [^*]* \*+
                   (?: [^/*] [^*]* \*+ )*
                   /                                # End /* .. */ comment
                   (?:                              # <- To preserve formatting
                        [ \t]* \r? \n
                        (?=
                             [ \t]*
                             (?: \r? \n | /\* | // )
                        )
                   )?
                |
                   //                               # Start // comment
                   (?:                              # Possible line-continuation
                        [^\\]
                     |  \\
                        (?: \r? \n )?
                   )*?
                   (?:                              # End // comment
                        \r? \n
                        (?=                              # <- To preserve formatting
                             [ \t]*
                             (?: \r? \n | /\* | // )
                        )
                     |  (?= \r? \n )
                   )
              )
         )+                               # Grab multiple comment blocks if need be
    )                                # (1 end)

 |                                 ## OR

    (                                # (2 start), Non - comments
         # Quotes
         # ======================
         (?:                              # Quote and Non-Comment blocks
              "
              [^"\\]*                          # Double quoted text
              (?: \\ [\S\s] [^"\\]* )*
              "
           |                                 # --------------
              '
              [^'\\]*                          # Single quoted text
              (?: \\ [\S\s] [^'\\]* )*
              '
           |                                 # --------------
              (?:                              # Qualified linebreaks
                   \r? \n
                   (?:
                        (?=                              # If comment ahead just stop
                             (?: ^ [ \t]* )?
                             (?: /\* | // )
                        )
                     |                                 # or,
                        [^/"'\\\r\n]*                    # Chars that don't start a comment, string, escape,
                                                         # or line continuation (escape + newline)
                   )
              )+
           |                                 # --------------
              [^/"'\\\r\n]+                    # Chars that don't start a comment, string, escape,
                                               # or line continuation (escape + newline)
         )+                               # Grab multiple instances
      |                                 # or,
         # ======================
         # Pass through
         [\S\s]                           # Any other char
         [^/"'\\\r\n]*                    # Chars that don't start a comment, string, escape,
                                          # or line continuation (escape + newline)
    )                                # (2 end), Non - comments
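A minimal Python sketch of how the combined regex is used: for every match, keep only capture group 2 (the non-comment text) and drop group 1 (the comments). It assumes Python's re, since the answer's own demo includes a Python flavor, and it uses the raw form written with the doubled backslashes a working regex needs (for example [^\\] means "not a backslash"). The name strip_comments is hypothetical. Because group 2 does not participate when a comment matches, the replacement callback falls back to an empty string.

```python
import re

# sln's "raw" pattern, split into adjacent raw-string literals for readability.
# Group 1: one or more /* */ or // comments (with the formatting-preserving parts).
# Group 2: string literals and any other non-comment text.
CPP_COMMENTS = re.compile(
    r"(?m)((?:(?:^[ \t]*)?(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/"
    r"(?:[ \t]*\r?\n(?=[ \t]*(?:\r?\n|/\*|//)))?"
    r"|//(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n(?=[ \t]*(?:\r?\n|/\*|//))|(?=\r?\n))))+)"
    r"""|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|(?:\r?\n|[\S\s])[^/"'\\\s]*)"""
)


def strip_comments(source):
    """Keep group 2 of every match and drop group 1 (hypothetical helper)."""
    # m.group(2) is None when the comment branch matched, so use "" instead.
    return CPP_COMMENTS.sub(lambda m: m.group(2) or "", source)


if __name__ == "__main__":
    sample = 'int a = 1; /* c1 */ int b = 2; // c2\nchar *s = "/* not a comment */";\n'
    print(strip_comments(sample))
```

Here the /* c1 */ and // c2 comments are removed, while the string literal "/* not a comment */" is kept because the quoted-text branch of group 2 consumes it before a comment match can start inside it.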


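If you are using an engine that doesn't support assertions (lookarounds),
you'll have to use this one instead.
It won't preserve formatting, though.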

Usage is the same as above.

    # (/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^/"'\\]*)


    (                                # (1 start), Comments 
         /\*                              # Start /* .. */ comment
         [^*]* \*+
         (?: [^/*] [^*]* \*+ )*
         /                                # End /* .. */ comment
      |  
         //                               # Start // comment
         (?: [^\\] | \\ \n? )*?           # Possible line-continuation
         \n                               # End // comment
    )                                # (1 end)
 |  
    (                                # (2 start), Non - comments 
         "
         (?: \\ [\S\s] | [^"\\] )*        # Double quoted text
         "
      |  '
         (?: \\ [\S\s] | [^'\\] )*        # Single quoted text
         ' 
      |  [\S\s]                           # Any other char
         [^/"'\\]*                        # Chars that don't start a comment, string, escape,
                                          # or line continuation (escape + newline)
    )                                # (2 end)
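The same kind of Python sketch for the simpler pattern, again assuming Python's re as the engine and writing the pattern with doubled backslashes; strip_comments_simple is a hypothetical name. Two consequences follow from the pattern itself: the // branch ends with \n and so also removes that newline, which is why formatting is not preserved, and a // comment with no newline after it (at the very end of the input) is not matched and stays in place.

```python
import re

# The simpler pattern (no lookarounds), as two adjacent raw-string literals.
# Group 1: a /* */ comment, or a // comment up to and including its \n.
# Group 2: string literals and any other non-comment text.
SIMPLE = re.compile(
    r"(/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)"
    r"""|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^/"'\\]*)"""
)


def strip_comments_simple(source):
    """Keep group 2 of every match and drop group 1 (hypothetical helper)."""
    # A comment match leaves group 2 unset (None), so substitute "".
    return SIMPLE.sub(lambda m: m.group(2) or "", source)


if __name__ == "__main__":
    sample = 'x = 1; // note\ny = "/*keep*/";\n/* gone */z\n'
    print(strip_comments_simple(sample))
```

With this input the // note comment and its newline disappear, so "y = ..." joins the first line, while the quoted "/*keep*/" is left alone.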