Unable to maintain code structure when removing comments

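I am trying to strip every kind of comment (single-line, inline, and multi-line). My initial regex works very well as long as // and /* */ do not appear inside any kind of quotes (" "). When I modified the regex slightly to detect and skip occurrences of // inside quotes, it failed and broke the structure of the original code.

Here is my initial regex (Regex:1): (?:/\*(?:[^*]|(?:\*+[^*/]))*\*+/)|(?://.*)

Here is my adjusted regex, which tries to handle single-line comments inside quotes (Regex:2): (?:/\*(?:[^*]|(?:\*+[^*/]))*\*+/)|[^\"](?://.*)[^\"]

Consider this sample data: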

// Comment 1
/* Multiline comments
ends here */  Some text
Random statement // something else
import something..
import something else /* few random stuff
that goes on */ /* Lets try this again */
Text to show
val tryThis = "  something // else "
val tryAgain = "12345" 
val again = " /* kskokds // */ "

Actual result of Regex:1 =>

  Some text
Random statement 
import something..
import something else  
Text to show
val tryThis = "  something 
val tryAgain = "12345" 
val again = "  "

Actual result of Regex:2 =>

// Comment 1
  Some text
Random statementimport something..
import something else  
Text to show
val tryThis = "  somethingval tryAgain = "12345" 
val again = "  "
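
The merged lines in the Regex:2 result appear to come from the two [^\"] classes: each one has to consume a real character, and a negated class also matches a line break. The sketch below reproduces this on two of the sample lines. It assumes Python's re module treats this pattern the same way as the engine used above, and the variable names are illustrative only.

```python
import re

# Regex:2 from the question, unchanged.
REGEX_2 = re.compile(r'(?:/\*(?:[^*]|(?:\*+[^*/]))*\*+/)|[^\"](?://.*)[^\"]')

sample = "Random statement // something else\nimport something..\n"

# The second alternative needs one non-quote character on each side of the
# comment. On the left that is the space before "//"; on the right, ".*"
# stops at the end of the line, so the only character left is the newline.
match = REGEX_2.search(sample)
consumed = match.group(0)
stripped = REGEX_2.sub("", sample)

print(repr(consumed))   # the text that gets deleted, including the newline
print(repr(stripped))   # the two lines are now joined
```

The same requirement explains why "// Comment 1" on the first line survives: there is no character in front of it for the leading [^\"] to consume.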

Expected result =>

  Some text
Random statement 
import something..
import something else  
Text to show
val tryThis = "  something // else "
val tryAgain = "12345" 
val again = " /* kskokds // */ "

I am the first to post an answer linking this famous question: RegEx match open tags except XHTML self-contained tags

The serious answer there is:

I think the flaw here is that HTML is a Chomsky Type 2 grammar (context free grammar) and RegEx is a Chomsky Type 3 grammar (regular grammar). Since a Type 2 grammar is fundamentally more complex than a Type 3 grammar (see the Chomsky hierarchy), it is mathematically impossible to parse XML with RegEx.

The standard for Java comments is not a context-free grammar either, so everything said there about parsing HTML applies here.
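
One alternative to a single regex is a small hand-written scanner that tracks whether it is currently inside a string literal. The following is only a minimal sketch in Python, not a definitive implementation: strip_comments is a hypothetical helper name, and it assumes double-quoted strings with backslash escapes only (no character literals, triple-quoted strings, or raw strings).

```python
def strip_comments(source: str) -> str:
    """Remove // and /* */ comments, leaving double-quoted strings untouched."""
    out = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch == '"':
            # Copy the whole string literal verbatim, skipping escaped characters.
            j = i + 1
            while j < n and source[j] != '"':
                j += 2 if source[j] == "\\" else 1
            j = min(j + 1, n)  # include the closing quote if there is one
            out.append(source[i:j])
            i = j
        elif source.startswith("//", i):
            # Line comment: skip to the line break but keep the break itself.
            end = source.find("\n", i)
            i = n if end == -1 else end
        elif source.startswith("/*", i):
            # Block comment: skip through the closing */ (or to end of input).
            end = source.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


sample = (
    "Random statement // something else\n"
    'val tryThis = "  something // else "\n'
    'val again = " /* kskokds // */ "\n'
)
print(strip_comments(sample))
```

Because the string branch is checked first, comment markers inside quotes are copied through unchanged, and line breaks are never consumed, so the line structure of the input is preserved.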