VBA - 通过模式搜索查找 VBE 模块中所有编号的行

VBA - Find all numbered lines in VBE Modules via pattern search

任务:

我的目标是在我的代码模块的过程中找到所有 编号的行 CodeModule.Find 方法可用于检查搜索词(目标参数)。

语法:

object.Find(target, startline, startcol, endline, endcol [ wholeword] [ matchcase] [ patternsearch] )

参考帮助站点 https://msdn.microsoft.com/en-us/library/aa443952(v=vs.60).aspx 指出: 参数 patternsearch:可选。一个布尔值,指定目标字符串是否为正则表达式模式。 如果为 True,则目标字符串是正则表达式模式。 False 是默认值。

如上所述,find 方法允许进行正则表达式模式搜索,我想使用它来以精确的方式识别编号的行: 数字后跟一个制表符。因此,下面的示例定义了一个搜索字符串 s,并将 .Find 方法中的最后一个参数 PatternSearch 设置为 True。

问题 据我所知,一个有效的正则表达式定义可以是

s = "[0-9]{1,4}[ \t]"

但这没有显示任何内容,甚至没有错误。

为了至少显示任何结果,我定义了搜索词

s = "[0-9]*[ \t]*)"

在调用示例过程中 ListNumberedLines 显示不稳定的结果。

问题

是否有可能在 CodeModule.Find 方法中使用有效的正则表达式模式搜索?

示例代码

Option Explicit

' ==============
' Example Search
' ==============
Sub ListNumberedLines()
'  Declare search pattern string s
   Dim S As String
10  S = "[0-9]*[ \t]*)"     
20  Debug.Print "Search Term: " & S
30  Call findWordInModules(S)

End Sub

Public Sub findWordInModules(ByVal sSearchTerm As String)
' Purpose: find modules ('components') with lines containing a search term
' Method:  .CodeModule.Find with last parameter patternsearch set to True
' Based on https://www.devhut.net/2016/02/24/vba-find-term-in-vba-modulescode/

' VBComponent requires reference to Microsoft Visual Basic for Applications Extensibility
'             or keep it as is and use Late Binding instead
' Declare module variable oComponent
  Dim oComponent            As Object    'VBComponent

  For Each oComponent In Application.VBE.ActiveVBProject.VBComponents
    If oComponent.CodeModule.Find(sSearchTerm, 1, 1, -1, -1, False, False, True) = True Then
        Debug.Print "Module: " & oComponent.Name  'Name of the current module in which the term was found (at least once)
        'Need to execute a recursive listing of where it is found in the module since it could be found more than once
        Call listLinesinModuleWhereFound(oComponent, sSearchTerm)
    End If
  Next oComponent
End Sub

Sub listLinesinModuleWhereFound(ByVal oComponent As Object, ByVal sSearchTerm As String)
' Purpose: list module lines containing a search term
' Method:  .CodeModule.Find with last parameter patternsearch set to True
  Dim lTotalNoLines         As Long   'total number of lines within the module being examined
  Dim lLineNo               As Long   'will return the line no where the term is found
    lLineNo = 1
  With oComponent         ' Module
    lTotalNoLines = .CodeModule.CountOfLines
    Do While .CodeModule.Find(sSearchTerm, lLineNo, 1, -1, -1, False, False, True) = True
        Debug.Print vbTab & "Zl. " & lLineNo & "|" & _
                    Trim(.CodeModule.Lines(lLineNo, 1))  'Remove any padding spaces
        lLineNo = lLineNo + 1    'Restart the search at the next line looking for the next occurence
    Loop
  End With
End Sub

很长一段时间,Rubberduck 一直在努力 properly/formally 解析行号 - 我们的解决方法是在将代码模块内容提供给我们的解析器之前删除它们(用空格替换它们) .

最近我们成功地正式定义了行号:

// lineNumberLabel should actually be "statement-label" according to MS VBAL but they only allow lineNumberLabels:
// A <statement-label> that occurs as the first element of a <list-or-label> element has the effect 
// as if the <statement-label> was replaced with a <goto-statement> containing the same 
// <statement-label>. This <goto-statement> takes the place of <line-number-label> in 
// <statement-list>.  
listOrLabel :
    lineNumberLabel (whiteSpace? COLON whiteSpace? sameLineStatement?)*
    | (COLON whiteSpace?)? sameLineStatement (whiteSpace? COLON whiteSpace? sameLineStatement?)*
;
sameLineStatement : blockStmt;

lineNumberLabel定义为:

//Statement labels can only appear at the start of a line.
statementLabelDefinition : {_input.La(-1) == NEWLINE}? (combinedLabels | identifierStatementLabel | standaloneLineNumberLabel);
identifierStatementLabel : unrestrictedIdentifier whiteSpace? COLON; 

standaloneLineNumberLabel : 
    lineNumberLabel whiteSpace? COLON
    | lineNumberLabel;
combinedLabels : lineNumberLabel whiteSpace identifierStatementLabel;
lineNumberLabel : numberLiteral;

(完整的 Antlr4 语法 here

注意谓词 {_input.La(-1) == NEWLINE}?,它强制解析器规则仅匹配行首的 statementLabelDefinition - 逻辑 代码行.

您看到 VBA 代码有 物理 代码行,就像您从 CodeModule 的内容中得到的一样。但是 VBA 代码也有一个 逻辑 代码行的概念,结果是 是解析器所关心的。

这会触发任何 典型 正则表达式:

Sub DoSomething()
    Debug.Print _
42
End Sub

签名和 End Sub 令牌之间只有 1 行逻辑代码,但是一个简单的 Find 会很乐意将 42 视为 "line number" .. .它不是 - 它是传递给 Debug.Print 的参数,在同一指令中,在同一逻辑代码行上,但在下一个物理代码行上。

如果不首先预处理您的输入,您就无法处理 逻辑 代码行,以考虑行继续标记。

为了做到这一点,您需要实际解析您看到的指令——至少知道它们从哪里开始,从哪里结束。 .. 这不是一件小事! 请参阅 ThunderFrame 的回答

VBIDE API 非常有限,对此没有帮助。


TL;DR:您无法单独使用正则表达式解析 VBA 代码。所以,不。抱歉! 您需要比这复杂得多的正则表达式模式 - 请参阅 ThunderFrame 的回答

正如@MatsMug 所说,用 Regex 解析 VBA 是 hard 不可能的,但是行号是一个更简单的情况,应该可以单独使用 regex 找到。

幸运的是,行号只能出现在过程主体中(包括 End Sub/Function/Property 语句之前),因此我们知道它们永远不会出现在代码的第一行。

不幸的是,您可以在行标签前添加 0 个或多个行延续:

Sub Foo()
 _
 _
10 Beep
End Sub

此外,行号后面并不总是跟着 space - 它后面可以跟指令分隔符,使行号看起来像行标签:

Sub foo()
10: Beep
End Sub

And 如果你的代码是邪恶的,你可能会遇到一个负的行号(使用十六进制符号输入 - VBE 尽职尽责地打印回代码面板前导 space 和负数):

Sub foo()
10 Beep
 -1 Beep
End Sub

而且我们还需要能够识别出现在连续行上的数字,不是行号:

Sub foo()
  Debug.Print _
5 & "is not a line-number"
End Sub

所以,这里有一些邪恶的行编号,混合了所有这些边缘情况:

Option Explicit

Sub foo()

5: Beep

 _
 _
 _
10 Beep

20 _
'Debug.Print _
30

50: Beep

40 Beep

 _
 -1 _
 Beep 'The "-1" line number is achieved by entering "&HFFFFFFFF"

Debug.Print _
2 & "is not a line-number"

60 End Sub

下面是一些标识行号的正则表达式:

(?<! _)\n( _\n)* ?(?<line_number>(?:\-)?\d+)[: ]

这是来自 regex101 的语法高亮:

关于CodeModule.Find的结论通过搜索模式

首先,CodeModule.Find 对搜索模式没有帮助,它的可能用途是不透明的。 我同意 VBIDE API 非常有限,并且存在我强烈推荐给任何程序员的优秀专业工具:-)

结果:通过XML

解决

其次,如果可能的话,我更喜欢家庭疗法,所以我试图找到一个仅使用 VBIDE 有用部分的替代解决方案。

方法 这就是为什么我尝试了 CodeModule.Lines 的简单 xml 对话,允许在逻辑行内进行灵活的搜索。 我没有在请求 xml 数据时使用正则表达式,而是演示了一种通过定义明确的 XPath 搜索(通过节点列表循环)查找前导数字的方法, 从而解决了@ThunderFrame 显示的大多数问题。函数 showErls 中的搜索字符串定义为 "line[substring(translate(.,'0123456789','¹¹¹¹¹¹¹¹¹¹'),1,1)="¹"]"

进一步的功能 'lineNumber' returns 模块内的逻辑行号。 注意:为简单起见,搜索仅限于一个模块(用户定义常量 MYMODULE)并且代码避免使用任何正则表达式。

变通代码 - 主子

Option Explicit
' ==========================================
' User defined name of module to be analyzed
' ==========================================
  Const MYMODULE = "modThunderFrame"    ' << change to existing module name or userform
' Declare xml file as object
  Dim xCMods As Object            ' Late Binding; instead of Early Bd: Dim xCMods As MSXML2.DOMDocument6
  
Public Sub TestLineNumbers()
' =================
' A. Load/refresh code into xml
' =================
' set xml into memory - contains code module(s) lines
  Set xCMods = CreateObject("MSXML2.Domdocument.6.0") ' L.Bd.; instead of E.Bd: Set xCMods = New MSXML2.DOMDocument60
      xCMods.async = False
      xCMods.validateOnParse = False
' read in user defined code module and load xml, if failed show error message
  refreshCM MYMODULE
  If xCMods Is Nothing Then Exit Sub

' ======================
' B. search line numbers
' ======================
  showERLs
  
' =============================
' C. Save xml if needed
' =============================
  ' xCMods.Save ThisWorkbook.Path & "\VBE(" & MYMODULE & ").xml"
  ' MsgBox "Successfully exported Excel data to " & ThisWorkbook.Path & "\VBE(" & MYMODULE & ").XML!", _
  '        vbInformation, "Module " & MYMODULE & " to xml"
 
' =================
' D. terminate xml
' =================
  Set xCMods = Nothing

End Sub

子程序

Private Sub showERLs()
' Purpose: [B.] declare XPath search string and define special translate character
  Dim s  As String
  Dim S1 As String: S1 = Chr(185)   ' superior number 1 (hex B9) replaces any digit
' declare node and node list
  Dim line    As Object
  Dim lines   As Object
' define XPath search string for first digit in line (usual case)
  s = "line[substring(translate(.,'0123456789','" & String(10, S1) & "'),1,1)=""" & _
                  S1 & _
                  """]"
' start debugging
  Debug.Print "**search string=""" & s & """" & vbNewLine & String(50, "-")
  Debug.Print "Line #|Line Content" & vbNewLine & String(50, "-"); ""
' set node list
  Set lines = xCMods.DocumentElement.SelectNodes(s)
' -------------------
' loop thru node list
' -------------------
  For Each line In lines
      Debug.Print Format(lineNumber(line), "00000") & "|" & line.Text      ' return logical line number plus line content
  Next line

End Sub

Private Sub refreshCM(sModName As String)
' Purpose: [A.] load xml string via LoadXML method
  Dim sErrTxt As String
  Dim line    As Object
  Dim lines   As Object
  Dim xpe     As Object
  Dim s       As String  ' xpath expression
  Dim pos     As Integer ' position of line number prefix
  ' ======================================
  ' 1. Read code module lines and load xml
  ' ======================================
    If Not xCMods.LoadXML(readCM(sModName)) Then
    ' set ParseError object
      Set xpe = xCMods.parseError
      With xpe
        sErrTxt = sErrTxt & vbNewLine & String(20, "-") & vbNewLine & _
          "Loading Error No " & .ErrorCode & " of xml file " & vbCrLf & _
          Replace(" " & Replace(.URL, "file:///", "") & " ", "  ", "[No file found]") & vbCrLf & vbCrLf & _
          xpe.reason & vbCrLf & _
          "Source Text:    " & .srcText & vbCrLf & _
          "char?:  " & """" & Mid(.srcText, .linepos, 1) & """" & vbCrLf & vbCrLf & _
          "Line no:    " & .line & vbCrLf & _
          "Line pos: " & .linepos & vbCrLf & _
          "File pos.:  " & .filepos & vbCrLf & vbCrLf
      End With
      MsgBox sErrTxt, vbExclamation, "XML Loading Error"
      Set xCMods = Nothing
      Exit Sub
    End If
    
' 2. resolve hex input problem of negative line numbers with leading space (thx @Thunderframe)
    s = "line"
    Set lines = xCMods.DocumentElement.SelectNodes(s)
  ' loop thru all logical lines
    For Each line In lines
        pos = ErlPosInLine(line.Text)
        If pos <= Len(line.Text) Then
           ' to do: add attribute to line node, if wanted
           
           ' correct line content
             line.Text = Mid(line.Text, pos)
        End If
    Next
End Sub

Private Function lineNumber(node As Object) As Long
' Purpose: [B.] return logical line number within code module lines
' Param.:  IXMLDomNode
' Method:  XPath via preceding-sibling count plus one
Dim tag As String: tag = "line"
lineNumber = node.SelectNodes("preceding-sibling::" & tag).Length + 1

End Function


Private Function readCM(Optional modName = "*") As String
' Purpose: return code module line string (VBIDE) of a user defined module to be read into xml
' Call:    called from [A.] refreshCM
'          xCMods.LoadXML(readCM(sModName))
' Declare variable
  Dim s     As String
  Dim md As CodeModule
  If modName = "*" Then Exit Function
  On Error GoTo OOPS
' get code module lines into string
  Set md = Application.VBE.ActiveVBProject.VBComponents(modName).CodeModule   ' MSAccess: Modules("modVBELines")
' change to xml tags
  s = getTags(md.lines(1, md.CountOfLines))
' return
  readCM = s
OOPS:
End Function

Private Function getTags(ByVal s As String, Optional mode = False) As String
' Purpose: prepares xml string to be loaded
' define constant
  Const HEAD = "<?xml version=""1.0"" encoding=""utf-8""?>" & vbCrLf & "<cm>" & vbCrLf
' 1. change tag characters
  s = Replace(Replace(s, "<", "&lt;"), ">", "&gt;")
' 2. change special characters (ampersand)
  s = Replace(s, "&", "&amp;")
' 3. change "_" points
  s = Replace(s, "_" & vbCrLf, Chr(133) & vbLf)
' 4. define logical line entities
  If Right(s, 2) = vbCrLf Then s = Left(s, Len(s) - 2)
  s = HEAD & "  <line>" & Replace(s, vbCrLf, "</line>" & vbCrLf & "  <line>") & "</line>" & vbCrLf & "</cm>"
  
' debug xml tags if second function parameter is true (mode = True)
  If mode Then Debug.Print s

' return
  getTags = s
End Function

Sub testErlPosInLine()
' Purpose: Test Thunderframe's problem with ERL prefixes (underscores, " ",..) and hex inputs
Dim s As String
s = " _" & vbLf & " -1 xx"
MsgBox "|" & Mid(s, ErlPosInLine(s)) & "|" & vbNewLine & _
       "prefix = |" & Mid(s, 1, ErlPosInLine(s) - 1) & "|"

End Sub
Private Function ErlPosInLine(ByVal s As String) As Integer
' Purpose: remove prefix (underscore, tab, " ",.. ) from numbered line
' cf:      
  Dim i As Long
  For i = 1 To Len(s)                 ' loop each char
    Select Case Mid$(s, i, 1)       ' examine current char
        Case " "                    ' permitted chars
        Case "_"
        Case vbLf, Chr(133), Chr(34)
        Case "0" To "9": Exit For   ' cut off point
        Case Else: Exit For         ' i is the cut off point
    End Select
  Next
  If Mid$(s, i, 1) = "-" And Len(s) > 1 Then
   If IsNumeric(Mid$(s, i + 1, 1)) Then i = i + 1
  End If
' return
ErlPosInLine = i
' debug.print Mid$(s, i) '//strip lead
End Function