VBA Append unique regular expressions to string variable

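How can I extract the regular-expression matches from a string, remove the duplicates, and append them to a comma-separated string variable?

For example, given the string "this is an example of the desired regular expressions: BPOI-G8J7R9, BPOI-G8J7R9 and BPOI-E5Q8D2", the desired output string is "BPOI-G8J7R9,BPOI-E5Q8D2".

I tried to use a dictionary to remove the duplicates, but my function returns the dreaded #VALUE! error.

Can anyone see where I'm going wrong? Or does anyone have a suggestion for a better way to accomplish this?

Code below: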

Public Function extractexpressions(ByVal text As String) As String
Dim regex, expressions, expressions_dict As Object, result As String, found_expressions As Variant, i As Long

Set regex = CreateObject("VBScript.RegExp")
regex.Pattern = "[A-Z][A-Z][A-Z][A-Z][-]\w\w\w\w\w\w"
regex.Global = True

Set expressions_dict = CreateObject("Scripting.Dictionary")

If regex.Test(text) Then
    expressions = regex.Execute(text)
End If

For Each item In expressions
    If Not expressions_dict.exists(item) Then expressions_dict.Add item, 1
Next

found_expressions = expressions_dict.items

result = ""

For i = 1 To expressions_dict.Count - 1
    result = result & found_expressions(i) & ","
Next i

extractexpressions = result

End Function

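If you call your function from a Sub, you will be able to debug it.

See the comment in the code below about adding the match as the dictionary key: if you add the Match object itself, rather than explicitly specifying its Value property, the dictionary will not de-duplicate your matches, because two or more Match objects with the same value are still distinct objects. Note that the version below also uses Set when assigning the result of regex.Execute, since that call returns an object.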

Sub Tester()
    Debug.Print extractexpressions("ABCD-999999 and DFRG-123456 also ABCD-999999 blah")
End Sub


Public Function extractexpressions(ByVal text As String) As String
    Dim regex As Object, expressions As Object, expressions_dict As Object
    Dim item
    
    Set regex = CreateObject("VBScript.RegExp")
    regex.Pattern = "[A-Z]{4}-\w{6}"
    regex.Global = True
    
    If regex.Test(text) Then
        Set expressions = regex.Execute(text)
        Set expressions_dict = CreateObject("Scripting.Dictionary")
        For Each item In expressions
            'A dictionary can have object-type keys, so make sure to add the match *value*
            '  and not the match object itself
            If Not expressions_dict.Exists(item.Value) Then expressions_dict.Add item.Value, 1
        Next
        extractexpressions = Join(expressions_dict.Keys, ",")
    End If
End Function
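
To see the point about object keys in isolation, here is a minimal sketch that fills two dictionaries from the same matches, one keyed by the Match object and one keyed by its Value, and prints both counts for comparison. The Sub name CompareDictionaryKeys and the variable names are hypothetical and not part of the answer above; it assumes late binding to VBScript.RegExp and Scripting.Dictionary is available.

```vba
'Sketch only: CompareDictionaryKeys is a hypothetical name, not part of the answer above
Sub CompareDictionaryKeys()
    Dim m As Object, byObject As Object, byValue As Object

    Set byObject = CreateObject("Scripting.Dictionary")
    Set byValue = CreateObject("Scripting.Dictionary")

    With CreateObject("VBScript.RegExp")
        .Pattern = "[A-Z]{4}-\w{6}"
        .Global = True
        For Each m In .Execute("ABCD-999999 and DFRG-123456 also ABCD-999999 blah")
            'Keyed by the Match object: every match is a distinct object
            If Not byObject.Exists(m) Then byObject.Add m, 1
            'Keyed by the matched text: repeated text maps to one key
            If Not byValue.Exists(m.Value) Then byValue.Add m.Value, 1
        Next m
    End With

    'Compare the two counts in the Immediate window
    Debug.Print "object keys: " & byObject.Count, "value keys: " & byValue.Count
End Sub
```

If the object-keyed count comes out higher than the value-keyed count, that is the failed de-duplication described above.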

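VBA's regular expression object actually supports backreferences to previous capture groups. So we can get all the unique items through the expression itself, using a negative lookahead that rejects a match when the same text occurs again later in the string:

([A-Z]{4}-\w{6})(?!.*\1)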

To put this into practice:

Sub Test()
    Debug.Print extractexpressions("this is an example of the desired regular expressions: BPOI-G8J7R9, BPOI-G8J7R9 and BPOI-E5Q8D2")
End Sub

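Public Function extractexpressions(ByVal text As String) As String
    
With CreateObject("VBScript.RegExp")
    'Group 1 matches a code only when the same code does not occur again later;
    '  the alternative "." consumes every other character
    .Pattern = "([A-Z]{4}-\w{6})(?!.*\1)|."
    .Global = True
    'Replace each match with group 1 plus a space, collapse the spaces, then swap them for commas
    extractexpressions = Replace(Application.Trim(.Replace(text, "$1 ")), " ", ",")
End With
    
End Function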

Prints: