Word VBA: iterating through characters incredibly slow

Question

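I have a macro that changes single quotes in front of numbers into apostrophes (i.e. closing single quotes). Normally, when you type something like "the '80s" in Word, the apostrophe in front of the "8" faces the wrong way. The macro below works, but it is incredibly slow (10 seconds per page). In a regular language (even an interpreted one), this would be a fast procedure. Any insight into why VBA takes so long on Word 2007? Or, if someone has some find+replace trick that does this without iterating, please let me know.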

Sub FixNumericalReverseQuotes()
    Dim total As Long, x As Long
    Dim a_code As Integer, b_code As Integer
    Debug.Print "starting " + CStr(Now)
    With Selection
        total = .Characters.Count
        ' Will be looking ahead one character, so we need at least 2 in the selection
        If total < 2 Then
            Exit Sub   ' "Return" only works with GoSub; Exit Sub leaves the procedure
        End If
        For x = 1 To total - 1
            a_code = Asc(.Characters(x))
            b_code = Asc(.Characters(x + 1))

            ' We want to convert a single quote in front of a number to an apostrophe
            ' Trying to use all numerical comparisons to speed this up
            If (a_code = 145 Or a_code = 39) And b_code >= 48 And b_code <= 57 Then
                .Characters(x) = Chr(146)
            End If
        Next x
    End With
    Debug.Print "ending " + CStr(Now)
End Sub

Answer 1

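This is a problem that calls for regular expressions. Resolving the .Characters calls that many times is probably what is killing your performance.

I'd do something like this: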

Public Sub FixNumericalReverseQuotesFast()

    Dim expression As RegExp
    Set expression = New RegExp

    Dim buffer As String
    buffer = Selection.Range.Text

    expression.Global = True
    expression.MultiLine = True
    expression.Pattern = "[" & Chr$(145) & Chr$(39) & "]\d"

    Dim matches As MatchCollection
    Set matches = expression.Execute(buffer)

    Dim found As Match
    For Each found In matches
        buffer = Replace(buffer, found, Chr$(146) & Right$(found, 1))
    Next

    Selection.Range.Text = buffer

End Sub

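Note: requires a reference to Microsoft VBScript Regular Expressions 5.5 (or late binding).

Edit: A solution that doesn't use the regex library but still avoids working with Ranges, using a byte array instead: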
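For the late-binding route mentioned in the note, a minimal sketch might look like the function below. The name `FixQuotesLateBound` is hypothetical, and it assumes Windows, where the `VBScript.RegExp` COM class is registered. It also lets the regex engine do the substitution in a single `Replace` call with a `$1` back-reference instead of looping over the matches. Like the macro above, it works on a plain string, so writing the result back through `.Text` still discards character formatting.

```vba
' Hypothetical helper: late-bound regex replacement, no project reference needed.
' Assumes Windows, where the VBScript.RegExp COM class is available.
Public Function FixQuotesLateBound(ByVal source As String) As String
    Dim expression As Object
    Set expression = CreateObject("VBScript.RegExp")

    expression.Global = True
    ' A left single quote (U+2018) or a straight quote, followed by a digit.
    ' The digit is captured so the replacement can put it back as $1.
    expression.Pattern = "[" & ChrW(&H2018) & "'](\d)"

    ' Replace every match with a right single quote (U+2019) plus the digit.
    FixQuotesLateBound = expression.Replace(source, ChrW(&H2019) & "$1")
End Function
```

Usage would be along the lines of `Selection.Range.Text = FixQuotesLateBound(Selection.Range.Text)`.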

Sub FixNumericalReverseQuotes()
    Dim chars() As Byte
    chars = StrConv(Selection.Text, vbFromUnicode)

    Dim pos As Long
    For pos = 0 To UBound(chars) - 1
        If (chars(pos) = 145 Or chars(pos) = 39) _
        And (chars(pos + 1) >= 48 And chars(pos + 1) <= 57) Then
           chars(pos) = 146
        End If
    Next pos

    Selection.Text = StrConv(chars, vbUnicode)
End Sub

Benchmarks (100 iterations, 3 pages of text, 100 "hits" per page):

Regex method: 1.4375 seconds
Array method: 2.765625 seconds
OP method: (ended the task after 23 minutes)

About half the speed of the regex, but still roughly 10 ms per page.

Edit 2: Apparently the methods above aren't format-safe, so here is method 3:
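One caveat with the byte-array version: `StrConv(..., vbFromUnicode)` converts to the system's ANSI code page, so any character that code page cannot represent (CJK text, for instance) comes back as `?` and is written back that way. A sketch of the same scan done directly on the VBA String with `AscW`, `Mid$` and `ChrW` avoids that round trip. The function name is hypothetical, and since it still replaces plain text it is no more format-safe than the array method.

```vba
' Hypothetical helper: the "array method" scan, done on the Unicode string itself.
Public Function FixQuotesInString(ByVal source As String) As String
    Dim pos As Long
    Dim current As Long
    Dim following As Long

    ' Look ahead one character, so stop one short of the end.
    For pos = 1 To Len(source) - 1
        current = AscW(Mid$(source, pos, 1))
        following = AscW(Mid$(source, pos + 1, 1))

        ' U+2018 (left single quote) or 39 (straight quote) before "0".."9"?
        If (current = &H2018 Or current = 39) And following >= 48 And following <= 57 Then
            ' The Mid$ statement overwrites the character in place.
            Mid$(source, pos, 1) = ChrW(&H2019)
        End If
    Next pos

    FixQuotesInString = source
End Function
```

It would be called as `Selection.Text = FixQuotesInString(Selection.Text)`.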

Sub FixNumericalReverseQuotesVThree()

    Dim full_text As Range
    Dim cached As Long

    Set full_text = ActiveDocument.Range
    full_text.Find.ClearFormatting
    full_text.Find.MatchWildcards = True
    cached = full_text.End

    Do While full_text.Find.Execute("[" & Chr$(145) & Chr$(39) & "][0-9]")
        full_text.End = full_text.Start + 2
        full_text.Characters(1) = Chr$(146)   ' right single quote in Windows-1252, not Chr$(96) (backtick)
        full_text.Start = full_text.Start + 1
        full_text.End = cached
    Loop

End Sub

Again, slower than both of the methods above, but it still runs fairly quickly (about ms).
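The question also asked whether Find/Replace can do this without any VBA loop. Assuming Word's wildcard syntax (parentheses to group, `\1` in the replacement text to reuse the group), a single `ReplaceAll` over the selection should be enough. This is a sketch, not something benchmarked here; because Word performs the replacement itself, text outside the matched quote-and-digit pairs is left untouched, formatting included.

```vba
Sub FixNumericalReverseQuotesReplaceAll()
    ' Sketch: one wildcard Find/Replace instead of a per-character loop.
    ' Pattern: a left single quote (U+2018) or straight quote, then a digit.
    ' The digit is grouped with () so the replacement can put it back as \1.
    With Selection.Range.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = "[" & ChrW(&H2018) & "']([0-9])"
        .Replacement.Text = ChrW(&H2019) & "\1"
        .MatchWildcards = True
        .Forward = True
        .Wrap = wdFindStop   ' stay inside the selection
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```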

Answer 2

A modified version of @Comintern's "Array method":

Sub FixNumericalReverseQuotes()
    Dim chars() As Byte
    chars = StrConv(Selection.Text, vbFromUnicode)

    Dim pos As Long
    For pos = 0 To UBound(chars) - 1
        If (chars(pos) = 145 Or chars(pos) = 39) _
        And (chars(pos + 1) >= 48 And chars(pos + 1) <= 57) Then
           ' Make the change directly in the selection so track changes is sensible.
           ' I have to use 213 instead of 146 for reasons I don't understand--
           ' probably has to do with encoding on Mac, but anyway, this shows the change.
           Selection.Characters(pos + 1) = Chr(213)
        End If
    Next pos
End Sub
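The `Chr(213)` puzzle is most likely the Mac Roman code page, where byte 213 is the right single quotation mark (Windows-1252 puts it at 146). A sketch that names the characters by Unicode code point with `AscW`/`ChrW` should therefore behave the same on both platforms. It keeps this answer's idea of editing only the matching characters, so Track Changes records individual edits, and it walks backwards so that an edit (including a tracked deletion left in place) cannot shift positions that are still to be visited. The procedure name is hypothetical, and it assumes string positions line up with `Characters` indexes, which, as another answer below notes, may not hold when the selection contains fields or table cells.

```vba
Sub FixNumericalReverseQuotesPortable()
    ' Sketch: read the text once, then edit only the characters that need it.
    ' Assumes string positions match Characters indexes (no fields or tables).
    Dim target As Range
    Set target = Selection.Range

    Dim source As String
    source = target.Text

    Dim pos As Long
    Dim current As Long
    Dim following As Long

    ' Walk backwards so an edit never moves a position not yet visited.
    For pos = Len(source) - 1 To 1 Step -1
        current = AscW(Mid$(source, pos, 1))
        following = AscW(Mid$(source, pos + 1, 1))
        If (current = &H2018 Or current = 39) And following >= 48 And following <= 57 Then
            ' U+2019 is the right single quote on Windows and Mac alike.
            target.Characters(pos).Text = ChrW(&H2019)
        End If
    Next pos
End Sub
```

`Characters(pos)` is still the slow call from the question, but here it only runs once per hit rather than twice per character.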

Answer 3

Maybe this?

Sub FixNumQuotes()
    Dim MyArr As Variant, MyString As String, X As Long, Z As Long
    Debug.Print "starting " + CStr(Now)
    For Z = 145 To 146
        MyArr = Split(Selection.Text, Chr(Z))
        ' Element 0 is the text before the first quote, so start at element 1.
        For X = LBound(MyArr) + 1 To UBound(MyArr)
            If IsNumeric(Left(MyArr(X), 1)) Then MyArr(X) = "'" & MyArr(X)
        Next
        MyString = Join(MyArr, Chr(Z))
        Selection.Text = MyString
    Next
    Selection.Text = Replace(Replace(Selection.Text, Chr(146) & "'", "'"), Chr(145) & "'", "'")
    Debug.Print "ending " + CStr(Now)
End Sub

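I'm not 100% sure of your criteria. I've turned both left and right single quotes into ' but you can easily change that as needed.

It splits the string into an array on chr(145), checks whether the first character of each element is numeric, and if so prefixes that element with a single quote.

It then joins the array back into a string on chr(145) and repeats the whole thing for chr(146). Finally, it looks for any single quote sitting right after either curly quote (since that must be something we just created) and replaces the pair with the single quote we want. Any occurrences that are not next to a number are left unchanged.

This final replacement is the part to change if you want a character other than '.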

Answer 4

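I've been struggling with this for days. The solution I tried was to run a regex on document.text, then use the matches to replace text via document.range(start,end). This preserves formatting.

The problem is that the range's start and end don't match the indexes in the text. I think I've found the discrepancy: hidden in the range are field codes (in my case, hyperlinks). Also, document.text contains a bunch of BEL codes, which are easy to strip out. If you iterate over a range with the Characters method, appending each character to a string, and print it, you'll see the field codes that don't show up when you use the .text method.

Surprisingly, if you turn on "show field codes" (in any of several ways), you can get the field codes into document.text. Unfortunately, that version isn't exactly the same as what range/characters shows: document.text contains only the field codes, while range/characters contains both the field codes and the field values. So you can never get the character indexes to match.

I have a working version where, instead of range(start,end), I do something like this: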

' FirstIndex is 0-based; Characters is 1-based. This first line is the slow one.
Set matchRange = doc.Range.Characters(myMatches(j).FirstIndex + 1)
matchRange.Collapse wdCollapseStart
Call matchRange.MoveEnd(WdUnits.wdCharacter, myMatches(j).Length)
matchRange.Text = Replacement

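As I said, this works, but the first statement is very slow: Word seems to walk through every character to reach the right point. Done this way, field codes don't appear to be counted, so we land at the right point.

The bottom line is that I haven't yet found a good way to map indexes in the document.text string to an equivalent range(start,end) that isn't a performance disaster.

Ideas welcome, thanks.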

Answer 5

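Besides the two stated questions (why...? and how to do it without...?), there is an implied one: how to iterate over a collection of Word objects properly. The answer is to use the obj.Next property instead of accessing items by index. That is, instead of:

For i = 1 to ActiveDocument.Characters.Count
    'Do something with ActiveDocument.Characters(i), e.g.:
    Debug.Print ActiveDocument.Characters(i).Text
Next

you should use: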

Dim ch As Range: Set ch = ActiveDocument.Characters(1)
Do
    'Do something with ch, e.g.:
    Debug.Print ch.Text
    Set ch = ch.Next 'Note iterating
Loop Until ch Is Nothing

Timing: 00:03:30 vs. 00:00:06, i.e. more than 3 minutes vs. 6 seconds.

Found on Google, but I lost the link, sorry. Confirmed by my own testing.
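Applied to the original macro, the `.Next` pattern might look like the sketch below. It holds the current character and the following one as two `Range` objects and advances with `Range.Next`. The procedure name, the stop check at the end of the selection (rather than the end of the document), and the assumption that Track Changes is off are additions for this sketch; it has not been timed here.

```vba
Sub FixNumericalReverseQuotesNext()
    ' Sketch: walk the selection with Range.Next instead of Characters(i).
    ' Assumes Track Changes is off, so each edit keeps the text length unchanged.
    Dim stopAt As Long
    stopAt = Selection.Range.End

    Dim ch As Range
    Dim nextCh As Range
    Set ch = Selection.Range.Characters(1)

    Do
        Set nextCh = ch.Next(wdCharacter, 1)
        ' Stop at the end of the document or of the original selection.
        If nextCh Is Nothing Then Exit Do
        If nextCh.End > stopAt Then Exit Do

        If (ch.Text = ChrW(&H2018) Or ch.Text = "'") And nextCh.Text Like "[0-9]" Then
            ch.Text = ChrW(&H2019)
            ' Re-fetch the following character after editing the document.
            Set nextCh = ch.Next(wdCharacter, 1)
        End If

        Set ch = nextCh
    Loop
End Sub
```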
