How to replace text followed by a comma (,) or dot (.) in a tab-delimited file (txt)?

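I'm new to AutoHotkey. I have a script that helps me shorten or remove words I don't need, and I'm running into trouble when I try to replace text that is followed by a comma or a dot. Here is my script: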

#NoEnv
#SingleInstance force
SetWorkingDir, %A_ScriptDir%
SendMode, Input
; -- Ctrl + SPACE -> Select all text + replace whole words only + title case
^SPACE::
NonCapitalized := "a|an|in|is|of|the|this|with" ; List of words that shouldn't be capitalized, separated by pipes
ReplacementsFile := "replacements.txt" ; Path to replacements file (tab delimited file with 2 columns, UTF-8-BOM, CR+LF)

Send, ^a ; Selects all text
Gosub, SelectToClip ; Copies the selected text to the clipboard
FileRead, Replacements, % ReplacementsFile ; Reads the replacements file
If ErrorLevel ; Error message if file is not found
{
MsgBox, % "File not found: " ReplacementsFile
Return
}

StringUpper, Clipboard, Clipboard, T ; Whole clipboard to title case
Clipboard := RegExReplace(Clipboard, "i)(?<![!?.]) \b(" NonCapitalized ")\b", " $L1") ; Changes to lowercase all words from the list "NonCapitalized", except those preceded by new line/period/exclamation mark/question mark
pos := 0
While pos := RegExMatch(Replacements, "m`a)^([^\t]+)\t(.*)$", FoundReplace, pos + 1) ; Gets all replacements from the tab delimited file
Clipboard := RegExReplace(Clipboard, "i)\b" FoundReplace1 "\b", FoundReplace2) ; Replaces all occurrences in the clipboard

; add exceptions
Clipboard := StrReplace(Clipboard, "Vice President,", "")
Clipboard := StrReplace(Clipboard, "Director,", "")
Clipboard := StrReplace(Clipboard, "Senior Vice President,", "")

; = End of exceptions

Clipboard := RegExReplace(Clipboard, "^\s+|\s+(?=([\s,;:.]))|\s$") ; Removes extra spaces
Send, ^v ; Pastes the clipboard
Return

SelectToClip:
Clipboard := ""
Send, ^c
ClipWait, 0
If ErrorLevel
Exit
Sleep, 50
Return

Here is part of my replacements file:

Chief Operating, Financial Officer  CFO & COO
Head,
President,

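My question: how can I add text followed by a comma (,) or a dot (.) to the tab-delimited file, instead of adding more lines to the AHK file? As you can see, the script does not treat the comma and the dot as part of the text to match.

Thank you very much for your time and help!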

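  1. Please indent your code; otherwise it is harder to read.

  2. In a regular expression, the \b assertion requires a word character on one side and a non-word character on the other. That is why your code cannot handle search strings that start or end with a comma or a dot, both of which are non-word characters.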

    ...\b, and \B because they are defined in terms of \w and \W.
    ...
    A word boundary is a position in the subject string where the current character and the previous character do not both match \w or \W (i.e. one matches \w and the other matches \W), or the start or end of the string if the first or last character matches \w, respectively.
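
To make the quoted rule concrete, here is a minimal sketch comparing the two approaches. The sample text and the variable names are made up for illustration and are not part of the asker's data.

```autohotkey
; Hypothetical sample text and search term, chosen only for this illustration.
text   := "John Smith, President, Acme"
needle := "President,"

; \b after the comma needs a word character right after it.
; A space follows the comma here, so this pattern finds nothing
; and the text comes back unchanged.
withBoundary := RegExReplace(text, "i)\b" needle "\b", "")

; The lookarounds only require that no word character touches either
; end of the term, so a trailing comma or dot is fine.
; \Q...\E makes the term literal, so the dot is not a wildcard.
withLookaround := RegExReplace(text, "i)(?<!\w)\Q" needle "\E(?!\w)", "")

MsgBox % "With \b: " withBoundary "`nWith lookarounds: " withLookaround
```

The first result should still contain "President,", while the second one should not.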

The following version worked in my tests:

#NoEnv
#SingleInstance force
SetWorkingDir %A_ScriptDir%
SendMode Input
; -- Ctrl + SPACE -> Select all text + replace whole words only + title case
^SPACE::
FunctionNameOfYourChoice() {
    ; Using static vars allows you to avoid reading the file over and over on each key press.
    Static NonCapitalized   := "a|an|in|is|of|the|this|with" ; List of words that shouldn't be capitalized, separated by pipes
         , ReplacementsFile := "replacements.txt" ; Path to replacements file (tab delimited file with 2 columns, UTF-8-BOM, CR+LF)
         , Replacements     := ReadReplacements(ReplacementsFile)

    Send ^a ; Selects all text
    SelectToClip() ; Copies the selected text to the clipboard
    If (Replacements = "") { ; Error message if the replacements file was missing or empty when it was loaded
        MsgBox % "File not found or empty: " ReplacementsFile
        Return
    }

    ; 3. StringUpper was removed in v2, so Format() is used instead.
    ; 4. Better to work on a plain variable than on the clipboard in terms of performance and reliability.
    cbCnt := Format("{:T}", Clipboard)   ; Whole clipboard to title case
    ; Changes to lowercase all words from the list "NonCapitalized", except those preceded by new line/period/exclamation mark/question mark
    cbCnt := RegExReplace(cbCnt, "i)(?<![!?.]) \b(" NonCapitalized ")\b", " $L1")
    ; Goes through each pair of search and replacement strings
    Loop Parse, Replacements, `n, `r
        FoundReplace := StrSplit(A_LoopField, "`t")
        ; Replaces all occurrences in the clipboard
        , cbCnt := RegExReplace(cbCnt, "i)(?<!\w)\Q" FoundReplace.1 "\E(?!\w)", FoundReplace.2)   ; 5.
    cbCnt := RegExReplace(cbCnt, "(?<=\w-)([a-z])", "$U1")   ; 6.
/*
    ; Now the following can be included in the replacements.txt file.
    cbCnt := StrReplace(cbCnt, "Vice President,")
    cbCnt := StrReplace(cbCnt, "Director,")
    cbCnt := StrReplace(cbCnt, "Senior Vice President,")
*/
    ; Removes extra spaces
    ; This also removes all newlines. Are you sure you want to do this?
    Clipboard := RegExReplace(cbCnt, "^\s+|\s+(?=([\s,;:.]))|\s$")
    Send ^v ; Pastes the clipboard
}

SelectToClip() {
    Clipboard := ""
    Send ^c
    ClipWait 0.5   ; Be explicit about the timeout: in v1, specifying 0 is treated the same as 0.5.
    If ErrorLevel
        Exit
    Sleep 50
}

ReadReplacements(path) {
    FileRead, Replacements, % path
    Return Replacements
}
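
As a rough sketch of how the commented-out exceptions could live in replacements.txt instead, the snippet below runs the same loop over an in-memory string that stands in for the file. The sample sentence and the names `pair` and `text` are assumptions for this example. Note the order: the longer "Senior Vice President," entry is listed before "Vice President,", otherwise a stray "Senior" would be left behind.

```autohotkey
; Hypothetical in-memory stand-in for replacements.txt:
; search text, a tab, then the replacement (empty here, so the match is deleted).
Replacements := "Senior Vice President,`t`r`nVice President,`t`r`nDirector,`t"
text := "Jane Doe, Senior Vice President, Sales"

Loop Parse, Replacements, `n, `r
{
    pair := StrSplit(A_LoopField, "`t")   ; pair.1 = search text, pair.2 = replacement
    text := RegExReplace(text, "i)(?<!\w)\Q" pair.1 "\E(?!\w)", pair.2)
}

; Extra spaces remain at this point; the script's final RegExReplace handles those.
MsgBox % text
```

In the real file, each of these entries would simply be a line ending in a comma, optionally followed by a tab and nothing else.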


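Edit

  1. Yes, there was a typo in the second regular expression (in its first assertion); it has been corrected. The problem with "and" no longer occurs.

  2. I added another RegExReplace as a less-than-elegant stopgap for the hyphenation issue you described, but note that this is inherently a non-trivial problem, because the correct capitalization in such cases depends on semantics.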
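
To show why this is only a stopgap, here is a small sketch that applies the same pattern to a made-up sample string (the string and variable names are assumptions for illustration):

```autohotkey
; Hypothetical sample; any lowercase letter right after "word character + hyphen" is uppercased.
sample := "A state-of-the-art, well-known product"
fixed  := RegExReplace(sample, "(?<=\w-)([a-z])", "$U1")

MsgBox % fixed   ; shows: A state-Of-The-Art, well-Known product
```

The pattern cannot tell that "of" and "the" would normally stay lowercase inside a hyphenated compound, which is the semantic part a plain regular expression does not capture.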