Regex capitalise first letter of words more than 3 chars, and after hyphens and apostrophes

Basically...

I'm trying to apply custom capitalisation to a string. I've spent hours fighting with regex, to no avail...

Requirements:

I need to capitalise:

  1. If first word >3 chars: First letter of the first word.
  2. If last word >3 chars: First letter of the last word.
  3. Always: First letter following a hyphen or apostrophe.

(The final regex needs to be usable in VB6.)

Examples:
anne-marie          >  Anne-Marie          // 1st letter of first word + after hyphen
vom schattenreich   >  vom Schattenreich   // 1st letter of last word
will it work-or-not >  Will it Work-Or-Not // 1st letter of outer words + after hyphens
seth o'callaghan    >  Seth O'Callaghan    // 1st letter of outer words + after apostrophe
first and last only >  First and last Only // 1st letter of outer words (excl. middle)
sarah jane o'brien  >  Sarah jane O'Brien  // 1st letter of outer words (excl. middle)

What I've got so far:

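I've cobbled together two regexes that, between them, do almost what I need. However, my attempts to merge them into one regex, or to write a single regex from scratch, have all failed.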

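My main difficulty is that part of the capitalisation applies only to the first and last words, while the punctuation-specific capitalisation has to apply across the whole string. I don't know enough about regex to tell whether this can be done with a single expression.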

My regexes:

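First letter of the first and last words, but it does not restrict this to words of more than 3 chars, and it does not handle the punctuation capitalisation across the whole string: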

^([a-zA-Z]).*\s([a-zA-Z])[a-zA-Z-]+$
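To see the limitations of this first regex, here is a quick sketch in Python, used only as a convenient test bed (assumption: Python's `re` treats these simple constructs the same way VBScript's RegExp does). It shows that the outer initials are captured regardless of word length, and that a single word or an apostrophe in the last word prevents any match at all:

```python
import re

# The question's first regex: group 1 is the first letter of the first word,
# group 2 is the first letter of the last word. There is no length check.
OUTER = re.compile(r"^([a-zA-Z]).*\s([a-zA-Z])[a-zA-Z-]+$")

def outer_initials(name):
    """Return the two captured initials, or None if the pattern does not match."""
    m = OUTER.match(name)
    return m.groups() if m else None

print(outer_initials("vom schattenreich"))  # ('v', 's'): "vom" is captured despite having only 3 letters
print(outer_initials("anne-marie"))         # None: a single word contains no \s
print(outer_initials("seth o'callaghan"))   # None: the apostrophe is not in [a-zA-Z-]
```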

First letter of all words, and after punctuation, where more than 3 chars, but it does not exclude the middle words, and it does not handle punctuation near the end of the string:

(\b[a-zA-Z](?=[a-zA-Z-']{3}))
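Both shortcomings of the second regex can be reproduced with a small Python sketch (again only a test bed, assuming Python's `re` behaves like VBScript's RegExp here; `capitalise_all` is a hypothetical helper name). Every matched letter is uppercased, so a middle word longer than 3 letters is changed too, while a letter after a hyphen that has fewer than 3 characters after it is left alone:

```python
import re

# The question's second regex: a letter at a word boundary that is followed by
# at least 3 more letters, hyphens or apostrophes (checked by the lookahead).
INITIALS = re.compile(r"(\b[a-zA-Z](?=[a-zA-Z-']{3}))")

def capitalise_all(name):
    """Uppercase every letter that the second regex matches."""
    return INITIALS.sub(lambda m: m.group(1).upper(), name)

print(capitalise_all("first and last only"))  # First and Last Only: middle word "last" is changed
print(capitalise_all("will it work-or-not"))  # Will it Work-Or-not: "not" is too close to the end
```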

Question

How can I combine these two regexes to meet my requirements, or correct them enough that they can be used separately? Alternatively, could someone provide a different regex that meets the requirements?

Reference/related source material:

Regex capitalize first letter every word, also after a special character like a dash

First word and first letter of last word of string with Regex

Here is my approach using a regex:

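' Early binding: needs a project reference to "Microsoft VBScript Regular Expressions 5.5"
Sub ReplaceAndTurnUppercase()

Dim reg As RegExp
Dim Match As Object
Dim s As String
Dim res As String

Set reg = New RegExp
With reg
    .Pattern = "^[a-z](?=[a-zA-Z'-]{3})|\b[a-z](?=[a-zA-Z'-]{3,}$)|['-][a-z]"
    .Global = True     ' return every match, not just the first one
    .MultiLine = True  ' ^ and $ match at each line (the demo string has several lines)
End With
s = "anne-marie" & vbCrLf & "vom schattenreich" & vbCrLf & "will it work-or-not" & vbCrLf & "seth o'callaghan" & vbCrLf & "first and last only" & vbCrLf & "sarah jane o'brien"
res = s
For Each Match In reg.Execute(s)
    If Len(Match.Value) > 0 Then
        ' UCase does not change the length, so FirstIndex of later matches stays valid
        res = Left(res, Match.FirstIndex) & UCase(Match.Value) & Mid(res, Match.FirstIndex + Len(Match.Value) + 1)
    End If
Next Match
Debug.Print res ' Demo part

End Sub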

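The regex I use is ^[a-z](?=[a-zA-Z'-]{3})|\b[a-z](?=[a-zA-Z'-]{3,}$)|['-][a-z]. Since the only characters it consumes are the letters we want to uppercase, plus a hyphen or apostrophe (which UCase leaves unchanged), we can uppercase each whole match without capturing anything.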
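For a quick check outside VB6, the same pattern can be run in Python as a sketch (assumption: Python's `re` handles `^`, `\b`, lookahead and these character classes the same way VBScript's RegExp does for this input). Each name is processed on its own, so no multiline flag is needed, and the whole match is uppercased exactly as described above:

```python
import re

# Python rendering of the answer's pattern; alternatives: first word, last word, after ' or -
PATTERN = re.compile(r"^[a-z](?=[a-zA-Z'-]{3})|\b[a-z](?=[a-zA-Z'-]{3,}$)|['-][a-z]")

def capitalise(name):
    """Uppercase every whole match; upper() leaves ' and - unchanged, so no groups are needed."""
    return PATTERN.sub(lambda m: m.group(0).upper(), name)

# The examples from the question, with their expected results
EXAMPLES = {
    "anne-marie": "Anne-Marie",
    "vom schattenreich": "vom Schattenreich",
    "will it work-or-not": "Will it Work-Or-Not",
    "seth o'callaghan": "Seth O'Callaghan",
    "first and last only": "First and last Only",
    "sarah jane o'brien": "Sarah jane O'Brien",
}

for raw, expected in EXAMPLES.items():
    print(f"{raw:<20} > {capitalise(raw)}")
```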

The regex matches 3 alternatives:

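  • ^[a-z](?=[a-zA-Z'-]{3}) - the start of the string (in my example, the start of each line, because I use MultiLine = True), followed by a lowercase ASCII letter (consumed, and uppercased later) that must be followed by 3 characters that are letters, ' or - (not consumed, since they are inside the lookahead)
  • \b[a-z](?=[a-zA-Z'-]{3,}$) - a word boundary \b followed by a lowercase ASCII letter (consumed) that is followed by 3 or more letters, ' or - running up to the end of the string (in my example, the end of the line)
  • ['-][a-z] - an apostrophe or hyphen followed by a lowercase letter, anywhere in the string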
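To see which alternative fires where, this Python sketch (same pattern, same assumption that Python's `re` behaves like VBScript's RegExp here; `matches` is a hypothetical helper) lists each match's 0-based start offset, which corresponds to FirstIndex, together with the matched text:

```python
import re

PATTERN = re.compile(r"^[a-z](?=[a-zA-Z'-]{3})|\b[a-z](?=[a-zA-Z'-]{3,}$)|['-][a-z]")

def matches(name):
    """Return (start, text) for every match, in the order the engine finds them."""
    return [(m.start(), m.group(0)) for m in PATTERN.finditer(name)]

# 0: alternative 1 (first word "seth" has 4 letters)
# 5: alternative 2 (the last word "o'callaghan" runs to the end of the string)
# 6: alternative 3 (an apostrophe followed by a lowercase letter)
print(matches("seth o'callaghan"))     # [(0, 's'), (5, 'o'), (6, "'c")]

# Middle words "and" and "last" are skipped: their lookahead hits a space, not the end
print(matches("first and last only"))  # [(0, 'f'), (15, 'o')]
```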

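The line res = Left(res, Match.FirstIndex) & UCase(Match.Value) & Mid(res, Match.FirstIndex + Len(Match.Value) + 1) does the work: it takes the part of the string before the match index, appends the uppercased match, and then appends the rest of the string. Because UCase does not change the length of the text, the FirstIndex values of later matches still point at the right positions.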
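The same splice can be sketched in Python with slicing (a hypothetical `splice_uppercase` helper, not part of the VB6 code): `res[:start]` plays the role of Left(res, Match.FirstIndex) and `res[end:]` the role of the Mid(...) call. The matches are collected from the unmodified string, which is only safe because uppercasing an ASCII letter never changes the length:

```python
import re

PATTERN = re.compile(r"^[a-z](?=[a-zA-Z'-]{3})|\b[a-z](?=[a-zA-Z'-]{3,}$)|['-][a-z]")

def splice_uppercase(s):
    """Mimic the VBA loop: find matches in s, then splice uppercased text into a copy."""
    res = s
    for m in PATTERN.finditer(s):
        start, end = m.start(), m.end()
        # Left(res, FirstIndex) & UCase(Value) & Mid(res, FirstIndex + Len(Value) + 1)
        res = res[:start] + m.group(0).upper() + res[end:]
    return res

print(splice_uppercase("sarah jane o'brien"))  # Sarah jane O'Brien
```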