Validate Title Case Full Name with Regex
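To learn regex, I'm working through some problems for practice. This is one of them. I know it may not be the best use of regex, and my regex is a mess, but I like the challenge.
Problem: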
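- The names need to be Title Case;
- There are exceptions for some lowercase words inside a name;
- There are also names such as McDonald, MacDuff, D'Estoile;
- Names with ' and - are accepted; sometimes they appear as o'Brien, O'brien, O'Brien, O' Brien or 'Ehu Kali;
- No whitespace at the beginning or end of the name;
- No more than one space between the names that make up the full name;
- A "." is accepted if it isn't alone, e.g.: Dan . Ferdnand (not accepted) and Dan G. Ferdnand (accepted);
- Numbers and symbols are not accepted;
- However, Roman numerals are accepted and are not Title Case, e.g.: Elizabeth II;
- Some names can be used on their own, e.g.: Akihito (Japanese emperor);
- Some special characters common in certain countries are accepted, e.g.: Valeh ßlÿsgÿroğlu, Lażżru Role, Alaksiej Taraškievič.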
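Several of these rules concern the shape of the whole string rather than the capitalisation of individual words, so they can be checked on their own before any word-level pattern runs. A minimal sketch in Python (used here only for illustration; the question doesn't name a language), assuming the only punctuation allowed inside a name is the ASCII apostrophe, hyphen and full stop. `passes_shape_rules` is a hypothetical helper name:

```python
import re
import unicodedata

# Punctuation the rules allow inside a name. Assumption: only the ASCII
# apostrophe, hyphen and full stop; typographic variants are rejected.
ALLOWED_PUNCTUATION = {"'", "-", "."}


def passes_shape_rules(full_name: str) -> bool:
    """Check the whole-string rules only; Title Case is not checked here."""
    if not full_name or full_name != full_name.strip():
        return False  # empty, or whitespace at the beginning or end
    if re.search(r"\s{2,}", full_name):
        return False  # more than one space between names
    if re.search(r"(?:^|\s)\.", full_name):
        return False  # a "." must be attached to a word, as in "G."
    for ch in full_name:
        if ch == " " or ch in ALLOWED_PUNCTUATION:
            continue
        # Letters (L*) and letter numbers (Nl) pass; digits (Nd), symbols (S*)
        # and any other punctuation (P*) fail.
        category = unicodedata.category(ch)
        if not (category.startswith("L") or category == "Nl"):
            return False
    return True


for name in ["Maria G. Silva", "Maria . Silva", "Maria  Silva", "John W8", "Ap*ril Willians"]:
    print(f"{name!r}: {passes_shape_rules(name)}")
```

Title Case is deliberately left out, so "CAPITAL LETTER" still passes this function; the capitalisation rules are left to a word-level pattern.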
Regex
The code is:
^(?![ ])(?!.*(?:\d|[ ]{2}|[!$%^&*()_+|~=`\{\}\[\]:";<>?,\/]))(?:(?:e|da|do|das|dos|de|d'|la|las|el|los|l'|al|of|the|el-|al-|di|van|der|op|den|ter|te|ten|ben|ibn)\s*?|(?:[A-ZàáâäãåąčćęèéêëėįìíîïłńòóôöõøùúûüųūÿýżźñçčšžÀÁÂÄÃÅĄĆČĖĘÈÉÊËÌÍÎÏĮŁŃÒÓÔÖÕØÙÚÛÜŲŪŸÝŻŹÑßÇŒÆČŠŽ∂ð'][^\s]*\s*?)(?!.*[ ]$))+$
And the Regex101 with the validation list.
References
What I have tried so far is based on these:
- regular expression for first and last name
- Regular Expression to disallow two consecutive white spaces in the middle of a string
- A regex to test if all words are title-case
- How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
Not working
I made this regex, but I don't know how to make it stop matching the cases below:
- CAPITAL LETTER
- AlTeRnAtE LeTtEr
And these are not matched but should be:
- Urxan Əbűlhəsənzadə
- İsmət Jafarov
- Şükür Hagverdiyev
- Űmid Abdurrahimov
- Ġerardo Seralta
- Ċikku Paris
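One way to see why these cases behave as described is to run the question's pattern against a few of the listed names. A minimal sketch, assuming Python's standard `re` module (which accepts this pattern as written); `QUESTION_PATTERN` and `samples` are illustrative names:

```python
import re

# The question's pattern, copied verbatim.
QUESTION_PATTERN = re.compile(
    r'''^(?![ ])(?!.*(?:\d|[ ]{2}|[!$%^&*()_+|~=`\{\}\[\]:";<>?,\/]))(?:(?:e|da|do|das|dos|de|d'|la|las|el|los|l'|al|of|the|el-|al-|di|van|der|op|den|ter|te|ten|ben|ibn)\s*?|(?:[A-ZàáâäãåąčćęèéêëėįìíîïłńòóôöõøùúûüųūÿýżźñçčšžÀÁÂÄÃÅĄĆČĖĘÈÉÊËÌÍÎÏĮŁŃÒÓÔÖÕØÙÚÛÜŲŪŸÝŻŹÑßÇŒÆČŠŽ∂ð'][^\s]*\s*?)(?!.*[ ]$))+$'''
)

samples = [
    "CAPITAL LETTER",       # should be rejected, but matches
    "AlTeRnAtE LeTtEr",     # should be rejected, but matches
    "İsmət Jafarov",        # should match, but is rejected
    "Şükür Hagverdiyev",    # should match, but is rejected
    "Maria da Silva",       # matches, as intended
]

for name in samples:
    print(f"{name!r:24} -> {bool(QUESTION_PATTERN.match(name))}")
```

Only the first character of each word is constrained: after it, `[^\s]*` accepts any run of non-space characters, so "CAPITAL" and "AlTeRnAtE" go through. In the other direction, the hand-written class of initial letters has no Ə, İ, Ş, Ű, Ġ or Ċ, so a word starting with one of them can never match.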
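Question
Is there a way to optimize this (monster) regex?
And how can the issues mentioned above under Not working be solved?
p.s.: The list of names with validation examples can be found at the Regex101 link.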
Introduction
Given that you're learning regex and didn't specify which flavour to use, I chose PCRE, since it is widely supported across the regex world.
Code
(?(DEFINE)
(?# Definitions )
(?<valid_nameChars>[\p{L}\p{Nl}])
(?<valid_nonNameChars>[^\p{L}\p{Nl}\p{Zs}])
(?<valid_startFirstName>(?![a-z])[\p{L}'])
(?<valid_upperChar>(?![a-z])\p{L})
(?<valid_nameSeparatorsSoft>[\p{Pd}'])
(?<valid_nameSeparatorsHard>\p{Zs})
(?<valid_nameSeparators>(?&valid_nameSeparatorsSoft)|(?&valid_nameSeparatorsHard))
(?# Invalid combinations )
(?<invalid_startChar>^[\p{Zs}a-z])
(?<invalid_endChar>.*[^\p{L}\p{Nl}.\p{C}]$)
(?<invalid_unaccompaniedSymbol>.*(?&valid_nameSeparatorsHard)(?&valid_nonNameChars)(?&valid_nameSeparatorsHard))
(?<invalid_overTwoUpper>(?:(?&valid_nameChars)*\p{Lu}){3})
(?<invalid>(?&invalid_startChar)|(?&invalid_endChar)|(?&invalid_unaccompaniedSymbol)|(?&invalid_overTwoUpper))
(?# Valid combinations )
(?<valid_name>(?:(?:(?&valid_nameChars)|(?&valid_nameSeparatorsSoft))*(?&valid_nameChars)+(?:(?&valid_nameChars)|(?&valid_nameSeparatorsSoft))*)+\.?)
(?<valid_firstName>(?&valid_startFirstName)(?:\.|(?&valid_name)*))
(?<valid_multipleName>(?&valid_firstName)(?=.*(?&valid_nameSeparators)(?&valid_upperChar))(?:(?&valid_nameSeparatorsHard)(?&valid_name))+)
(?<valid>(?&valid_multipleName)|(?&valid_firstName))
)
^(?!(?&invalid))(?&valid)$
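Python's built-in `re` module has neither `(?(DEFINE)...)` nor `(?&name)` calls, but the same idea of naming each piece can be imitated by building the pattern from string fragments. Below is a rough Python translation of the definitions above, not a tested port. It assumes `[^\W\d_]` in place of `[\p{L}\p{Nl}]`, a class generated from `unicodedata` (Basic Multilingual Plane only) in place of `\p{Lu}`, the ASCII space in place of `\p{Zs}`, and the apostrophe plus U+002D and U+2010 to U+2015 in place of `[\p{Pd}']`; it also drops the `\p{C}` exemption from the end check. The nested quantifiers of `valid_name` (and `valid_name*`) are flattened into forms that match the same strings, which keeps a backtracking engine from trying exponentially many ways to split a word that fails to match.

```python
import re
import unicodedata

# Stand-ins for the PCRE Unicode properties, which Python's `re` lacks.
LETTER = r"[^\W\d_]"                 # ~ [\p{L}\p{Nl}]; also admits No such as "²"
UPPER = "[" + "".join(               # ~ \p{Lu}, Basic Multilingual Plane only
    chr(cp) for cp in range(0x10000) if unicodedata.category(chr(cp)) == "Lu"
) + "]"
SOFT_SEP = r"['\-\u2010-\u2015]"     # ~ [\p{Pd}'], only a subset of Pd
HARD_SEP = " "                       # ~ \p{Zs}, ASCII space only

# Valid combinations
START_FIRST = rf"(?![a-z])(?:{LETTER}|')"
UPPER_START = rf"(?![a-z]){LETTER}"
WORD = rf"{SOFT_SEP}*{LETTER}(?:{LETTER}|{SOFT_SEP})*"   # flattened valid_name body
NAME = rf"{WORD}\.?"
NAME_SEQ = rf"(?:{WORD}\.)*(?:{WORD})?"                   # same strings as NAME*
FIRST = rf"{START_FIRST}(?:\.|{NAME_SEQ})"
MULTIPLE = (rf"{FIRST}(?=.*(?:{SOFT_SEP}|{HARD_SEP}){UPPER_START})"
            rf"(?:{HARD_SEP}{NAME})+")
VALID = rf"{MULTIPLE}|{FIRST}"

# Invalid combinations (all tried at position 0 of the string)
INVALID_START = rf"[{HARD_SEP}a-z]"
INVALID_END = rf".*(?!{LETTER}|\.).\Z"                    # \p{C} exemption dropped
INVALID_LONE_SYMBOL = rf".*{HARD_SEP}(?!{LETTER}|{HARD_SEP}).{HARD_SEP}"
INVALID_OVER_TWO_UPPER = rf"(?:{LETTER}*{UPPER}){{3}}"
INVALID = (rf"{INVALID_START}|{INVALID_END}|"
           rf"{INVALID_LONE_SYMBOL}|{INVALID_OVER_TWO_UPPER}")

FULL_NAME = re.compile(rf"(?!{INVALID})(?:{VALID})")


def is_valid_full_name(name: str) -> bool:
    """True when the whole string is accepted by the composed pattern."""
    return FULL_NAME.fullmatch(name) is not None


for name in ["Maria da Silva", "Maria silva", "CAPITAL LETTER", "İsmət Jafarov"]:
    print(f"{name!r}: {is_valid_full_name(name)}")
```

As in the PCRE version, the over-two-uppercase check is only tried at the start of the string and stops at the first non-letter, so it limits capitals in the first word only; that is why "Eanruig MacGilleBhreac" is still accepted while "CAPITAL LETTER" is not.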
Results
Input
== 1NcOrrect N4M3S ==
CAPITAL LETTER
AlTeRnAtE LeTtEr
Natalia maria
Natalia aria
Natalia orea
Maria dornelas
Samuel eto'
Miguel lasagna
Antony1 de Home Ap*ril
Ap*ril Willians
Antony_ de Home Apr+il
Ant_ony de Home Apr#il
Antony@ de Ho@me Apr^il
Maria Silva
Maria silva
maria Silva
Maria Silva
Maria Silva
Maria / Silva
Maria . Silva
John W8
==Correct Names==
Urxan Əbűlhəsənzadə
İsmət Jafarov
Şükür Hagverdiyev
Űmid Abdurrahimov
Ġerardo Seralta
Ċikku Paris
Hind ibn Sheik
Colop-U-Uichikin
Lażżru Role
Alaksiej Taraškievič
Petruso Husoǔski
Sumu-la-El
Valeh ßlÿsgÿroğlu
'Arab al-Rashayida
Tariq al-Hashimi
Nabeeh el-Mady
Tariq Al-Hashimi
Brian O'Conner
Maria da Silva
Maria Silva
Maria G. Silva
Maria McDuffy
Getúlio Dornelles Vargas
Maria das Flores
John Smith
John D'Largy
John Doe-Smith
John Doe Smith
Hector Sausage-Hausen
Mathias d'Arras
Martin Luther King Jr.
Ai Wong
Chao Chang
Alzbeta Bara
Marcos Assunção
Maria da Silva e Silva
Juscelino Kubitschek de Oliveira
Maria da Costa e Silva
Samuel Eto'o
María Antonieta de las Nieves
Eugène
Antòny de Homé April
àntony de Home ùpril
Antony de Home Aprìl
Pierre de l'Estache
Pierre de L'Estoile
Akihito
Nadine Schröder
Anna A. Møller
D. Pedro I
Pope Benedict XVI
Marsibil Ragnarsdóttir
Natanaël Morel
Isaac De la Croix
Jean-Michel Bozonnet
Qutaibah Mu'tazz Abadi
Rushd Jawna' Kassab
Khaldun Abdul-Qahhar Sabbag
'Awad Bashshar Asker
Al B. Zellweger
Gunnleif Snæ-Ulfsson
Käre Toresson
Sorli Ærnmundsson
Arnkel Øystæinsson
Ástríður Dórey
Åsmund Kåresson
Yahatti-Il
Ipqu-Annunitum
Nabu-zar-adan
Eskopas Cañaverri
Botolph of Langchester
Aelfhun the Cantrell
Fraco di Natale
Fraco Di Natale
Iván de Luca
Iván De Luca
Man'nah
Atabala Aüamusalü
Ramiz Ağasəfalu
Dadaş Aghakhanov
Fÿrxad Mübarizlı
Vaclaǔ Šupa
Yakiv Volacič
Flor Van Vaerenbergh
Flor van Vaerenbergh
Edwin van der Sar
Husein Ekmečić
Álvaro Guimarães Alencar
Phone U Yaza Arkar
Seocan MacGhille
X'wat'e Tlekadugovy
Albert-Jan Bootsveld
Maurits-jan Kuipers op den Kollenstaart
Elco ter Hoek
Robbert te Poele
Aad ten Have
'Ehu Kali
Ho'opa'a Loni
Aukanai'i Mahi'ai
Kalman ben Tal El
Żytomir Roszkowski
K'awai
==EXTRA== only if possible, strange ones
Maol-Moire Mac'IlleBhuidh
Tòmas MacIlleChruim
Aindreas MacIllEathain
Eanruig MacGilleBhreac
Peadar MacGilleDhonaghart
Maolmhuire MacGill-Eain
Eanruig MacGilleBhreac
Wim van 't Plasman
Output
Note: only the strings from the Input above that matched are shown below.
Urxan Əbűlhəsənzadə
İsmət Jafarov
Şükür Hagverdiyev
Űmid Abdurrahimov
Ġerardo Seralta
Ċikku Paris
Hind ibn Sheik
Colop-U-Uichikin
Lażżru Role
Alaksiej Taraškievič
Petruso Husoǔski
Sumu-la-El
Valeh ßlÿsgÿroğlu
'Arab al-Rashayida
Tariq al-Hashimi
Nabeeh el-Mady
Tariq Al-Hashimi
Brian O'Conner
Maria da Silva
Maria Silva
Maria G. Silva
Maria McDuffy
Getúlio Dornelles Vargas
Maria das Flores
John Smith
John D'Largy
John Doe-Smith
John Doe Smith
Hector Sausage-Hausen
Mathias d'Arras
Martin Luther King Jr.
Ai Wong
Chao Chang
Alzbeta Bara
Marcos Assunção
Maria da Silva e Silva
Juscelino Kubitschek de Oliveira
Maria da Costa e Silva
Samuel Eto'o
María Antonieta de las Nieves
Eugène
Antòny de Homé April
àntony de Home ùpril
Antony de Home Aprìl
Pierre de l'Estache
Pierre de L'Estoile
Akihito
Nadine Schröder
Anna A. Møller
D. Pedro I
Pope Benedict XVI
Marsibil Ragnarsdóttir
Natanaël Morel
Isaac De la Croix
Jean-Michel Bozonnet
Qutaibah Mu'tazz Abadi
Rushd Jawna' Kassab
Khaldun Abdul-Qahhar Sabbag
'Awad Bashshar Asker
Al B. Zellweger
Gunnleif Snæ-Ulfsson
Käre Toresson
Sorli Ærnmundsson
Arnkel Øystæinsson
Ástríður Dórey
Åsmund Kåresson
Yahatti-Il
Ipqu-Annunitum
Nabu-zar-adan
Eskopas Cañaverri
Botolph of Langchester
Aelfhun the Cantrell
Fraco di Natale
Fraco Di Natale
Iván de Luca
Iván De Luca
Man'nah
Atabala Aüamusalü
Ramiz Ağasəfalu
Dadaş Aghakhanov
Fÿrxad Mübarizlı
Vaclaǔ Šupa
Yakiv Volacič
Flor Van Vaerenbergh
Flor van Vaerenbergh
Edwin van der Sar
Husein Ekmečić
Álvaro Guimarães Alencar
Phone U Yaza Arkar
Seocan MacGhille
X'wat'e Tlekadugovy
Albert-Jan Bootsveld
Maurits-jan Kuipers op den Kollenstaart
Elco ter Hoek
Robbert te Poele
Aad ten Have
'Ehu Kali
Ho'opa'a Loni
Aukanai'i Mahi'ai
Kalman ben Tal El
Żytomir Roszkowski
K'awai
Maol-Moire Mac'IlleBhuidh
Tòmas MacIlleChruim
Aindreas MacIllEathain
Eanruig MacGilleBhreac
Peadar MacGilleDhonaghart
Maolmhuire MacGill-Eain
Eanruig MacGilleBhreac
Wim van 't Plasman
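Explanation
I used a DEFINE block to create the definitions; you can look at each one to see how it works. In general I use \p{X}, where X is the name of a Unicode character category (e.g. \p{L} is any letter from any language). Not every regex flavour supports this (Python's built-in re does not, and JavaScript needs the u flag), but where it is available it makes the regex much simpler, which is why I used it.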
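For flavours without \p{L}, such as Python's built-in `re`, a widely used approximation is `[^\W\d_]`: a word character that is neither a decimal digit nor an underscore. A small sketch comparing it with the Unicode general category reported by `unicodedata`; `LETTER_RUN` and `all_category_l` are hypothetical names:

```python
import re
import unicodedata

# Python's `re` has no \p{L}; "word character minus digits and underscore"
# is the usual stand-in for "any letter".
LETTER_RUN = re.compile(r"[^\W\d_]+")


def all_category_l(text: str) -> bool:
    """True when every code point has a general category starting with L."""
    return all(unicodedata.category(ch).startswith("L") for ch in text)


for word in ["Əbűlhəsənzadə", "İsmət", "Taraškievič", "ßlÿsgÿroğlu"]:
    print(word, bool(LETTER_RUN.fullmatch(word)), all_category_l(word))

# Close to, but not the same as, \p{L}: "²" (category No) is a word character
# but not a decimal digit, so the stand-in accepts it.
print(bool(LETTER_RUN.fullmatch("²")), all_category_l("²"))

# A combining accent (category Mn) is not a letter for either check, so
# decomposed input should be normalised to NFC before matching.
decomposed = "Ame\u0301lie"
print(bool(LETTER_RUN.fullmatch(decomposed)),
      bool(LETTER_RUN.fullmatch(unicodedata.normalize("NFC", decomposed))))
```

The NFC step matters for names pasted from different sources: "é" can arrive either as one code point or as "e" plus a combining accent, and only the first form is matched by a letter class.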
If you need any other explanation, feel free to ask and I'll do my best, but regex101 should be able to explain anything about the regex you're unsure of.