Ruby: break string into words by capital letters and acronyms
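I need to break a string into several strings by capital letters and acronyms. I could do this:
myString.scan(/[A-Z][a-z]+/)
But it only works for capitalized words. For inputs such as:
QuickFoxReadingPDF
or
LazyDogASAPSleep
the all-caps acronyms are missing from the result.
What should I change the RegEx to, or is there an alternative approach?
Thanks!
Update:
I later found that some of my data contains digits, such as "RabbitHole3". It would be great if the solution could take digits into account as well, i.e. ["Rabbit", "Hole3"].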
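The pattern [A-Z][a-z]+ matches a single uppercase character A-Z followed by one or more lowercase characters a-z. It does not account for runs of multiple uppercase characters.
In this case you also want to match uppercase characters that are not directly followed by a lowercase character a-z.
I am not sure whether an acronym can consist of a single uppercase character, but assuming it should have at least 2 uppercase characters, you could use: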
[A-Z][a-z]+|[A-Z]{2,}(?![a-z])
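A minimal sketch of how this pattern behaves with scan on the sample strings from the question. The digit-aware variant (an optional trailing \d* on each branch) is an assumption added here to cover the "RabbitHole3" case from the update and is not part of the pattern above. The constant and method names are hypothetical.

```ruby
# Pattern from the answer: a capitalized word, or a run of 2+ uppercase
# letters that is not followed by a lowercase letter.
WORDS = /[A-Z][a-z]+|[A-Z]{2,}(?![a-z])/

# Assumption (not in the answer): allow trailing digits on either branch.
WORDS_WITH_DIGITS = /[A-Z][a-z]+\d*|[A-Z]{2,}(?![a-z])\d*/

# Hypothetical helper: return every match of the pattern, in order.
def split_words(str, pattern = WORDS)
  str.scan(pattern)
end

p split_words('QuickFoxReadingPDF')              # => ["Quick", "Fox", "Reading", "PDF"]
p split_words('LazyDogASAPSleep')                # => ["Lazy", "Dog", "ASAP", "Sleep"]
p split_words('RabbitHole3', WORDS_WITH_DIGITS)  # => ["Rabbit", "Hole3"]
```

Note that scan silently drops anything the pattern does not match, so with the original pattern the trailing 3 in "RabbitHole3" would be lost rather than attached to "Hole".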
Use
s.split(/(?<=\p{Ll})(?=\p{Lu})|(?<=\p{Lu})(?=\p{Lu}\p{Ll})/)
See proof.
Explanation
--------------------------------------------------------------------------------
(?<= look behind to see if there is:
--------------------------------------------------------------------------------
\p{Ll} any lowercase letter
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
\p{Lu} any uppercase letter
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
(?<= look behind to see if there is:
--------------------------------------------------------------------------------
\p{Lu} any uppercase letter
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
\p{Lu}\p{Ll} any uppercase letter, any lowercase letter
--------------------------------------------------------------------------------
) end of look-ahead
str = 'QuickFoxReadingPDF'
p str.split(/(?<=\p{Ll})(?=\p{Lu})|(?<=\p{Lu})(?=\p{Lu}\p{Ll})/)
Result: ["Quick", "Fox", "Reading", "PDF"]
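As a quick check against the other inputs from the question, here is a small sketch that wraps the same split in a method. The names CASE_BOUNDARY and split_on_case are hypothetical, chosen only for illustration.

```ruby
# Zero-width boundary: lowercase -> uppercase, or
# uppercase -> (uppercase followed by lowercase).
CASE_BOUNDARY = /(?<=\p{Ll})(?=\p{Lu})|(?<=\p{Lu})(?=\p{Lu}\p{Ll})/

# Hypothetical helper: split at every boundary without consuming characters.
def split_on_case(str)
  str.split(CASE_BOUNDARY)
end

p split_on_case('LazyDogASAPSleep')  # => ["Lazy", "Dog", "ASAP", "Sleep"]
p split_on_case('RabbitHole3')       # => ["Rabbit", "Hole3"]
```

Because split only cuts at the boundaries and keeps every character, digits stay attached to the preceding word. One caveat: a digit is neither \p{Ll} nor \p{Lu}, so a word that follows a digit directly (for example "Hole3Rabbit") would not be separated by this pattern.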