Split string on non-consecutive capital letters
I'm trying to split a string on capital letters, but I don't want to split between two consecutive capitals.
This is what I'm doing right now:
my_string = "TTestStringAA"
re.findall('[a-zA-Z][^A-Z]*', my_string)
>>> ['T', 'Test', 'String', 'A', 'A']
But the output I'm looking for is:
>>> ['TTest', 'String', 'AA']
Is there a clean, simple solution to this?
Thanks!
I think [A-Z]+[a-z]* does what you want:
>>> re.findall(r'[A-Z]+[a-z]*', my_string)
['TTest', 'String', 'AA']
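One thing worth checking with this pattern is what happens when the input does not start with a capital. The sketch below assumes a made-up input with a lowercase prefix ("aTTestStringAA" with the extra leading "a" is my own example, not the asker's string); since every match must begin with [A-Z], findall skips the leading lowercase run rather than returning it.

```python
import re

# Pattern from the answer above: a run of capitals, then a run of lowercase.
pattern = re.compile(r'[A-Z]+[a-z]*')

# Input from the question: every character ends up in some chunk.
question_result = pattern.findall("TTestStringAA")

# Hypothetical input with a lowercase prefix: the leading 'a' cannot start
# a match, so findall silently leaves it out of the result.
prefixed_result = pattern.findall("aTTestStringAA")

print(question_result)  # ['TTest', 'String', 'AA']
print(prefixed_result)  # ['TTest', 'String', 'AA']
```

If your strings always start with a capital letter this makes no difference; otherwise one of the other patterns in this thread may be a better fit.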
The following regular expression will return the correct result.
[a-z]*[A-Z]+[a-z]*|[a-z]+$
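Test cases:
import re
# Note the comma after 'Aa': without it Python concatenates the two adjacent literals.
tests = ['a', 'A', 'aa', 'Aa', 'AaAaAAAaAa', 'aTTestStringAA']
regex = re.compile(r'[a-z]*[A-Z]+[a-z]*|[a-z]+$')
for test in tests:
    print('{} => {}'.format(test, re.findall(regex, test)))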
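For reference, here is a small sketch of what this pattern yields on a few of those inputs. The helper name chunks is made up for illustration. Note that a leading lowercase run is attached to the first capitalised chunk, and an all-lowercase string is only picked up by the second alternative, [a-z]+$.

```python
import re

# Same pattern as in the answer above.
regex = re.compile(r'[a-z]*[A-Z]+[a-z]*|[a-z]+$')

def chunks(text):
    """Hypothetical helper: return every non-overlapping match of the pattern."""
    return regex.findall(text)

# Expected results, worked out by stepping through the pattern by hand.
expected = {
    'a': ['a'],
    'aa': ['aa'],
    'Aa': ['Aa'],
    'AaAaAAAaAa': ['Aa', 'Aa', 'AAAa', 'Aa'],
    'aTTestStringAA': ['aTTest', 'String', 'AA'],
}

for sample, want in expected.items():
    got = chunks(sample)
    print('{} => {}'.format(sample, got))
    assert got == want
```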
Use re.split with the pattern
(?<=[a-z])(?=[A-Z])
See proof.
Explanation
--------------------------------------------------------------------------------
  (?<=                     look behind to see if there is:
--------------------------------------------------------------------------------
    [a-z]                    any character of: 'a' to 'z'
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [A-Z]                    any character of: 'A' to 'Z'
--------------------------------------------------------------------------------
  )                        end of look-ahead
import re
pattern = r"(?<=[a-z])(?=[A-Z])"
test = "TTestStringAA"
print(re.split(pattern, test))
Result:
['TTest', 'String', 'AA']
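If you need this in more than one place, the split can be wrapped in a small helper. This is only a sketch: the function name split_on_case_boundary is made up, and it assumes Python 3.7 or later, since that is the version in which re.split started splitting on patterns that can match an empty string. Unlike the findall approaches, splitting never discards characters, so a leading lowercase run comes back as its own chunk.

```python
import re

# Zero-width boundary: a lowercase ASCII letter behind, an uppercase one ahead.
_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")

def split_on_case_boundary(text):
    """Hypothetical helper: split where lowercase is directly followed by uppercase."""
    return _BOUNDARY.split(text)

print(split_on_case_boundary("TTestStringAA"))   # ['TTest', 'String', 'AA']
print(split_on_case_boundary("aTTestStringAA"))  # ['a', 'TTest', 'String', 'AA']
print(split_on_case_boundary("HTTPServer"))      # ['HTTPServer']
```

Because the character classes are ASCII-only, letters such as 'é' or digits do not count as lowercase here, so no split happens right after them.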