Split string at upper/lower case boundaries

I would like to split strings like the following at upper/lower-case boundaries. How can I do this in Python and/or with a regular expression?

For example,

x = 'aagaaggagatataccATGAATTTGTCGGTTTACCCCAATTTAACCAAAgaaaacctgtacaa'

split_boundaries(x) = ['aagaaggagatatacc', 
                       'ATGAATTTGTCGGTTTACCCCAATTTAACCAAA',
                       'gaaaacctgtacaa']

Use re.findall:

import re
x = 'aagaaggagatataccATGAATTTGTCGGTTTACCCCAATTTAACCAAAgaaaacctgtacaa'

re.findall(r'[a-z]+|[A-Z]+', x)
# ['aagaaggagatatacc', 'ATGAATTTGTCGGTTTACCCCAATTTAACCAAA', 'gaaaacctgtacaa']
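To get the call shape used in the question, the pattern can be wrapped in a function. This is only a sketch: the name split_boundaries is taken from the question's example and is not an existing library function. Note that [a-z] and [A-Z] match ASCII letters only, so any other character (digits, spaces, accented letters) is silently dropped from the result rather than kept or treated as a boundary.

```python
import re

def split_boundaries(s):
    """Return the maximal runs of ASCII lowercase or ASCII uppercase letters in s."""
    # Each alternative greedily matches one run of a single case;
    # characters that match neither alternative are skipped by findall.
    return re.findall(r'[a-z]+|[A-Z]+', s)

x = 'aagaaggagatataccATGAATTTGTCGGTTTACCCCAATTTAACCAAAgaaaacctgtacaa'
print(split_boundaries(x))
# ['aagaaggagatatacc', 'ATGAATTTGTCGGTTTACCCCAATTTAACCAAA', 'gaaaacctgtacaa']

# Non-letter characters are dropped, not preserved:
print(split_boundaries('abCD12ef'))
# ['ab', 'CD', 'ef']
```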

Another way to split a string into a list based on case, without a regular expression:

x = 'ATGAaagaaggagatatacAcATGAATTTGTCGGTTTACCCCAATTTAACCAAAgaaaacctgtacaaAaa'

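runs = []        # completed runs, in order of appearance
lower_run = ''   # lowercase run currently being built
upper_run = ''   # uppercase run currently being built

for ch in x:
    if ch.islower():
        # A lowercase character closes any pending uppercase run.
        if upper_run:
            runs.append(upper_run)
            upper_run = ''
        lower_run += ch
    elif ch.isupper():
        # An uppercase character closes any pending lowercase run.
        if lower_run:
            runs.append(lower_run)
            lower_run = ''
        upper_run += ch
    # Characters that are neither upper nor lower case are ignored.

# Flush the run that is still open; at most one of the two is non-empty.
# Checking both also handles an empty string or a trailing non-letter.
if lower_run or upper_run:
    runs.append(lower_run or upper_run)

print(runs)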

Output:

['ATGA', 'aagaaggagatatac', 'A', 'c', 'ATGAATTTGTCGGTTTACCCCAATTTAACCAAA', 'gaaaacctgtacaa', 'A', 'aa']
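The same run-by-run grouping can also be sketched with itertools.groupby from the standard library, keyed on str.isupper. This is an alternative to the loop above, not a reproduction of it, and the helper name split_by_case is made up for illustration. One assumption to be aware of: str.isupper returns False for non-letters, so digits or punctuation would be grouped together with lowercase letters instead of being dropped.

```python
from itertools import groupby

def split_by_case(s):
    """Split s into consecutive runs whose characters share the same str.isupper() value."""
    # groupby yields one group per maximal run of equal key values;
    # joining each group turns the run of characters back into a string.
    return [''.join(group) for _, group in groupby(s, key=str.isupper)]

x = 'ATGAaagaaggagatatacAcATGAATTTGTCGGTTTACCCCAATTTAACCAAAgaaaacctgtacaaAaa'
print(split_by_case(x))
# ['ATGA', 'aagaaggagatatac', 'A', 'c', 'ATGAATTTGTCGGTTTACCCCAATTTAACCAAA', 'gaaaacctgtacaa', 'A', 'aa']
```

Because no characters are discarded, joining the returned pieces always gives back the original string.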