Split string at upper/lower case boundaries
I'd like to split the following string at its upper/lower-case boundaries. How can I do this in Python and/or with a regular expression?
For example:
x = 'aagaaggagatataccATGAATTTGTCGGTTTACCCCAATTTAACCAAAgaaaacctgtacaa'
split_boundaries(x)
# desired result: ['aagaaggagatatacc', 'ATGAATTTGTCGGTTTACCCCAATTTAACCAAA', 'gaaaacctgtacaa']
Use re.findall:
import re
x = 'aagaaggagatataccATGAATTTGTCGGTTTACCCCAATTTAACCAAAgaaaacctgtacaa'
re.findall(r'[a-z]+|[A-Z]+', x)
# ['aagaaggagatatacc', 'ATGAATTTGTCGGTTTACCCCAATTTAACCAAA', 'gaaaacctgtacaa']
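Below is a minimal sketch that wraps the findall one-liner in the split_boundaries name from the question. The name is hypothetical and only illustrative. For comparison it also includes a second helper, split_at_case_change (also hypothetical), which uses a different technique not taken from the answer above: re.split with zero-width lookarounds. The two helpers treat other characters differently. findall silently drops anything that is not an ASCII letter. The lookaround split keeps every character and only cuts where a lowercase letter directly touches an uppercase one. Splitting on empty matches requires Python 3.7 or later.

```python
import re


def split_boundaries(s):
    """Hypothetical name from the question: maximal runs of a-z or A-Z.

    Each alternative is greedy, so a run only ends where the case changes.
    Characters in neither class (digits, spaces, accented letters) are skipped.
    """
    return re.findall(r'[a-z]+|[A-Z]+', s)


def split_at_case_change(s):
    """Hypothetical helper: split exactly at lower->upper or upper->lower edges.

    The lookarounds match the empty position between two letters of different
    case; re.split accepts such empty matches on Python 3.7+. Nothing is dropped.
    """
    return re.split(r'(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[a-z])', s)


x = 'aagaaggagatataccATGAATTTGTCGGTTTACCCCAATTTAACCAAAgaaaacctgtacaa'
print(split_boundaries(x))
print(split_at_case_change(x))
```

For this all-letter input both helpers produce the same three-element list. They differ only when the string contains non-letters, e.g. 'ab12CD'.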
Another way to split a string into a list by case, using a plain loop instead of a regular expression:
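x = 'ATGAaagaaggagatatacAcATGAATTTGTCGGTTTACCCCAATTTAACCAAAgaaaacctgtacaaAaa'

l = []      # resulting list of same-case runs
lstr = ''   # lowercase run currently being built
ustr = ''   # uppercase run currently being built

def createList(s):
    l.append(s)

for i in x:
    if i.islower():
        lstr += i
        # a lowercase letter closes any pending uppercase run
        if ustr != '':
            createList(ustr)
            ustr = ''
    elif i.isupper():
        # an uppercase letter closes any pending lowercase run
        if lstr != '':
            createList(lstr)
            lstr = ''
        ustr += i
    # characters that are neither upper nor lower case are skipped

# flush the run that is still open (at most one is non-empty);
# this also avoids an IndexError when x is empty
if lstr:
    createList(lstr)
elif ustr:
    createList(ustr)
print(l)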
Output:
['ATGA', 'aagaaggagatatac', 'A', 'c', 'ATGAATTTGTCGGTTTACCCCAATTTAACCAAA', 'gaaaacctgtacaa', 'A', 'aa']
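The same run-splitting can be written more compactly with itertools.groupby from the standard library. This is an alternative technique, not the original answer's code. The sketch below is hedged in two ways. First, split_case_runs is a hypothetical name. Second, it keys the groups on str.isupper, so any character that is not uppercase (a digit, '-', or a space) joins the neighbouring lowercase run instead of being skipped as the loop skips it. For pure-letter sequences like the DNA example, the result is the same.

```python
from itertools import groupby


def split_case_runs(s):
    """Hypothetical helper: group consecutive characters by str.isupper().

    groupby yields maximal runs of characters that share the same key, so each
    run is either all uppercase or all non-uppercase. Unlike the loop above,
    non-letter characters are kept and grouped with the lowercase side.
    """
    return [''.join(run) for _, run in groupby(s, key=str.isupper)]


x = 'ATGAaagaaggagatatacAcATGAATTTGTCGGTTTACCCCAATTTAACCAAAgaaaacctgtacaaAaa'
print(split_case_runs(x))
```

An empty input string simply yields an empty list, so no special end-of-string handling is needed.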