How to slice a string where start and end are defined by two different substrings?
I have a list of strings like this:
list_strings=["YYYYATGMBMSSBAHHH","CCCCUINDAKSLLL","HHHHKJSHAKJJKKKK","ERRREEJK","XZXZOOOOYYYFFFFAKSXXX","RRRRRKJUUUUNNNNNGYRRRRRR","HHHHSDAFF"]
I have two lists of patterns that I want to search for in each string:
forward_patterns=['ATG', 'KJ', 'OOOO','UI']
reverse_patterns=['GY', 'AKS','BA','JK']
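For each string in list_strings, I want to slice from the position of one of the forward_patterns to the position of one of the reverse_patterns. The start and end patterns themselves should also be removed. For each pattern list the string should be sliced only once, considering only the first occurrence found. It does not matter which pattern from either list is found and used for slicing.
My desired output in this case is: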
list_strings=["MBMSS","ND","SHATGKJ","untrimmed","YYYFFFF","UUUUNNNNN","untrimmed"]
I have tried these for loops, but unfortunately they do not trim any of the strings:
for i in range(len(list_strings)):
    for pf in forward_patterns:
        beg=list_strings[i].find(pf)
        for pr in reverse_patterns:
            end=list_strings[i].rfind(pr)
            if(beg !=-1 and end !=-1):
                list_strings[i]=list_strings[i][beg+len(pf):end]
            else:
                list_strings[i]="untrimmed"
Basically I get a list in which every entry is "untrimmed", and I don't know why:
list_strings=["untrimmed","untrimmed","untrimmed","untrimmed","untrimmed","untrimmed","untrimmed"]
What could be wrong with my code?
Thanks in advance for your answers!
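One way to see why every entry ends up as "untrimmed" is to replay the loops on a single string and record each overwrite. The sketch below assumes the if/else sits inside the inner loop; trace_original_loop is a hypothetical helper introduced only for illustration, not part of the question. The first pair tried is 'ATG' with 'GY', and none of the sample strings contains both, so the else branch replaces the string with "untrimmed" on the very first iteration. Every later search then runs against that placeholder and fails as well.

```python
def trace_original_loop(text, forward_patterns, reverse_patterns):
    """Replay the question's nested loops on one string, logging every overwrite."""
    history = []
    for pf in forward_patterns:
        beg = text.find(pf)
        for pr in reverse_patterns:
            end = text.rfind(pr)
            if beg != -1 and end != -1:
                text = text[beg + len(pf):end]
            else:
                # One failed pair is enough to destroy the original string.
                text = "untrimmed"
            history.append((pf, pr, text))
    return history


forward_patterns = ['ATG', 'KJ', 'OOOO', 'UI']
reverse_patterns = ['GY', 'AKS', 'BA', 'JK']
history = trace_original_loop("YYYYATGMBMSSBAHHH", forward_patterns, reverse_patterns)
print(history[0])  # ('ATG', 'GY', 'untrimmed')
```

So the likely fix is to decide on "untrimmed" only after all patterns have been tried, rather than inside the loop.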
Based on your latest update:
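res_list = []
for s in list_strings:
    updated_string = s
    # Cut off everything up to and including the first forward pattern found.
    for f in forward_patterns:
        if f in updated_string:
            updated_string = updated_string[updated_string.index(f) + len(f):]
            break
    # Cut off everything from the first reverse pattern found onward.
    for r in reverse_patterns:
        if r in updated_string:
            updated_string = updated_string[:updated_string.index(r)]
            break
    # An unchanged length means no pattern matched.
    if len(updated_string) == len(s):
        res_list.append("Untrimmed")
    else:
        res_list.append(updated_string)
res_list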
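Your example is a bit confusing. I don't understand why 'ERRREEJK' is untrimmed even though 'JK' is in the reverse patterns oO.
Maybe this is what you are looking for?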
list_strings=["YYYYATGMBMSSBAHHH","CCCCUINDAKSLLL","HHHHKJSHAKJJKKKK","ERRREEJK","XZXZOOOOYYYFFFFAKSXXX","RRRRRKJUUUUNNNNNGYRRRRRR","HHHHSDAFF"]
forward_patterns=['ATG', 'KJ', 'OOOO','UI']
reverse_patterns=['GY', 'AKS','BA','JK']
new_strings = []
for string in list_strings:
    # Drop everything up to and including the first forward pattern that matches.
    _temp = string
    for pattern in forward_patterns:
        parts = string.split(pattern, 1)
        if len(parts) == 2:
            _temp = parts[1]
            break
    # Drop everything from the last occurrence of the first matching reverse pattern.
    for pattern in reverse_patterns:
        parts = _temp.rsplit(pattern, 1)
        if len(parts) == 2:
            _temp = parts[0]
            break
    if string == _temp:
        new_strings.append('untrimmed')
    else:
        new_strings.append(_temp)
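If a string should stay "untrimmed" unless both a forward and a reverse pattern are found (which would explain why 'ERRREEJK' is untrimmed in the expected output), something like the following might be closer to what the question asks for. This is only a sketch under that assumption; trim_between and its fallback parameter are hypothetical names, and it takes the first pattern in list order that matches, searching for the reverse pattern only after the forward match.

```python
def trim_between(text, forward_patterns, reverse_patterns, fallback="untrimmed"):
    """Return the part of text between a forward and a reverse pattern."""
    # Use the first forward pattern (in list order) that occurs in the text.
    start = -1
    for pattern in forward_patterns:
        pos = text.find(pattern)
        if pos != -1:
            start = pos + len(pattern)
            break
    if start == -1:
        return fallback
    # Use the first reverse pattern (in list order) found after the forward match.
    for pattern in reverse_patterns:
        end = text.rfind(pattern, start)
        if end != -1:
            return text[start:end]
    return fallback


list_strings = ["YYYYATGMBMSSBAHHH", "CCCCUINDAKSLLL", "HHHHKJSHAKJJKKKK", "ERRREEJK",
                "XZXZOOOOYYYFFFFAKSXXX", "RRRRRKJUUUUNNNNNGYRRRRRR", "HHHHSDAFF"]
forward_patterns = ['ATG', 'KJ', 'OOOO', 'UI']
reverse_patterns = ['GY', 'AKS', 'BA', 'JK']

result = [trim_between(s, forward_patterns, reverse_patterns) for s in list_strings]
print(result)
# ['MBMSS', 'ND', 'SHAKJ', 'untrimmed', 'YYYFFFF', 'UUUUNNNNN', 'untrimmed']
```

Note that the third entry comes out as 'SHAKJ', whereas the question lists 'SHATGKJ'. That value is not a substring of 'HHHHKJSHAKJJKKKK', so it may be a typo in the expected output.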